PERSONALITY FACTORS IN COLLEGE DROPOUT


ALFRED B. HEILBRUN, JR. 


State University of Iowa 


An entire freshman class (N = 2,149) at the University of Iowa was administered
a personality and an intellectual ability test prior to their 1st academic
year. Thirteen months later, groups of 1st-year dropouts (DO) and nondropouts
(NDO) were defined. Based upon a value-conformity hypothesis, it was 
predicted that DOs would be more assertive and less task-oriented. Intellectual 
ability was controlled as a factor in dropout by matching each DO with an 
NDO S having an identical ability score. Personality differences were studied 
at 3 ability levels and for the sexes separately. The results supported the 
hypothesis for both sexes but only at the high-ability level. 


There would be little debate among those 
interested in the higher educational process 
regarding the importance of clarifying the 
factors anteceding the failure to adjust to 
college programs. The loss in potential talent 
to the nation, the waste of institutional time 
and money, and the personal distress of the 
student created by academic underachieve- 
ment and college dropout are but three conse- 
quences of nonadjustment. 

To this point, intellectual measures have 
been by far the most successful predictors of 
college achievement (Fishman & Pasanella, 
1960; Garrett, 1949; Harris, 1940) as given 
by classroom grades. Since continuation in 
college is typically contingent upon maintain- 
ing at least a minimal grade-point average, 
there is also a necessary relationship between 
college dropout and intelligence. However, it 
seems reasonable to expect that personality 
factors may make a significant independent 
contribution to student attrition, especially if 
dropout in the first college year is considered. 
The initial year often provides unique de- 
mands for academic study and classroom 
behavior as well as for a peer adjustment 
outside of the family environment; the estab- 
lished behavioral patterns of the adolescent 
should partially determine the ease or diffi- 
culty in adjusting to these facets of college life 
and in motivating continuation or dropout. 


Some hypotheses relating personality to 
college dropout have been derived from these 
two overlapping realms of student experience 
—course work and social relationships. Mc- 
Keachie (1961) has reviewed earlier work 
and presented his own research into the 
relationships between classroom character- 
istics, personality attributes, and both grades 
in and attitudes toward the courses. He found 
that the degree to which the college class 
provided achievement, power (student par- 
ticipation), and warmth cues influenced the 
prediction of grade and course reaction from 
knowledge of the degree of need achievement, 
power, and affiliation of the student. Such 
results lead one to expect that initial dissatis- 
faction with college may be inadvertently 
elicited by the types of freshman courses 
provided, given certain personality attributes 
of the student. However, the present study 
did not allow this line of speculation to be 
systematically pursued, since considerable dif- 
ferences in course work existed among the 
freshman students studied. 

Gough (1962, 1963) has proposed the impor- 
tance of values in both college achievement 
and college dropout. He reports the results 
of several studies, many employing only high- 
ability students, which have found that stu- 
dents who score higher on personality scales 
of socialization (So) and responsibility (Re) 
tend to be more successful in college. The So
scale reflects personal maturity, freedom from 
rebellion and authority problems, and the 
capacity to live without friction with others; 
Re indicates seriousness of thought, well- 
developed values, and dependability. Pro- 
ceeding from Gough’s proposal that the more 
conforming and tractable but self-sufficient
youth will be the more successful student, 
it would be expected that success is mediated 
in some measure by his greater acceptance 
of the prevalent values of the educational 
institution regarding the importance of higher 
education and an often confining code of 
social conduct. This possibility assumes con- 
siderable importance to the extent that it 
runs counter to the often adopted role of the 
college student as a “free spirit,” as one who 
should depart from traditional values partially 
because of his presumed intellectual status 
and creative potential but also as part of a 
more pervasive rejection of the values of the 
previous generation. 

The present study provides a test of the 
value hypothesis; specifically, do first-year 
college students whose personological makeup 
predisposes them to conform to the academic 
and social values of the institution make a 
better adjustment than students for whom the 
opposite is the case when college dropout is 
used as the criterion of adjustment? This 
question was investigated for the sexes sepa- 
rately and for differing ability levels within 
the same institution. The possibility that per- 
sonality may be differentially related to col- 
lege attrition dependent upon the ability of 
the student has not been systematically evalu- 
ated to this point; however, such a methodo- 
logical procedure proved useful in clarifying 
the personality correlates of academic under- 
achievement in an earlier study (Goodstein & 
Heilbrun, 1962). 


METHOD 
Measures 


Personality. The personality variables employed in 
the present study were the Need Scales scored from 
the 300-item Adjective Check List (ACL). The 15 
scales are rationally derived measures of manifest 
needs initially selected from Murray’s Need-Press 
System (1938) by Edwards in deriving his Personal 
Preference Schedule (1957). The scales are scored 
from self-descriptions on the ACL, and the raw
scores are converted to T scores by reference to
college norms (college mean=50, SD=10). The 
usefulness of the ACL in general and the scaling 
techniques in particular for research in a college 
setting is evidenced by some 20 published studies 
employing ACL scales as personological measures. 
A more complete description of the ACL and the 
derived scales as well as specific citations of studies 
in which they have been used will be available in 
a forthcoming manual (Gough & Heilbrun, in press). 
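
The standard-score conversion described above can be illustrated with a minimal sketch, assuming the usual linear rescaling to a norm-group mean of 50 and SD of 10. The function name and the norm values in the example (raw mean 12.0, raw SD 4.0) are hypothetical and chosen only for illustration; they are not the published ACL college norms.

```python
def to_standard_score(raw, norm_mean, norm_sd):
    """Rescale a raw scale score so the norm group has mean 50 and SD 10."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return 50.0 + 10.0 * z


# Hypothetical norms for one Need Scale (illustration only).
HYPOTHETICAL_NORM_MEAN = 12.0
HYPOTHETICAL_NORM_SD = 4.0

if __name__ == "__main__":
    for raw in (8.0, 12.0, 16.0):
        score = to_standard_score(raw, HYPOTHETICAL_NORM_MEAN, HYPOTHETICAL_NORM_SD)
        print(raw, score)
```

On this scaling, a student one SD above the college mean on a scale receives a score of 60, which makes the group means reported later directly comparable across scales.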

Brief descriptions of the Need Scale variables and 
predictions regarding differences between dropout 
students and their continuing counterparts are pro- 
vided in Table 1. It should be noted that the 
groupings of needs for purposes of prediction are 
based upon two a priori and admittedly overlapping 
concepts: (a) behaviors which generally predispose 
the student to an acceptance or rejection of social 
and academic values and, (b) behaviors which more 
specifically predispose the student to acceptance or 
rejection of academic values. Four variables for 
which predictions were not generated include af- 
filiation (to engage in many activities with friends), 
intraception (to think in terms of the motives under- 
lying behavior), nurturance (to provide help, sym- 
pathy, and affection to others), and heterosexuality 
(to engage in social and sexual experiences with the 
opposite sex). 

Inspection of the predictions given in Table 1 
indicates that the basic dimension for predicting 
general conformity or nonconformity is one ranging 
from behavioral passivity to assertiveness. The as- 
sumption made here is that, other things being 
equal, the more assertive student is more likely 
to question institutional values than a more passive 
student. Specific conformity or nonconformity to 
academic values is predicted in terms of whether 
the behaviors are consistent with a task orientation;
the assumption is that, other things being equal, 
the student with personality attributes which facili- 
tate academic learning and performance will be 
more likely to accommodate to institutional academic 
values. No differential predictions were made with 
regard to sex or ability levels.

Ability. Intellectual ability was estimated from the 
composite percentile on the University of Iowa 
entrance examination, using Iowa freshman norms.


Procedure 


Essentially the entire 1961 freshman class at the 
State University of Iowa, comprised of 1,144 males 
and 1,005 females, was given the entrance examina- 
tion and the ACL as part of a larger appraisal 
battery prior to beginning the first year. Knowledge 
of results of the ability test was later made available 
to the students but the personality test results 
were not. 

Both the ability and personality scores were held 
for approximately 13 months until official registra- 
tion lists for the 1962 academic year were available. 
At that time, subjects (Ss) who were part of the 
initial testing sample but who failed to register for 
the sophomore year at Iowa were defined as dropouts
(DO). Initially tested Ss who returned for
the second academic year comprised the nondropout
(NDO) group. If greater time and monetary resources
had been available for the study, a more
refined criterion of attrition could have been
employed. For example, some DO Ss would have been
eliminated from the study because of compelling
circumstances which overshadowed personological
factors as antecedents to withdrawal from school.
However, it seems legitimate to assume that the
inclusion of such Ss should not systematically
influence the personality comparisons of DO and NDO
groups, although the precision of the comparisons
would be reduced to some unknown extent.


TABLE 1

GROUPINGS OF PERSONALITY VARIABLES INTO THOSE PREDISPOSING THE STUDENT TO
CONFORMITY OR NONCONFORMITY TO COLLEGE INSTITUTIONAL VALUES

Personality variable                                                                              Predictionᵃ

General Conformity to Institutional Values
1. Deference: to conform to convention and follow the leadership of others                        NDO > DO
2. Succorance: to receive encouragement, sympathy, and affection from others                      NDO > DO
3. Abasement: to feel guilty and at fault when things go wrong and generally timid and inferior   NDO > DO

General Nonconformity to Institutional Values
4. Autonomy: to act independently of others and of conventions                                    DO > NDO
5. Exhibition: to attract attention to oneself by dress or behavior                               DO > NDO
6. Dominance: to assume leadership roles in relationship with others                              DO > NDO
7. Aggression: to attack, criticize, or make fun of others                                        DO > NDO

Specific Conformity to Academic Values
8. Achievement: to successfully accomplish tasks of social and personal significance              NDO > DO
9. Order: to stress organization and neatness in one’s activities                                 NDO > DO
10. Endurance: to work hard and keep at a task until it is completed                              NDO > DO

Specific Nonconformity to Academic Values
11. Change: to seek new experiences and avoid routine                                             DO > NDO

ᵃ NDO = nondropout, DO = dropout.

The influence of intellectual ability in college
dropout was precisely controlled by matching
procedures. The entrance composite-percentile scores of
the 304 male DOs and 306 female DOs (27% and
30% of all male and female freshmen) were first
listed in order of magnitude. Proceeding
alphabetically, an NDO S was paired with each DO by sex
and by identical ability score. Accordingly, 304
male pairs and 306 female pairs of students were
generated, each pair including one DO and one
NDO S. The male and female pair distributions
were then cut into three ability levels (1st-43rd
percentile, 44th-75th percentile, 76th-99th percentile)
with roughly equal numbers of DO-NDO pairs in
each (males: 102, 112, 90 pairs, respectively;
females: 84, 110, 112 pairs, respectively). In summary,
statistical comparison of DO and NDO personality
scale scores was based in each case upon groups
which were identically constituted for ability level.
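
The matching procedure just described can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`name`, `sex`, `percentile`, `dropout`) and function names are hypothetical, and a DO with no same-score NDO available is simply skipped here, whereas the study reports a match for every DO.

```python
from collections import defaultdict


def match_dropouts(students):
    """Pair each dropout (DO) with an unused nondropout (NDO) of the same sex
    and identical ability percentile.

    `students` is a list of dicts with the hypothetical keys
    'name', 'sex', 'percentile', and 'dropout'.
    """
    # Index NDO students by (sex, percentile), alphabetically within each cell.
    pool = defaultdict(list)
    for s in sorted(students, key=lambda s: s["name"]):
        if not s["dropout"]:
            pool[(s["sex"], s["percentile"])].append(s)

    # List DO students in order of ability score, then take the first
    # alphabetically available NDO with the same sex and score.
    dropouts = sorted((s for s in students if s["dropout"]),
                      key=lambda s: (s["percentile"], s["name"]))
    pairs = []
    for do in dropouts:
        candidates = pool[(do["sex"], do["percentile"])]
        if candidates:  # assumption: unmatched DOs are skipped in this sketch
            pairs.append((do, candidates.pop(0)))
    return pairs


def ability_level(percentile):
    """Assign one of the three ability levels using the cut points in the text."""
    if percentile <= 43:
        return "low"
    if percentile <= 75:
        return "moderate"
    return "high"
```

Because each pair shares one ability score, assigning the pair to a level by the DO's percentile also places the NDO in the same level, which is what makes the groups identically constituted for ability.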


RESULTS AND DISCUSSION 


A complete summary of findings regarding 
personality differences between DO and NDO 
freshmen as a function of sex and ability 
level is found in Table 2. Examination of the 
large number of comparisons included in this 
table suggests that interpretation of the find- 
ings can meaningfully proceed in terms of 
ability levels and sex differences. 
TABLE 2

DO AND NDO NEED SCALE MEANS FOR VARIOUS SEX AND ABILITY GROUPS
AND RESULTS OF STATISTICAL COMPARISONS

                      Male                                                        Female
                      High ability        Moderate ability    Low ability         High ability        Moderate ability    Low ability
Personality variable  DO        NDO       DO        NDO       DO        NDO       DO        NDO       DO        NDO       DO        NDO

1. Deference          49.6      52.2*     50.0      51.0      49.3      51.7*     46.4      49.1[*]   48.8      48.2      50.4      49.8
2. Succorance         48.6      48.4      48.3      50.7*     50.3      48.4      47.6      50.5*     50.1      48.9      50.3      49.5
3. Abasement          49.7      51.4      49.6      50.2      49.3      50.7      48.3      51.1[*]   49.8      49.8      50.1      50.5
4. Autonomy           50.2      46.8*     48.6      48.2      51.1      48.1[*]   52.5      49.2*     50.2      50.5      48.9      49.6
5. Exhibition         50.4      46.8**    50.7      50.0      50.1      50.4      50.6      47.9[*]   [?]       [?]       50.3      52.4*
6. Dominance          50.5      50.5      50.5      49.6      50.5      49.9      52.4      47.8**    50.1      50.4      49.7      50.1
7. Aggression         49.4      46.8*     49.0      49.4      49.9      48.8      51.6      49.6      49.6      51.7      48.6      50.1
8. Achievement        50.2      53.2*     50.3      49.9      50.7      50.0      52.6      50.7      49.7      49.2      48.8      49.3
9. Order              50.6      54.4**    47.5      49.6*     49.8      49.1      [?]       [?]       49.1      49.1      48.7      50.4
10. Endurance         50.5      55.3[*]   49.5      50.5      50.4      50.6      [?]       [?]       50.3      50.2      50.8      50.2
11. Change            50.0      46.8*     53.4      50.8**    51.2      50.8      53.5      50.0*     [52.0]    53.2      52.3      [52.9]
12. Affiliation       49.9      50.9      51.1      52.9      [?]       [?]       [47.8]    [46.1]    [50.4]    [48.9]    [?]       [?]
13. Intraception      50.8      53.2      48.1      47.6      46.1      47.7      50.7      49.9      48.4      47.6      48.2      48.6
14. Nurturance        49.8      51.5      51.1      49.8      49.6      51.2      48.3      47.9      51.1      49.3      51.0      50.6
15. Heterosexuality   48.2      48.2      50.2      50.9      51.4      51.4      48.6      46.1      [52.3]    [?]       52.5      52.0

Note. SDs are not presented in this table for clarity of presentation. These values tended to cluster around the expected
value of 10, and in none of the 90 DO-NDO comparisons was there significant heterogeneity of variance. The t statistic was used
to test for mean differences, with one-tail tests for predicted differences. [?] marks a cell that is illegible in the source, a
bracketed value is an uncertain reading, and [*] marks a significant difference whose level is illegible or missing in the source.

*p < .05.
**p < .01.

Support for the value-conformity hypothe- 
sis was found only for high-ability students 
of both sexes. In the case of males, the pre- 
diction that first-year college dropouts would 
be more assertive and less passive in their 
social behaviors (Variables 1-7, Table 2) was 
supported in four of seven instances; for 
females, this prediction held up for six of 
seven variables. When task-relevant behav- 
iors are considered, support for the hy- 
pothesis that such behaviors are conducive to 
acceptance of institutional academic values 
and enhance the probability that the student 
will continue in college was found in all four 
predictions for males (Variables 8-11, Table 
2) and one of four predictions for females. 
Only a few predicted and reliable differences 
were found between male DO and NDO 
groups at lower-ability levels, and no pre- 
dicted difference was confirmed at the lower- 
ability levels for females. The generalization 
which emerges from these results is that 
personality makes an important systematic 
contribution to college attrition for high- 
ability students only; for such students, pas- 
sivity and task-oriented behaviors allow for 
a conformance with institutional values and
decrease the probability of early discontinuance
of their college attendance. Conversely,
high-ability students of a more assertive, less
task-oriented nature encounter greater difficulty
in value conformance and are more
likely to drop out of college prior to the
second year.¹


¹ These conclusions do not follow directly from
a statistical test but rather from the fact that
statistically significant differences in DO and NDO
personality-scale means are conspicuously
concentrated at the high-ability level for both males and
females and not at the lower-ability levels.
Statistical safeguards for these inferences would be
provided by significant Drop Group × Ability Level
interactions, that is, by showing that the personality
differences between DO and NDO Ss at the
high-ability levels differ significantly from the DO-NDO
differences for nonhigh-ability Ss (moderate plus
low-ability Ss). Considering the eight personality
variables for which significant differences were found
in the case of high-ability males, the interaction t
values indicated that four (exhibition, achievement,
order, endurance) were highly significant (p < .01)
and one (aggression) approached significance
(p < .10). For females, the seven interaction effects
were all significant (deference, succorance, autonomy,
exhibition, dominance, change at p < .01;
abasement at p < .05). In addition, the interaction effect
for aggression was highly significant (p < .01) for
females and in the predicted direction even though
no DO-NDO aggression difference was significant at
any ability level. No other significant interaction
effects appeared for either sex. Thus, it seems safe
to conclude that the value-conformity pattern of
personality differences between DO and NDO males
and females appears only at the high-ability level.


Although both the general conformity and
specific conformity categories themselves and
the personality variables included in each
were based upon the investigator’s personal
convictions regarding personality functioning,
there are empirical data available which bear
upon the validity of these behavioral classes.
The ACL Need Scales were developed by
obtaining interjudge agreement regarding the
association between manifest behaviors and
the construct measured by each scale. Since
no attempt was made to avoid item overlap
(i.e., the relevancy of an adjective to more
than one scaled construct), a between-scale
percentage of item overlap statistic provides
an index of judged communality among the
various constructs. If the present
categorizations are valid, there should be
higher item overlap percentages among the
scales within the general conformity and
specific conformity groupings than between
these independently grouped scales. Table 3
presents the overlap percentages of adjectives
keyed in the same direction which were
obtained for the 11 value-relevant scales as
well as the interscale product-moment
correlations. Summary statistics indicate that the
seven general conformity variables form a
community of behaviors which are conceptually
distinct from the four specific conformity
variables. The mean overlap percentage
among the general conformity scales is 33.8
and their mean absolute r value is .46; in
contrast, the average percentage overlap
between the seven general conformity scales and
the four specific conformity scales is 9.1 and
the mean absolute correlation is .25. Among
the four specific conformity scales, the mean
overlap percentage is 31.2, and the mean
absolute r is .45; in comparison, the overlap
percentage mean between these four scales
and the seven general variables is 12.0 and
the mean absolute correlation is .25 as
reported above.


TABLE 3

ITEM OVERLAP PERCENTAGE AND INTERSCALE CORRELATION DATA FOR THE ELEVEN VALUE-RELEVANT SCALES

Note. Item overlap percentage figures are given first in each cell and should be read down columns (not across rows).
Interscale correlations appear second in cells to the left of the diagonal.
These r's are based upon a mixed male and female college sample of N = 1,455.

Although the item overlap and correlation
data are consistent with the scale groupings 
employed in this investigation, they also pro- 
vide a basis for interpretive caution. Since 
the interscale correlations within groupings 
tend to be of low-moderate magnitude, with 
a few showing rather high correlations, each 
significant scale difference which was found 
between DO and NDO Ss at the high-abil- 
ity level cannot be considered an independ- 
ent test and substantiation of the value- 
conformity hypothesis. Yet the average 
common variance within both groupings of 
variables is low enough (20-21%) that the 
large number of significant differences which 
appeared for both male and female high-abil- 
ity Ss is impressive. In any case, the con- 
ceptual communalities among scales within 
each grouping make more global inference in 
terms of social or task-relevant value con- 
formity preferable to inferences based upon 
the individual scales. 
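
The 20-21% common-variance figure appears to correspond to squaring the mean absolute within-grouping correlations reported earlier (.46 and .45). The text does not state how the figure was computed, so the short check below treats that as an assumption; note also that squaring a mean absolute r is not the same as averaging the squared correlations.

```python
def shared_variance_pct(r):
    """Percentage of variance two scales share, given their correlation r."""
    return 100.0 * r * r


# Mean absolute correlations reported in the text.
within_general = shared_variance_pct(0.46)    # seven general conformity scales
within_specific = shared_variance_pct(0.45)   # four specific conformity scales
between_groupings = shared_variance_pct(0.25) # general vs. specific scales

if __name__ == "__main__":
    print(f"within general:    {within_general:.1f}%")
    print(f"within specific:   {within_specific:.1f}%")
    print(f"between groupings: {between_groupings:.1f}%")
```

On the same assumption, the between-grouping correlation of .25 implies only about 6% shared variance, which is in line with treating the two groupings as conceptually distinct.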

A most provocative question which the
dropout-by-ability-level findings generate
is why passivity and task orientation should
be of systematic importance for predicting
dropout of only very bright college freshmen.
One hypothesis which can be offered²
is that the greater academic and social
regimentation imposed upon incoming students at
large universities is a greater source of
frustration to the bright student than to those
of lesser ability. This would presume that the
bright adolescent is more likely to demonstrate
a social-educational history in which
he has been rewarded for the independent
pursuit of intellectual interests or, at least,
successful attainment within the context of
smaller classroom groupings. Freshman
curricula which direct their requirements at the
“average” student and are unable to
effectively promote individual initiative within
larger groupings provide blocks to such
previous satisfactions. This would be especially
frustrating for the more independent,
assertive adolescent of high intellectual caliber,
since he is more likely to be instigated to
aggressive or regressive actions (e.g.,
disruptions in the classroom, failure to attend
class, abortive study) which only serve to
further frustrate his educational goals. College
dropout represents one obvious solution to his
dilemma. The bright but passive and
task-oriented student would experience less
frustration for two reasons: (a) he is less likely
to have engaged in independent intellectual
pursuits prior to college; (b) he is more
likely to obtain satisfactions from extrinsic
classroom rewards (e.g., grades) which
are still available in large freshman classes.
Further, he is less likely to behave in
opposition to institutional or parental expectations
even if he experienced equivalent frustration;
therefore, further increments in frustration
by nonadjustive manifest behaviors would
be avoided.


² One might conservatively stop at this point by
simply classing ability level as a moderator variable
in predicting college dropout from personality scales
(Ghiselli, 1963) without speculating about
psychodynamics. The latter alternative was adopted in the
hope of suggesting further areas for investigation.

If the foregoing analysis is correct, it might 
be expected that the results of the present 
study would not be replicated within certain 
educational institutions. For example, smaller 
colleges which enroll only high-ability students
and which provide a presumably more
intensive and challenging curriculum early in 
the student’s educational experience might 
actually show a reversal of the present find- 
ings. However, Holland’s (1959) study of 
highly gifted National Merit scholars, who 
attended some 279 educational institutions, 
offers little encouragement for such a con- 
clusion. He found that degree of socialization 
of the student was a cardinal predictor of 
academic performance during the first year 
over all institutions. 

The second implication of the pattern of 
results given in Table 2 is that task-relevant 
variables are less important contributors to 
dropout of the high-ability female than the 
male. This finding is highly reminiscent of 
a previous study (Heilbrun, 1963) of 
personality correlates of academic performance
as given by cumulative grade-point
average. The variables distinguishing male 
achievers were primarily task-relevant— 
achievement, endurance, change, and nurtur- 
ance. However, distinguishing variables for 
college girls were nurturance, exhibition, au- 
tonomy, aggression, intraception, abasement, 
and endurance, primarily of a social interaction
character. Since both the patterns of
personality differences between female high- 
ability DOs and NDOs and achievers and 
nonachievers indicate greater assertiveness 
and independence for the nonadjusting group, 
it seems possible that difficulties in sex-role 
conformity play a more significant function 
in educational failure for females than task 
orientation. (Passive and dependent behaviors 
are traditionally defined as components of 
the feminine role.) To test this, a masculinity- 
femininity scale, derived for the ACL by 
determining which adjectives were differen- 
tially endorsed by college males identified 
with masculine fathers and college females 
identified with feminine mothers (Heilbrun, 
1964), was employed to compare the 
high-ability female groups in the present 
study. Consistent with the sex-role difficulty 
interpretation, high-ability female DOs were 
more masculine (54.7, college mean = 50) 
than their NDO counterparts (51.3), the dif- 
ference being significant at the 5% level of 
confidence (t = 2.19). None of the other male
or female ability groups differed on this 
measure of sex-role conformity. It can be 
concluded that the high-ability female experi- 
ences initial adjustive difficulties not only 
because she may be frustrated by the cur- 
riculum like her male counterpart but also 
because of additional problems associated 
with deviancy from expected feminine social 
role behaviors. 


DELINQUENCY AND OBJECTIVE PERSONALITY
TEST FACTORS


The Objective-Analytic Personality Test Battery was administered to an
offender and a nonoffender Navy enlisted sample to determine the validity 
of these objective test dimensions in differentiating delinquent from non- 
delinquent groups. 8 of the 18 objective test factors differentiated the samples 
at the .05 confidence level or higher. However, when correlations were com- 
puted against number of offenses within the offender sample none of the 
factors was significantly related to the criterion. 


Recent advances in psychometric tech- 
niques have spurred interest in objective 
tests of temperament (as defined by Cattell, 
1957, pp. 225, 897; 1958; Thurstone, 1953). 
Specifically, the factor analytic techniques 
have been applied to measurements derived 
from these tests and it is thus possible to 
deal with dimensions of temperament meas- 
ured in the objective test domain rather than 
having to deal with results obtained from a 
single such test, which may often be less 
reliable than questionnaire scales. Eysenck 
(1952) has considered dimensions of extra- 
version-introversion and neuroticism-stability 
as measured by objective tests. The most 
comprehensive system of factor analytically 
derived objective test dimensions is that 
proposed by Cattell (1955, 1957). The 
validity of these dimensions has been considered
within military populations (Knapp,
1961, 1962; Knapp & Most, 1960). The
present report is concerned with the validity 
of these dimensions as related to criteria of 
delinquency within the Navy. 

Self-report, or questionnaire, techniques
have been applied in Navy settings in an
attempt to isolate personality dimensions
associated with delinquency (Knapp, 1963,
1964a, 1964b). These techniques have met with
considerable success in differentiating
delinquent from nondelinquent groups and in
differentiating the frequent offender from the
occasional offender. However, conservatively
speaking, the unaccounted for variance in the
prediction of delinquency proneness still far
exceeds that accounted for. The search for
discriminant variables has continued in
anticipation of the possible future use of
a battery of such devices for screening
purposes, perhaps for certain critical duty
assignments. However, selection, or screening,
is not the only application which can be
made of any knowledge secured concerning
the personality correlates of delinquency.
Hopefully this information will ultimately be
employed in decision making at the judicial
level and for increasing the effectiveness of
programs of rehabilitation during
incarceration. Looking further ahead, such information
may lead to better placement of certain of
these offenders in their environment (and
perhaps induce changes in the environment
itself) so as to reduce the likelihood of
their frequently committing offenses.
The present study was thus undertaken to
extend knowledge of the temperamental and
personality characteristics associated with
delinquency in the military.


1 The opinions expressed are those of the author
and are not to be construed as being official or in
any way representative of the United States Navy.
Now with Educational and Industrial Testing Service,
San Diego, California.


PROCEDURE 


Subjects. Two samples, a brig sample and a 
nonoffender sample, are considered in the present 
study. The brig sample consisted of 92 white 
male confinees of the Navy Brig, Marine Corps 
Barracks, Navy Station, San Diego. The sample was 
taken from among individuals entering the brig 
during a 1-month period and was selected on the 
basis of ready availability of previous offense 
records. Subjects (Ss) ranged in age from 17 to 31 
years with a mean age of 20.3 years. Educational 
level was from 7 to 13 years of school completed 
with a mean of 9.8 years. General Classification 
Test (GCT) scores, a measure of verbal aptitude, 
ranged from 34 to 71 with a mean of 48.7. Length
of service ranged from 7 to 67 months, with a mean
of 26.3 months.


TABLE 1

ANALYSIS OF COVARIANCE SUMMARY TABLE FOR COMPARISON OF A BRIG AND A
NONOFFENDER SAMPLE ON OBJECTIVE PERSONALITY TEST FACTORS

                                       Adjusted means              MS
Universal                              Brig sample  Nonoffender
index      Factor description          (N = 92)     (N = 98)       Error     Groups      F          r12.345ᵃ

UI 16      Assertiveness               34.25        35.43          29.27     62.64       2.14       .05
UI 17      Inhibition                  35.60        34.12          22.76     98.92       4.35*      .03
UI 18      Insecure Overcompensation   25.00        24.11          17.04     35.66       2.09       .07
UI 19      Critical Exactness          25.60        24.60          20.63     46.25       2.24       .03
UI 20      Social Conformity           40.54        38.48          [?]       [192.65]    [?]        [?]
UI 21      Exuberance                  39.45        40.08          32.76     18.11       .55        -.03
UI 22      Alertness                   30.26        28.95          27.33     76.97       2.82       -.02
UI 23      Capacity to Mobilize        20.49        19.32          [?]       62.65       4.76*      -.04
UI 24      Anxiety                     31.18        29.14          [?]       [?]         8.42**     -.10
UI 25      Realism                     34.12        36.02          31.40     165.48      [?]        .02
UI 26      Self-Centeredness           41.68        [?]            21.31     693.99      [?]        -.17
UI 27      Apathy                      29.43        30.50          20.53     52.40       2.55       [?]
UI 28      Inner Weakness              29.89        29.26          16.52     18.15       1.10       .04
UI 29      Overresponsiveness          31.29        28.05          [?]       [?]         22.06***   .02
UI 30      Independence                15.78        14.41          9.58      85.67       8.94**     -.17
UI 31      Wary Realism                34.79        [?]            18.70     8.00        .43        -.01
UI 32      Extraversion                30.50        30.01          20.34     10.75       .53        [?]
UI 33      Dourness                    14.61        15.40          9.98      28.35       2.84       -.02

ᵃ The r12.345 column presents correlations between objective test factors and number of offenses with verbal aptitude,
educational level, and length of service partialed out. [?] marks a cell that is illegible in the source; a bracketed
value is an uncertain reading.

*p < .05.
**p < .01.
***p < .001.
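
As a consistency check on the analysis of covariance summary (Table 1), the F column can be compared with the ratio of the two mean squares. The sketch below assumes the conventional relationship F = MS groups / MS error, which the text does not state explicitly, and uses only the rows whose entries are legible in the source; the variable and function names are illustrative.

```python
# (factor, MS error, MS groups, reported F) for the legible rows of Table 1.
LEGIBLE_ROWS = [
    ("UI 16", 29.27, 62.64, 2.14),
    ("UI 17", 22.76, 98.92, 4.35),
    ("UI 18", 17.04, 35.66, 2.09),
    ("UI 19", 20.63, 46.25, 2.24),
    ("UI 21", 32.76, 18.11, 0.55),
    ("UI 22", 27.33, 76.97, 2.82),
    ("UI 27", 20.53, 52.40, 2.55),
    ("UI 28", 16.52, 18.15, 1.10),
    ("UI 30", 9.58, 85.67, 8.94),
    ("UI 31", 18.70, 8.00, 0.43),
    ("UI 32", 20.34, 10.75, 0.53),
    ("UI 33", 9.98, 28.35, 2.84),
]


def f_ratio(ms_groups, ms_error):
    """F statistic as the ratio of between-groups to error mean square."""
    return ms_groups / ms_error


def max_discrepancy(rows):
    """Largest absolute gap between the recomputed and the reported F."""
    return max(abs(f_ratio(groups, error) - reported)
               for _, error, groups, reported in rows)


if __name__ == "__main__":
    for name, error, groups, reported in LEGIBLE_ROWS:
        print(name, round(f_ratio(groups, error), 3), reported)
```

For these rows the recomputed ratios agree with the reported F values to within rounding, which supports the reading of the partly garbled decimal entries.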

The nonoffender sample consisted of 98 white, 
Navy enlisted men, none of whom had previously 
committed offenses while in the Navy. These Ss 
were selected from two sources, the San Diego 
Naval Station and one of the Navy’s heavy cruisers. 
The group was selected such as to be reasonably 
equated with the brig sample on variables of age, 
education, and length of service since these variables 
have been found to be related to disciplinary 
problems within the military (Flyer, 1959; Knapp, 
1962). Additional control for these variables was 
accomplished through analysis of covariance de- 
scribed below. The age range for the nonoffender 
sample was from 17 to 29 with a mean of 20.3
years. Educational level was from 7 to 13 years 
with a mean of 10.4 years of school completed. The 
GCT ranged from 30 to 67 with a mean of 49.6. 
Length of service ranged from 7 months to 11 
years with a mean of 20.3 months. As nearly as 
can be determined, this sample appears to be fairly 
representative of the Navy nonoffender enlisted 
population. A sample of 11,000 incoming recruits
for the year 1960 had approximate means of
10.7 and 50.0 for years of school completed and
GCT, respectively, which compare with means of
10.4 and 49.6 in the present sample.

Tests. The objective personality dimensions con- 
sidered were those obtained from Cattell’s (1955) 
Objective-Analytic Personality Test Battery (O-A 
Battery). The battery as administered in the 
present study consists of some 68 separate tests and 
test scores which were combined in such a way as 
to yield scores for the 18 obliqued factors previ- 
ously reported by Cattell (1955, 1957). 

Method. Since the first aim of the present study 
was to compare an offender group with a nonoffender
group on the personality dimensions tapped
by the O-A Battery, the effects of certain other 
variables often found to differentiate delinquent 
from nondelinquent groups were controlled. These 
variables were GCT, educational level, and length 
of service. Control of the latter was necessary since 
a man’s length of service might be expected to be 
associated with his opportunity to get into trouble. 
An attempt was made to equate the two groups as 
nearly as was feasible on these variables. Since a 
perfect matching was not possible, analysis of 
covariance was used. 

A second analysis was undertaken to further 
investigate the relationship of O-A Battery factors 
to criteria of delinquency. Within the brig sample 


ROBERT R. KNAPP


each individual’s total number of offenses committed 
was determined through examination of service 
records. Each personality factor was then cor- 
related with total number of offenses, with the 
effects of educational level, GCT, and length of 
service partialed out. 
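
The partialing step described above can be sketched in a few lines. This is only an illustration of the general technique, not the computation actually run for the study: it assumes the usual recursion that removes one control variable at a time, and the function names and the tiny data set are hypothetical.

```python
import math

def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation of x and y with every variable in `controls` partialed out.

    Applies the standard first-order formula recursively, removing the
    last control variable at each step.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[-1], controls[:-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data in which x and y are related only through z.
z = [-3, -1, 1, 3]
x = [-2, -2, 0, 4]   # z plus a component unrelated to z
y = [-4, 2, -2, 4]   # z plus a different unrelated component
raw = pearson(x, y)                  # sizable zero-order correlation
adjusted = partial_corr(x, y, [z])   # essentially zero once z is removed
```

In the present design, `x` would be a factor score, `y` the number of offenses, and `controls` the three lists holding educational level, GCT, and length of service.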


RESULTS 


Table 1 presents adjusted means for the 
objective test dimensions and tests of signifi- 
cance of difference between the brig and non- 
offender samples. Partial correlations of the 
personality factors against number of offenses 
are also presented. 
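
As a reading aid for Table 1, each F appears to be the groups mean square divided by the error mean square. The short check below assumes that relationship and uses three rows whose printed values are legible; the function name is hypothetical.

```python
def f_ratio(ms_groups, ms_error):
    """F statistic as the ratio of the groups to the error mean square."""
    return ms_groups / ms_error

# (error MS, groups MS, F as printed) for three rows of Table 1.
rows = {
    "UI 16": (29.27, 62.64, 2.14),
    "UI 17": (22.76, 98.92, 4.35),
    "UI 30": (9.58, 85.67, 8.94),
}
for factor, (ms_error, ms_groups, printed) in rows.items():
    computed = round(f_ratio(ms_groups, ms_error), 2)
    print(factor, computed, printed)
```

For these rows the computed ratio agrees with the printed F to two decimals.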

Of the 18 objective test factors considered 
in the present analysis, eight significantly 
differentiated the brig from the nonoffender 
sample at the .05 level of significance or beyond.
The brig sample scored higher on factors UI 
17, 20, 23, 24, 26, 29, and 30, while the non- 
offender sample was higher on UI 25. 

None of the partial correlations against 
number of offenses reached significance at the 
.05 level.


DISCUSSION


The results from the use of the present 
objective test factors to differentiate an of- 
fender from a nonoffender sample are encour- 
aging. Even when the samples are matched 
for many of the demographic variables which 
previously have been demonstrated to be 
related to military delinquency, 8 of the 
18 factors differentiated the groups at or 
above the .05 probability level. 

From present tentative interpretations 
given to these factors the present offender 
group would be depicted as highly Self- 
Centered (UI 26), highly Overresponsive 
(UI 29), and as showing greater Independ- 
ence (UI 30), and more Anxiety (UI 24). 
UI 25 shows the nonoffender group to be
higher on Realism. Several low-order relationships
with some of the factors run contrary
to normal expectations: UI 17 shows the
delinquent group as scoring higher on Inhibition,
and UI 20 depicts the offender group as
higher on Social Conformity.


Greatest confidence can probably be placed 
on the results from the two factors which give 
the best differentiation, Overresponsiveness 
and Self-Centeredness.

The second phase of the present analysis 
is not as encouraging. None of the objective 
factors were significantly related to the number
of offenses within the confined group.
However, the partialing technique provided a 
fairly stringent test of these relationships and 
no attempt was made to combine the factors 
through multiple regression techniques. 
VALIDITY OF SOCIOECONOMIC ORIGIN AS A PREDICTOR
OF EXECUTIVE SUCCESS 


ALBERT PORTER 
San Jose State College 1


A population of 337 male Stanford Graduate School of Business Master of
Business Administration (MBA) alumni was analyzed for association between
executive success criteria and socioeconomic origin, measured by father’s occupation
when S was in elementary school (SEO-elem.) and when S received the
MBA degree (SEO-MBA). No significant
correlation was found between SEO-MBA and any criterion. SEO-elem. cor- 
related significantly (p < .05) with policy authority, level in organization, and 
job interest; but not with pay, level in business world, organization size, 
career progress satisfaction, and membership in general management. More 
comparable studies must precede generalizations about findings; the ultimate 
value of such research may be to dispel belief in folklore and hence open the 
way for the science-based study and practice of management. 


Studies of vertical occupational mobility 
(Havemann & West, 1952; Newcomer, 1955; 
Sorokin, 1927; Taussig & Joslyn, 1932; 
Thorndike & Hagen, 1959; Warner & Abeg- 
glen, 1955a, 1955b) quite consistently report 
a strong tendency for the individual to gravi- 
tate toward his father’s socioeconomic level. 
Thus one might expect, upon analyzing a 
population of executives well established in 
their careers, to find a highly significant cor- 
relation between socioeconomic origin and 
later achieved executive success measured by 
such traditional criteria as pay, rank, policy- 
deciding authority, membership in general 
management, and the like. The present paper 
reports one such correlation analysis. 


METHOD 


Subjects. The population consisted of 337 men 
receiving the degree Master of Business Administra- 
tion (MBA) from the Stanford Graduate School of 
Business (SGSB) prior to 1944, and responding 
both to the SGSB’s 1958 alumni vocational survey 
(AVS) (Uhrbrock, 1959) and to the socioeconomic-


1 Financial assistance for this research was pro- 
vided by the Western Management Science Institute, 
of the University of California Graduate School 
of Business Administration. Computations were per- 
formed on the IBM 7090 at the University of Cali- 
fornia at Los Angeles Western Data Processing 
Center. The author is grateful for the vital assistance 
and cooperation extended by Ernest Arbuckle, 
Thomas Harrell, and the secretarial staff, of the 
Stanford Graduate School of Business, in connection 
with the administration of the father’s-occupation 
questionnaire to the MBA alumni. 
origin questionnaire mailed by the writer in 1962.
Since 429 questionnaires were sent by the writer, 
the consequent 77% response raises the question 
of self-selection bias, which cannot be answered with 
certainty in this paper. It is believed, however, that 
self-selection bias was relatively slight. This belief 
is based on: (a) Williams’ (1959) finding, during 
a personal follow-up of the original AVS mailing, 
that nonrepliers were generally equivalent success- 
wise (pay and rank) to repliers; and (b) Warner 
and Abegglen’s (1955a, 1955b) finding that the 
mobile business élite had no aversion to reporting 
low socioeconomic origin. It seems plausible that 
the present study’s nonrepliers would tend to be 
low-success, low-origin, which bias would overstate 
the predictive validity of socioeconomic origin. Thus 
high validities would need to be viewed with skepti- 
cism—a need which did not arise in the present 
analysis, as will be seen. 

Criteria. The eight simple criteria were: pay, 
policy deciding authority, level in organization, level 
in business world, organization size, job interest, 
career progress satisfaction, and membership in general
management. Coding procedures and their
rationale are set forth at length in Porter, 1961, pp. 
143-158; somewhat more briefly in Porter, 1962a, 
1962b. As an aid in interpreting the present paper’s 
statistical exhibits some comment will here be made 
regarding the coding. 

Since our business world has thus far utilized an 
apprentice-type advancement upward through a 
hierarchical management structure (Dalton, 1959; 
Leavitt & Whisler, 1958), each subject’s (S’s) at- 
tained age should ideally be taken into account in 
scoring his various dimensions of “success,” par- 
ticularly as regards the nonsubjective measures of 
pay, rank, authority, and membership in general 
management. In this study, however, such a nor- 
malization was attempted only for pay. Pay 
normalization was done on the basis of years since 
award of MBA degree: points ranging from 1 to 12 
were assigned to half-standard-error bands above
and below a least-squares regression line fitted to the 
logarithms of reported incomes plotted on ratio 
paper, with reported annual pay (including bonuses) 
as the ordinate, and year of MBA award as the 
abscissa. Policy-deciding authority was coded from
the AVS item in which S indicated that he did not 
participate in company-wide policy deciding, that 
he made indirect recommendations to the final de- 
ciding authority, that he made direct recommenda- 
tions, or that he himself was the final deciding 
authority. 
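
The pay normalization described above can be sketched as follows. This is an illustrative reading rather than the original coding program: it assumes that bands 1 to 6 lie below the fitted line and bands 7 to 12 above it, that the two outermost bands are open-ended, and that the standard error of estimate is the band unit. All function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope, intercept, and standard error of estimate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, math.sqrt(sse / (n - 2))

def band_points(residual, se):
    """Map a residual to 1..12 using bands half a standard error wide.

    Assumed layout: bands 1-6 below the line, 7-12 above it, with the
    two outermost bands open-ended.
    """
    band = math.floor(residual / (0.5 * se)) + 7
    return max(1, min(12, band))

def pay_points(mba_years, incomes):
    """Score each S's pay relative to the trend for his MBA year.

    Needs more than two cases and some scatter about the line (se > 0).
    """
    logs = [math.log10(v) for v in incomes]
    slope, intercept, se = fit_line(mba_years, logs)
    return [band_points(y - (intercept + slope * x), se)
            for x, y in zip(mba_years, logs)]
```

Working on the logarithm of income mirrors the ratio-paper plot: equal vertical distances from the line then correspond to equal percentage differences in pay.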

The AVS questionnaire asked the respondent to 
check his organizational level, ranging from “I am 
the chief executive” to “sixth or more [level] below 
[the chief executive].” For the criterion “level in 
organization,” the coding scheme assigned values 
ranging from 7 points for “chief executive” to 1 point 
for “sixth or more below.” The criterion “level in 
business world” used the responses to the AVS 
questionnaire item for size of entire employing 
organization, from “less than 50 employees” to “over 
15,000 employees.” The coding process rested upon 
an assumption of a universal 1-over-10 organiza- 
tional hierarchy throughout the American business 
world. An S reporting himself as chief executive 
of a firm having 100 or fewer employees was deemed 
to be at the peak of an organizational pyramid 
having three levels: 100 employees at the bottom,
10 first-level supervisors above them, and the chief
executive at the top. Another S,
reporting himself as third level below the chief execu- 
tive in an organization of over 15,000 employees, 
was similarly dealt with, as being at the top of a 
three-level pyramid within the greater hierarchy of 
the firm. Each of these respondents would be as- 
signed 3 points for “level in business world.” The 
chief executive of the over 15,000 employee firm 
would receive 6 points on the assumption that he 
stands at the peak of a pyramid having six floors, 
the lowest of which would have spaces for 100,000 
nonmanagement employees, the uppermost accommodating
only the chief executive.
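
A minimal sketch of the "level in business world" coding, under stated assumptions, may make the three worked cases easier to follow. The text gives only the smallest and largest firm sizes, so the treatment of intermediate sizes, the use of a single employee count instead of the AVS size categories, and both function names are assumptions.

```python
def pyramid_floors(employees):
    """Floors in a 1-over-10 pyramid whose bottom floor can hold `employees`.

    The minimum of three floors follows the text's treatment of firms with
    100 or fewer employees; sizes in between are an interpolation.
    """
    floors, capacity = 3, 100
    while capacity < employees:
        capacity *= 10
        floors += 1
    return floors

def level_in_business_world(employees, levels_below_chief):
    """Points equal the height of the pyramid at whose peak the S stands."""
    return max(1, pyramid_floors(employees) - levels_below_chief)

print(level_in_business_world(100, 0))     # chief executive, small firm
print(level_in_business_world(15001, 3))   # third level below chief, largest firms
print(level_in_business_world(15001, 0))   # chief executive, largest firms
```

Under these assumptions the three calls give 3, 3, and 6 points, matching the cases worked through in the text.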

Organization size is included as a success criterion 
on the ground that, at today’s “primitive stages” 
(Harrell, 1961, p. 164) of executive selection research, 
it is important to search for any likely clues as 
to what type of man seems to gravitate toward what 
type of situation, one major situational factor being 
size of employing organization. The AVS question- 
naire provided a range of 8 points for this criterion. 

Job interest ranged from 1 point for AVS response 
that job was “dull, practically all the time,” through 
intermediate stages to the top score of 5 points for 
“very interesting,” the distribution being consider- 
ably skewed due to bunching at the “very interest- 
ing” end of the spectrum, as shown in Table 2. 

Membership in general management was a di- 
chotomy: in general management or not. This clas- 
sification had been done prior to the present study 
by SGSB staff members administering the AVS, and 
was based on examination of the entire questionnaire 
for indications that the S was in management and 
was responsible for company-wide policy making and 
approval. 

Predictors. There were two predictors: father’s 
occupation when S was in elementary school, and 
when S received the MBA degree. It was felt that, 
by obtaining father’s occupation as of S’s elementary 
school days and as of MBA award, some light 
might be shed on the as yet little understood nature 
of the “family effect” which apparently persists so 
stubbornly throughout various cultures and quickly 
reasserts itself even after violent revolutions. A 
strikingly high predictive validity for the elementary 
school predictor would, for example, lend support 
to a hypothesis that the high socioeconomic origin 
child receives a superior enculturation during forma- 
tive years. If, however, the MBA-award predictor 
towered supreme, there might be inferred support 
for a different hypothesis, namely, that family effect 


TABLE 1

CORRELATION COEFFICIENTS FOR TWO PREDICTORS AND EIGHT CRITERIA
OF EXECUTIVE SUCCESS

Variable                        2           3           4           5           6           7           8           9           10
1 Pay                           +26**       +26**       +22**       +03         +23**       +30**       +26**       +04         +03
2 Policy authority                          +43**       +05         [illegible] [illegible] [illegible] [illegible] [illegible] [illegible]
3 Level in organization                                 +68**       −44**       +15**       +10         +24**       +11*        +01
4 Level in business world                                           +12*         00         −01         +04         +04         −04
5 Organization size                                                             −12*        −12*        −25**       −03         [illegible]
6 Job interest                                                                              +36**       +10         +12*        +01
7 Career progress satisfaction                                                                          +16**       +05         +02
8 General management                                                                                                +06         +04
9 Socioeconomic origin                                                                                                          +67**
  (elem sch)
10 Socioeconomic origin
  (MBA)

Note. Decimals omitted. The subjects were 337 pre-1944 male MBAs.
*p < .05.
**p < .01.
TABLE 2
MEANS, STANDARD DEVIATIONS, AND RANGES FOR ALL VARIABLES

Variable                               N a          M            SD           Range
1 Pay                                  336          6.4          2.8          1–12
2 Policy-deciding authority            333          [illegible]  0.96         1–4
3 Level in organization                334          3.9          [illegible]  1–7
4 Level in business world              329          [illegible]  1.6          1–6
5 Organization size                    [illegible]  4.3          [illegible]  1–8
6 Job interest                         337          4.7          0.70         2–5
7 Career progress satisfaction         327          1.9          0.64         [illegible]
8 Membership in general management     337          0.38         0.59         0–1 (dichotomy)
9 Socioeconomic origin (elem school)   337          4.4          1.2          1–7
10 Socioeconomic origin (MBA award)    271          4.6          [illegible]  1–7

Note. The subjects were 337 pre-1944 male MBAs.
a Correlation program permitted data gaps. Significance test was based on matrix of Ns shown on computer output. The N
for any given correlation coefficient can be closely estimated by using the lower of the 2 Ns from above table.


has a considerable component of what might be 
termed string pulling, wherein the father drops the 
discreet word in the elite groups among which he 
moves, and there consequently fly open for his
son, upon receipt of the MBA, doors leading to 
rapid-ascension career ladders. No attempt was made 
to explore yet a third strand of family-effect theory, 
espoused by Sorokin (1927), namely, the possibility 
that the business élite form a genetically superior 
social class due to Darwinian selection. 


RESULTS AND DISCUSSION 


Table 1 shows the correlation matrix; 
Table 2 shows means, standard deviations, 
and ranges for all variables. 
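
Readers who wish to check the significance markers in Table 1 can do so approximately from r and N. The sketch below is not the original correlation program: it assumes the usual conversion of r to t with N − 2 degrees of freedom and, because N exceeds 300, refers t to the normal curve instead of a t table. The function name is hypothetical.

```python
import math

def significance_stars(r, n):
    """Approximate two-tailed significance of a correlation r based on n cases."""
    t = abs(r) * math.sqrt((n - 2) / (1 - r ** 2))
    p = math.erfc(t / math.sqrt(2))   # two-tailed normal probability
    if p < .01:
        return "**"
    if p < .05:
        return "*"
    return ""

# N is taken as the lower of the two Ns in Table 2, as its footnote suggests.
print(significance_stars(0.11, 334))   # level in organization with SEO-elem.
print(significance_stars(0.12, 337))   # job interest with SEO-elem.
print(significance_stars(0.04, 336))   # pay with SEO-elem.
```

With an N of this size, coefficients of about .11 and above reach the .05 level and those of about .15 and above reach the .01 level, which is in line with the asterisks printed in the matrix.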

Virtually no support is to be found here for 
the widespread belief in family effect as sug- 
gested by the references cited in the paper’s 
opening paragraph. Many similar research 
studies need to be undertaken: at the very
least replications, but preferably more elegant
designs minimizing the present study’s crudities,
such as undetermined reliability, self-selection
bias, criterion contaminations,
statistical distortions, and the changing
nature of management organizations versus
the status quo inherent in correlation studies.
A comparison of several such studies might 
make it justifiable to generalize about the 
relationship between socioeconomic origin and 
later achieved executive success. 
SOME CORRELATES OF CREATIVITY IN
ENGINEERING PERSONNEL 


CHARLES D. McDERMID 1 
McMurry Company, Chicago 


This study tested recently developed criteria and predictors of scientific 
creativity in an industrial setting—specifically, in an applied engineering division 
of a company manufacturing consumer goods. The California Psychological 
Inventory (CPI), the Vocational Preference Inventory (VPI), the Welsh 
Figure Preference Test (FPT), the Social Insight Test (SIT), Gough’s Ad- 
jective Check List (ACL), the Concept Mastery Test (CMT), and the Bio- 
graphical Information (form) for Research and Scientific Talent (BIRST) 
were correlated with supervisor and peer ratings of creativity in 58 engineering 
and technical personnel. The most significant correlations were obtained be- 
tween the criteria and the ACL and BIRST. These results confirm other research 
findings which suggest that self-reports and biographical data, especially those 
which describe interests or achievements of a creative nature, are currently the
most effective predictors of creative performance in real-life situations.


Creativity has become a magic word. 
Academicians are discussing it, industrialists 
are seeking it, and the government is sup- 
porting it. No one, however, seems able to 
define it adequately or measure it precisely. 

Historically, few psychologists devoted at- 
tention to the assessment of creativity before 
Guilford (1950). In recent years, however, 
an increasing activity of both a theoretical 
and experimental nature has enriched our 
understanding of creativity (MacKinnon, 
1961; Taylor & Barron, 1963). 

Industrial organizations, of course, have a 
very direct interest in measuring creative 
talent for functional areas ranging from 
product research to advertising copy. Within 
this context, such investigators as Albright 
and Glennon (1961) have studied the identi- 
fication of creativity in scientific and technical 
personnel. 

This study was designed to investigate 
some of the more promising instruments for 
assessing the creativity of engineers in an 
applied industrial setting. Ultimately, it is 
hoped, such measurements can be refined 
and instruments developed for predicting the 
creative performance of engineers, and select- 
ing or placing them accordingly. 


1 The author conducted the research for this article 
during his employment with Humber, Mundie and 
McClary, Evanston, Illinois. 


METHOD
Sample 


The subjects (Ss) of this study were volunteer 
engineers and technical personnel in the engineering 
division of the Hammond Organ Company.2 All
technical personnel in the division (N=75) were 
invited to participate; complete data were obtained 
from 58 Ss—77% of the total group. 

All Ss were male and ranged in age from 22 to 60, 
with a mean of 37.9 years. Three held the master’s 
degree, 19 the bachelor’s degree, 32 had attended 
college, and 4 had graduated from high school; the 
mean education was 14.8 years. Length of service 
with Hammond ranged from 1 to 27 years, with a 
mean of 5.1 years. The engineers who participated 
in this study did not differ significantly in terms 
of education, length of service, supervisory responsi- 
bilities, or compensation from those who chose not 
to participate. But the nonparticipants, with a mean 
age of 43.7 years, were significantly older (p < .05) 
than the participants. 


Predictors 


Seven psychometric instruments were used as 
predictors: 

1. California Psychological Inventory (CPI) 

2. Holland’s Vocational Preference Inventory 
(VPI)—fifth revision 

3. Welsh Figure Preference Test (FPT)—research 
edition 

4. Chapin’s Social Insight Test (SIT) 

5. Gough’s Adjective Check List (ACL) 


2 The author wishes to thank the management and
engineering personnel of the Hammond Organ Com- 
pany, who contributed financial support and personal 
time to this study. 
6. Terman’s Concept Mastery Test (CMT)
7. Biographical Information for Research and 
Scientific Talent (BIRST)—Forms P and S 


Criteria 


The basic criteria were ratings of creativity ob- 
tained specifically for this study. Each engineer was 
rated by his immediate supervisor on the Supervisor’s 
Evaluation of Research Personnel (Buel, 1960). In 
addition, each engineer was rated by that associate 
considered most familiar with his work on the Check 
List Rating Scale—Form C-1 (Taylor, 1958). 

A third criterion, called the Tripartite Grouping, 
was derived from these two basic criteria. Each 
engineer who achieved a rating above the fiftieth 
percentile in both his supervisor and peer ratings 
(N = 20) was assigned a score of three. Each engineer
rated below the fiftieth percentile by both his
supervisor and peer (N = 16) was assigned a score
of one. Those engineers who were rated above the 
fiftieth percentile on one instrument and below the 
fiftieth percentile on the other (N=22) were 
assigned a score of two. 
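
The Tripartite Grouping rule can be written out as a short sketch. The article does not say how ratings falling exactly at the fiftieth percentile were handled, so treating them as "not above" is an assumption here, as is the function name.

```python
from statistics import median

def tripartite_scores(supervisor, peer):
    """Score 3 if above the median on both ratings, 1 if on neither, else 2.

    Ratings exactly at the median are treated as not above it; the article
    does not state how such ties were handled.
    """
    sup_cut, peer_cut = median(supervisor), median(peer)
    return [1 + (s > sup_cut) + (p > peer_cut)
            for s, p in zip(supervisor, peer)]

# Hypothetical ratings for four engineers.
print(tripartite_scores([10, 20, 30, 40], [4, 1, 2, 3]))
```

For the four hypothetical engineers the scores are 2, 1, 2, and 3.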


Procedure 


The volunteer participants completed the psycho- 
metric instruments on their own time and at their 
own convenience. They mailed the completed forms 
directly to the experimenter, who was an outside 
consultant rather than a regular employee of the 
organization. The Ss were assured that not only was 
participation in the study voluntary, but that their 
decision to participate or not to participate, as well 
as all the scores and results obtained in the study, 
would be kept completely confidential. No reports 
on individual performance were made to manage- 
ment. Provision was made, however, for individual 
feedback in that each interested participant was 
encouraged to contact the experimenter for a 
confidential discussion of his own results. 

After the predictor data were obtained from the 58 
Ss, and it appeared unlikely that the other 17 poten- 
tial Ss would participate, the rating forms were dis- 
tributed to peers and supervisors. Again, the con- 
fidentiality of the experiment was stressed, and the 
completed forms were mailed directly to the experi- 
menter. Rating data were obtained on all 58 Ss who 
had completed the predictor instruments. 

At no time during the course of the study were 
the specific purposes of this research discussed with 
the Ss. For this reason it is felt that neither the test 
nor rating results were distorted in the direction 
of “creativity,” nor, for that matter, did there 
appear to be any significant motivation to fake 
the answers in any given direction. 


RESULTS 
Correlations between Criteria and Predictors 


Table 1 presents the correlations obtained 
between the three criteria of creativity and 


the 65 test variables.3 Approximately 10 of
the 195 correlations would be expected to be 
significant at the .05 level, and 2 at the .01 
level on the basis of chance alone. In this 
study, 15 were significant at the .05 level 
and 7 at the .01 level. Thus the predictors 
as a group may be considered significantly 
related to the ratings of creativity, but at 
a low level of confidence.
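
The chance expectations quoted above follow from simple arithmetic, and a rough binomial tail can be attached to them. The tail calculation assumes the 195 correlations are independent, which they are not (many scales are intercorrelated and the three criteria overlap), so it is a loose guide only. Function names are hypothetical.

```python
import math

def expected_by_chance(n_tests, alpha):
    """Number of tests expected to reach level alpha if every null is true."""
    return n_tests * alpha

def prob_at_least(k, n_tests, alpha):
    """Binomial probability of k or more significant results by chance.

    Assumes the tests are independent, which correlated scales are not.
    """
    return sum(math.comb(n_tests, i) * alpha ** i * (1 - alpha) ** (n_tests - i)
               for i in range(k, n_tests + 1))

n_tests = 65 * 3                            # 65 test variables, 3 criteria
print(expected_by_chance(n_tests, .05))     # about 9.75
print(expected_by_chance(n_tests, .01))     # about 1.95
print(prob_at_least(15, n_tests, .05))      # chance of 15 or more at .05
print(prob_at_least(7, n_tests, .01))       # chance of 7 or more at .01
```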

Close inspection of Table 1 reveals that 
four tests, with a total of 36 scores, approxi- 
mated a random relationship to the criteria. 
These tests were the CPI, the VPI, the FPT, 
and the SIT. By the same token, the ACL 
bore little relationship to the criteria when 
scored with the original keys developed by 
Gough, but it did relate significantly when 
scored according to Heilbrun’s keys (Gough 
& Heilbrun, 1962). The CMT also proved 
significant, but the biographical data scales in 
BIRST were the best predictors. 

The BIRST consists of two forms, P and S 
(which stand for Placement and Selection, 
respectively). Form P contains 40 multiple- 
choice items, and Form S 56 items, which 
were originally drawn from a pool of 484 
items. For the most part, these items deal 
with factual or biographical data (e.g., “What 
is your present marital status? 1. Single. 
2. Married, no children. 3. Married, one or 
more children. 4. Widowed. 5. Separated or 
divorced.”), although some are concerned 
with present interests, values, and attitudes 
(e.g., “Which one of the following goals would 
you most like to reach during the next five 
years? 1. Earn a large amount of money. 
2. Become an executive. 3. Develop new 
ideas and inventions. 4. Be in a position where
you can be free to work on ideas that interest 
you.”). Form P is scored according to two 
scales, and Form S three scales, empirically 
derived and cross-validated in previous re- 
search (Albright & Glennon, 1961; Morrison, 
Owens, Glennon, & Albright, 1962; Smith, 
Albright, Glennon, & Owens, 1961). Briefly, 
the Research Personnel scale consists of items 
which differentiated scientists wishing to 


3 The author is indebted to Robert C. Nichols
for his valuable advice and assistance in research 
design and data processing. 
remain in research from those scientists who
aspire to administrative responsibilities—the 
Administrative Personnel scale. The Patent 
Disclosure, Performance Rating, and Creativ- 
ity Rating scales were originally developed 
from items which discriminated on these
three criteria of job success. Of these five 
scales, the Creativity Rating key correlated 
highest with the criteria employed in the 
research reported in this paper. 


TABLE 1

CORRELATIONS BETWEEN PREDICTORS AND CRITERIA

                                                Criterion
                                                Supervisor   Peer
                                                rating       rating
                                                (Buel        (Taylor      Tripartite
Predictor                                       scale)       scale)       grouping

California Psychological Inventory
  Dominance                                     −.07         [illegible]  [illegible]
  Capacity for Status                            .05         [illegible]   .16
  Sociability                                   −.13          .10          .09
  Social Presence                               −.13         −.01         −.03
  Self-Acceptance                               −.23         −.02          .07
  Sense of Well-Being                            .07         −.07         −.01
  Responsibility                                 .14          .05          .05
  Socialization                                 −.04         −.03         −.13
  Self-Control                                   .10         −.03         −.07
  Tolerance                                      .03          .08          .09
  Good Impression                               −.09         −.08         −.12
  Communality                                   −.19         −.15         −.18
  Achievement via Conformance                    .00         −.17         −.12
  Achievement via Independence                  [illegible]   .14          .19
  Intellectual Efficiency                        .09         −.19          .04
  Psychological Mindedness                       .10         −.13         −.01
  Flexibility                                    .08         −.04          .01
  Femininity                                     .14          .04          .01
  Originality, Total (Gough, 1957)              −.04         −.01          .04
  Value Orientation (Nichols & Schnell, 1963)    .00         −.06         −.04
  Person Orientation (Nichols & Schnell, 1963)  −.14          .06         [illegible]

Vocational Preference Inventory
  Realistic                                     −.17          .02         −.05
  Intellectual                                  −.02          .20          .27*
  Social                                        −.15         −.06          .00
  Conventional                                  −.18         −.08         −.01
  Enterprising                                  −.19         −.01          .02
  Artistic                                      −.09         [illegible]  [illegible]
  Control                                       [illegible]  −.10         −.04
  Aggressive                                    −.15         −.03          .00
  Masculine                                      .06         −.13         −.08
  Status                                         .10          .07          .12
  Infrequency                                   [illegible]  [illegible]   .08

Welsh Figure Preference Test
  Original Art Scale                             .01         −.02          .09
  Revised Art Scale                             [illegible]   .07          .20
  Conformity                                    −.02         −.28*        −.23

Social Insight Test
  Chapin weights                                [illegible]   .14         [illegible]

Adjective Check List: Gough’s scales
  Most Favorable                                −.10          .10          .04
  Least Favorable                               −.06         [illegible]  [illegible]
  Self-Confidence                               −.10          .22          .23
  Self-Control                                   .20         −.03         −.08
  Lability                                      −.12          .07          .02
  Personal Adjustment                            .02         −.13         −.17

Adjective Check List: Heilbrun’s scales
  Achievement                                    .10         [illegible]   .10
  Deference                                      .04         −.26*        −.37**
  Order                                         [illegible]   .06          .10
  Exhibition                                    −.21          .16          .23
  Autonomy                                      −.05          .34*        [illegible]
  Affiliation                                   −.23         −.05         −.13
  Intraception                                   .06          .08          .07
  Succorance                                    −.09         −.07         −.13
  Dominance                                      .06          .21         [illegible]
  Abasement                                      .07         −.21         −.30*
  Nurturance                                    −.15         −.08         −.22
  Change                                        −.22          .05          .06
  Endurance                                     [illegible]   .08          .07
  Heterosexual                                  −.31*        −.15         −.24
  Aggression                                     .02         [illegible]   .29*

Concept Mastery Test
  Synonyms and Antonyms                         [illegible]  [illegible]  [illegible]
  Analogies                                     [illegible]  [illegible]   .16
  Total Score                                   [illegible]   .28*         .20

Biographical Information for Research and
Scientific Talent: Form P
  Research Personnel                             .14         [illegible]   .19
  Administrative Personnel                      −.11         −.27*        −.15

Biographical Information for Research and
Scientific Talent: Form S
  Patent Disclosures                             .23         [illegible]   .23
  Performance Rating                            [illegible]  [illegible]  [illegible]
  Creativity Rating                              .25         [illegible]   .43**

Note. N = 58.
*p < .05.
**p < .01.

Correlations between Criteria and Personal Data

Table 2 presents the correlations between
the three criteria of creativity and five personal
data items. While education, seniority,
and the extent of supervisory responsibility
were not related to the criteria, peers did
rate (Taylor scale) their older and better
paid associates as significantly more creative.


Intercorrelations among Criteria 


Table 3 presents the intercorrelations 
among the three criteria. As expected, the 
tripartite grouping, which is essentially de- 
TABLE 2


CORRELATIONS BETWEEN PERSONAL
DATA AND CRITERIA

                              Criterion
                              Supervisor   Peer
                              rating       rating
                              (Buel        (Taylor      Tripartite
Personal data                 scale)       scale)       grouping
Age                           [illegible]   .29*         .20
Education                     −.05         [illegible]   .06
Seniority                     −.17         [illegible]  −.06
Supervisory responsibility    −.02          .05          .05
Compensation                  [illegible]  [illegible]  [illegible]

Note. N = 58.
*p < .05.


rived from a combination of the supervisor 
(Buel) and peer (Taylor) rating scales, cor- 
relates more highly with either of these scales 
than they do with each other. The correlation 
of .30 between Buel and Taylor suggests that 
supervisors and peers agree only minimally 
in their estimates of creativity, or that these 
instruments measure different characteristics, 
in addition to the possibility of low reliability 
of ratings. 


DISCUSSION 


Predicting the creative performance of 
individual engineers remains hazardous. The 
correlations obtained in this study between 
paper-and-pencil tests and the criteria of 
creativity were so low as to be virtually 
useless for predictive purposes; biographical 
data, on the other hand, proved to be sig- 
nificant as predictors of both supervisor and 
peer ratings of creativity. This finding, of 
course, is quite consistent with the practical 
dictum that the best predictor of future per- 
formance is past performance; in this case, 


TABLE 3

INTERCORRELATIONS AMONG CRITERIA

Criterion                             2            3
1. Supervisor rating (Buel scale)     .30*         .61**
2. Peer rating (Taylor scale)                      [illegible]
3. Tripartite grouping


the best predictor of creative achievement is 
some indication of creative performance in 
the past. The predictive power of biographical 
data has also been reported by previous in- 
vestigators in their research on creativity 
(Holland, 1961; Taylor & Ellison, 1962). 

In addition to underlining the pragmatic 
value of biographical data, the results of this 
research point up an interesting hypothesis, 
which may be stated as follows: ego strength 
is a critical correlate of creativity. The data 
which suggest this hypothesis are found in 
the relationship between the ACL and the 
ratings of creativity. Those engineers who 
rated themselves high on Autonomy, Aggression,
and Dominance, and low on Deference
and Abasement, were considered more
creative than their fellow engineers. While 
this finding is consistent with the popular 
belief that the creative individual is a strong 
and independent person, further research will 
be needed to investigate this hypothesis and 
determine its predictive value for selecting 
creative engineers. 


PERSONAL ADJUSTMENT AND PREDICTION OF ACADEMIC ACHIEVEMENT


DONIVAN J. WATLEY 


University of Minnesota 


This study was designed to investigate the relationship between personal ad- 
justment and predictability of academic achievement in a business college. 
The hypothesis tested was that “better” adjusted students would be more 
predictable than maladjusted students. Predictability was determined by cor- 
relation coefficients between aptitude test (CEEB-M and CEEB-V) scores and 
both 1st-quarter and 1st-year grades. The sample consisted of 188 freshmen 
male business students who were classified into “positive-,” “average-,” and 
“negative-” adjustment groups on the basis of means of the 10 Guilford- 
Zimmerman Temperament Survey (GZTS) trait raw scores. Comparisons of 
the adjustment groups on correlations between the mathematics and verbal 
scores and grade averages indicated that the adjustment groups did not differ 
in terms of academic predictability. Analysis of differences between the groups 
on both high school achievement and college achievement revealed, however, 
that the positive-adjustment group earned significantly higher grades than the 
negative group. These results indicated that although the adjustment groups 
did not appear to be significantly different in terms of academic predictability, 
a definite relationship did exist between the groups on levels of achievement. 


The extent to which personal adjustment 
influences academic achievement in college is 
a problem of considerable importance. In 
studying this problem the general hypothesis 
has been that performance in college is more 
predictable for the emotionally stable or 
“better” adjusted students, whereas college 
achievement is more unpredictable for stu- 
dents who are maladjusted. Maladjustment 
expresses itself in many ways and can lead 
to overcompensation on studies or complete 
neglect of studies depending on the situation 
and the personality dynamics of the student 
involved. Such behavior results in “fooling” 
predictions made on the basis of academic 
ability tests. Better adjusted students are 
more likely to perform academically in 
accordance with their capabilities. 

Conflicting results related to this hypothesis 
have been presented by Hoyt and Norman 
(1954), and Anderson and Spencer (1963). 
Using the Minnesota Multiphasic Personality 
Inventory (MMPI), Hoyt and Norman in- 
vestigated 328 freshmen males in the Arts 
College of the University of Minnesota in the 
fall quarters of 1951 and 1952. Degree of 
personal adjustment was determined by 
height of T scores on the clinical scales of 
the MMPI, and students were placed in either 
a “normal,” “one-peak,” or “maladjusted” 
group on the basis of their scale scores. 
Academic predictability was determined from 
correlation coefficients between first-quarter 
grades and two aptitude tests for each of the 
three groups. The normal group was found 
to be more predictable than the maladjusted 
group from correlations between the Ohio 
Psychological Examination (Form 22) and 
grades, but the correlations between the 
American Council on Education (ACE) 
Psychological Examination and grades did not 
support the original hypothesis. 

Anderson and Spencer (1963) repeated the 
Hoyt and Norman (1954) design using larger 
Ns in the adjustment groups and found con- 
flicting results. In addition to investigating 
Arts College men, Anderson and Spencer 
also studied groups of Arts College women 
and Institute of Technology freshmen males 
at the University of Minnesota. From correla- 
tions found between a number of academic 
predictors and grades, they concluded that 
students “normally” adjusted are not more 
predictable than maladjusted students. 

The purpose of this study was to investi- 
gate further the relationship between personal 
adjustment and academic predictability using 
a different experimental design. Whereas the 
two previous studies used height of T scores 
on the MMPI in assessing degree of personal 
adjustment, the Guilford-Zimmerman Tem- 
perament Survey (GZTS) was used in the 
present study. It is particularly important 
to recognize certain differences between the 
MMPI and the GZTS in personality assess- 
ment. The MMPI consists of scales empiri- 
cally developed on groups with differentially 
diagnosed psychiatric disorders, and the GZTS 
emphasizes traits that have been identified 
statistically in obtaining a comprehensive 
picture of factors that describe personality. 
Personal adjustment in this study, then, is 
derived from a set of personality qualities 
described as “positive” or “negative” rather 
than on degree of disturbance as indicated by 
the MMPI. 

The research design of this study called for 
analyzing academic predictability of freshman 
grades for the year as well as first-quarter 
grades. Hoyt and Norman (1954) used first- 
quarter grades with their sample of arts col- 
lege freshmen, while Anderson and Spencer 
(1963) used first-year grades with arts 
college freshmen. Conceivably this could have 
led to their conflicting results. In addition, 
it was felt that it might be worthwhile to 
study the stability of grades over the period 
of a year for students classified on the basis 
of personal adjustment. 


METHOD 
Subjects 


The subjects (Ss) were freshmen men who entered 
the College of Business Administration at the Uni- 
versity of Denver in the fall quarters of 1958 and 
1959. They carried a minimum of 10 quarter hours 
of course work for three consecutive quarters after 
admission to the University and were between 17 
and 23 years of age. The sample 
was composed of 92 males who entered in 1958 
and 96 who entered in 1959, making a total of 188. 
This sample included about 93% of all the freshmen 
males meeting the above criteria. Incomplete test 
information was the reason some were excluded. 


Procedure 


Personal adjustment was assessed from the results of 
the 10-trait raw scores included on the GZTS (1949). 
High scores on the traits indicate positive personality 
qualities and low trait scores suggest negative quali- 
ties of personality. With possible trait scores con- 
sidered on a continuum from positive to negative, 
three groups of Ss were classified on the basis of 


each person’s mean score of the 10 traits included 
on the GZTS. In establishing the three groups, the 
188 mean GZTS trait scores were ranked in order 
from high to low. The 63 students obtaining the 
highest GZTS mean scores made up the positive- 
adjustment group, the 63 students falling next in the 
rank order composed the “average”-adjustment 
group, and the lowest 62 made up the negative- 
adjustment group. The average of the 10 GZTS trait 
scores was used in classifying personal adjustment 
because no single trait or combination of several 
traits could be labeled “adjustment” to the exclusion 
of the other trait scores. Although it is recognized 
that there are a number of serious criticisms that 
can be leveled against this type of adjustment 
definition, it was felt that for experimental purposes 
such classifications could be permitted. The fact that 
a wide range of mean GZTS scores was obtained 
also appeared to make these classifications justifiable. 
The means of the GZTS mean raw scores for the 
three adjustment groups were: positive-adjustment 
group 20.24, average-adjustment group 17.55, and 
the negative-adjustment group 14.18. Their standard 
deviations were 1.31, .69, and 1.65, respectively. 
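
The grouping procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout of the input, and the handling of tied means (ties keep their input order) are hypothetical and are not described in the text. Only the rule itself (rank the 188 mean trait scores from high to low and cut them into groups of 63, 63, and 62) comes from the procedure.

```python
from statistics import mean


def classify_by_mean_score(trait_scores, sizes=(63, 63, 62)):
    """Assign each subject to an adjustment group by ranked mean trait score.

    trait_scores maps a subject id to that subject's list of trait raw
    scores (hypothetical layout). sizes gives the group sizes from the
    highest-ranked group to the lowest.
    """
    labels = ("positive", "average", "negative")
    means = {sid: mean(scores) for sid, scores in trait_scores.items()}
    if sum(sizes) != len(means):
        raise ValueError("group sizes must add up to the number of subjects")

    # Rank from high to low; sorted() is stable, so ties keep input order
    # (an assumption, since the text does not say how ties were broken).
    ranked = sorted(means, key=means.get, reverse=True)

    groups = {}
    start = 0
    for label, size in zip(labels, sizes):
        for sid in ranked[start:start + size]:
            groups[sid] = label
        start += size
    return groups
```

With 188 subjects the call returns 63 "positive," 63 "average," and 62 "negative" labels, matching the group sizes reported above.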

The aptitude tests used as predictor variables 
were the mathematics and verbal sections of the 
Scholastic Aptitude Test administered by the College 
Entrance Examination Board (CEEB), which were 
obtained prior to students’ admission to the Business 
College. An additional predictor variable was high 
school achievement based on percentile rank in high 
school graduating class (HSR). The criterion for 
academic performance was grade-point average 
(GPA) computed separately for first-quarter grades 
and for grades earned for the entire year. 


RESULTS AND DISCUSSION 


Table 1 lists the correlation coefficients 
computed between the predictor variables and 
both first-quarter and first-year grade aver- 
ages for the three adjustment groups. Inspec- 
tion of this table indicates that the r’s ob- 
tained between the three predictors remained 
quite similar from first-quarter to first-year 
GPAs for all three groups, with the difference 
being as much as .10 correlation points in 
only three cases. Of the three predictors, high 
school rank (HSR) correlated highest with 
both first-quarter and first-year grades for 
the positive-adjustment group, and HSR and 
CEEB-M correlated about equally with both 
sets of GPAs for the other two adjustment 
groups. 

A primary concern of this investigation was 
the differential academic predictability of the 
adjustment groups. This was determined by 
comparing the correlations for the three 
groups given in Table 1. Snedecor’s (1946) 

TABLE 1

CORRELATION COEFFICIENTS BETWEEN PREDICTOR VARIABLES AND GRADES
FOR 1958 AND 1959 BUSINESS STUDENTS

                   CEEB-V                  CEEB-M                    HSR
            First-                  First-                  First-
            quarter      Year       quarter      Year       quarter      Year
Group       GPA          GPA        GPA          GPA        GPA          GPA
Positive    .416         .368       .450         .349       .929         .640
Average     .281         .336       .421         .426       .367         .439
Negative    [illegible]  .209       .500         .446       .462         .402





chi-square test for testing the hypothesis that 
several r’s do not differ significantly and 
Fisher’s z-transformation test for comparing 
two r’s were used in determining whether 
there were differences in group predictability. 
When the r’s for each predictor variable were 
compared across the adjustment groups, no 
significant chi-square was found for either 
first-quarter or first-year GPAs; the hypothesis 
that the r’s came from the same population 
was therefore retained in every case. 
Comparison of r’s for any two of the 
adjustment groups using the z-transformation 
test also failed to find any differences that 
were statistically significant for any of the 
predictor variables. Again this was done sepa- 
rately for first-quarter and first-year GPAs. 
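
As an illustration of the two tests named above, the following Python sketch applies Fisher's z transformation to compare two independent correlations and computes the chi-square statistic for the homogeneity of several correlations. It uses the textbook forms of both tests; whether the original computations followed exactly these formulas is an assumption, and the function names are hypothetical. The group sizes (63, 63, 62) are those given in the Procedure section.

```python
import math


def compare_two_correlations(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent r's.

    Returns the normal deviate and its two-tailed probability.
    """
    diff = math.atanh(r1) - math.atanh(r2)          # Fisher z of each r
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))          # two-tailed normal p
    return z, p


def homogeneity_chi_square(rs, ns):
    """Chi-square test that several independent r's share one population value.

    Returns the statistic and its degrees of freedom (k - 1).
    """
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]                        # weight = 1 / variance of z
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    chi2 = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return chi2, len(rs) - 1


# HSR with first-year GPA, positive versus negative group (Table 1).
z, p = compare_two_correlations(0.640, 63, 0.402, 62)

# CEEB-M with first-year GPA across the three groups (Table 1).
chi2, df = homogeneity_chi_square([0.349, 0.426, 0.446], [63, 63, 62])
```

Under these assumptions the HSR comparison gives a deviate of roughly 1.8, short of the 1.96 needed at the .05 level, and the CEEB-M chi-square falls well below the critical value of 5.99 for 2 degrees of freedom. Both outcomes agree with the nonsignificant results reported above.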

These results are in agreement with the 
findings of Anderson and Spencer (1963) in 
suggesting that students grouped on defini- 
tions of personal adjustment do not differ in 


terms of academic predictability. That is, 
in the present study, the positive group was 
not more predictable than either the average 
or the negative groups on any of the varia- 
bles investigated. It should be noted, however, 
that correlation differences between the posi- 
tive and negative groups were in the hypothe- 
sized direction for HSR and CEEB-V, but 
were in the opposite direction for CEEB-M. 

In interpreting these results a crucial point 
that has to be considered is the fact that 
only business students that completed the 
freshman year were included, and only those 
who carried at least 10 credits for three 
consecutive quarters. This excluded a large 
number (N = 160) who transferred, dropped 
out for various reasons, or carried less than 
the minimum credit load. The fact that the 
students included in the sample were fairly 
homogeneous probably did serve to reduce the 
size of the correlation coefficients, but the 


TABLE 2

MEANS, STANDARD DEVIATIONS, AND TESTS OF SIGNIFICANCE
AMONG THE THREE ADJUSTMENT GROUPS

                        Positive a             Average a              Negative
Variables               M        SD            M        SD            M        SD        F
HSR                     63.98    23.81         62.68    21.46         53.55    24.22     3.68*
CEEB-V                  445.84   82.20         455.54   85.05         445.13   81.77     [illegible]
CEEB-M                  505.33   84.33         505.32   83.16         480.45   83.65     1.09
First-quarter GPA       2.56     [illegible]   2.70     .56           2.41     .65       3.17*
First-year GPA          2.69     .96           2.59     [illegible]   2.32     .58       6.17**

Note.—GPA computed as: A = 4, B = 3, C = 2, D = 1, F = 0.
a Ns
extent to which the r’s were influenced, and 
in which direction, is uncertain. 

A simple analysis of variance was computed 
to test the significance of the differences 
between HSR means for the three adjustment 
groups. The F obtained (3.68) was significant 
at the .05 level (Table 2). The critical ratio 
between the HSR means for the positive- and 
negative-adjustment groups was significant at 
the .02 level, and the critical ratio between 
the average and negative HSR means was sig- 
nificant at the .05 level. The HSR means fell 
in order for the three adjustment groups, 
with the positive group having the highest 
HSR mean and the negative group the lowest. 
These results suggested that although the 
adjustment groups were not different in terms 
of academic predictability, degree of adjust- 
ment was significantly related to grades 
achieved in high school. 
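
The HSR analysis can be approximately retraced from the summary statistics in Table 2. The sketch below is a hedged illustration, not the original computation: it assumes the tabled standard deviations use n - 1 in the denominator, assumes the critical ratio is the difference between two means divided by its standard error with the result referred to the normal curve, and uses hypothetical function names.

```python
import math


def one_way_f(means, sds, ns):
    """F ratio for a simple analysis of variance from group summaries."""
    total_n = sum(ns)
    k = len(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    return (ss_between / (k - 1)) / (ss_within / (total_n - k))


def critical_ratio(m1, sd1, n1, m2, sd2, n2):
    """Difference between two means over its standard error.

    Returns the ratio and a two-tailed normal probability.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    cr = (m1 - m2) / se
    p = math.erfc(abs(cr) / math.sqrt(2.0))
    return cr, p


# HSR row of Table 2: positive, average, negative groups.
hsr_means = [63.98, 62.68, 53.55]
hsr_sds = [23.81, 21.46, 24.22]
group_ns = [63, 63, 62]

f_ratio = one_way_f(hsr_means, hsr_sds, group_ns)
cr_pos_neg, p_pos_neg = critical_ratio(63.98, 23.81, 63, 53.55, 24.22, 62)
cr_avg_neg, p_avg_neg = critical_ratio(62.68, 21.46, 63, 53.55, 24.22, 62)
```

Under these assumptions the F ratio comes out near 3.7, close to the reported 3.68, and the two critical ratios reach about the .02 and .05 levels respectively, in line with the text. Small discrepancies are expected from rounding in the tabled values.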

A separate analysis of variance for each 
of the aptitude tests indicated that the mean 
differences between the three adjustment 
groups were not statistically significant 
(Table 2). These findings lend further sup- 
port to the results found in connection with 
high school achievement in that the differ- 
ences apparently were not due to differences 
in mathematics and verbal aptitudes (as 
measured by CEEB-M and CEEB-V) 
among the adjustment groups. 

Further analysis of grades obtained in 
college by the three adjustment groups also 
proved interesting. Table 2 shows that the 
groups differed significantly when a simple 
analysis of variance was computed for both 
first-quarter and first-year grade averages. For 
first-quarter grades the F was 3.17, which is 
significant at the .05 level, and for first- 
year grades the F (6.17) was significant at 
the .01 level. 

More insight is obtained by looking at the 
first-quarter and first-year GPA means for 
the adjustment groups (Table 2). The aver- 
age group actually earned the highest mean 
first-quarter GPA of the three groups, with 
the negative group mean being lowest. For 
first-year grade averages, however, the posi- 
tive mean GPA increased to 2.69, the highest 
for any of the groups, whereas first-year mean 
GPAs decreased for both the average and 
negative groups. The negative mean first-year 


TABLE 3

CRITICAL RATIOS FOR FIRST-QUARTER AND FIRST-YEAR GRADE AVERAGES
AMONG THE THREE ADJUSTMENT GROUPS

                                    Critical ratio
                               First-         First-
                               quarter        year
Adjustment groups              GPA            GPA
Positive versus average        1.23           .99
Positive versus negative       [illegible]    [illegible]***
Average versus negative        [illegible]    [illegible]

** p < .01.
*** p < .001.


GPA still remained the lowest of the three 
adjustment groups. Table 3 shows that both 
the positive and average groups obtained 
significantly higher first-year mean GPAs 
than the negative group. It is also noteworthy 
that for first-quarter grades the positive and 
negative groups did not differ significantly, 
but were highly different (.001 level) for 
first-year grades. 

These results suggest that adjustment (as 
here defined) was not only significantly re- 
lated to academic achievement in high school 
but to achievement in business college as well. 
That is, whatever aspects of personality 
contributed to getting better grades in high 
school also contributed to getting better 
grades in college. Whether this difference in 
achievement resulted from the positive group 
being better equipped emotionally than the 
negative group to handle the stresses of an 
academic situation or from other factors (such 
as stronger needs toward social conformity 
and fulfilling certain role expectations) is un- 
certain. These results indicate only that a 
relationship apparently exists between adjust- 
ment as defined in this study and level of 
achievement. 


THE DEFINITION AND MEASUREMENT OF 
JOB INVOLVEMENT * 


THOMAS M. LODAHL AND MATHILDE KEJNER 


Graduate School of Business and Public Administration, Cornell University 


Job involvement is the degree to which a person is identified psychologically 
with his work, or the importance of work in his total self-image. Very little is 
presently known about this class of job attitudes, although speculations about 
it are implicit in much of the work on industrial motivation, especially that 
which deals with “participation.” The purpose of the present research was to 
define job involvement, develop a scale for measuring it, gather evidence on 
the reliability and validity of the scale, and to learn something about the 
nature of job involvement through its correlation with other job attitudes. This 
paper describes the development and validation of a scale measuring job 
involvement; the resulting scales are presented, and the relation between job 
involvement and other job attitudes is discussed. 


The process of ego involvement in work 
has been a concern of both psychologists, 
such as McGregor (1944) and Allport (1947), 
and sociologists, such as Hughes (1958) and 
Dubin (1958, 1961). The psychologists have 
tended to focus on organizational conditions 
that lead to job involvement: such as mean- 
ingfulness of work, adequacy of supervision, 
etc. The sociologists have been more con- 
cerned with aspects of the socialization process 
that lead to the incorporation in the person 
of work-relevant norms and values. Dubin, 
for instance, holds that psychological (i.e., 
derived-drive) theories of motivation are not 
adequate to explain organizational behavior 
because they do not account for wide varia- 
tions in modes of drive satisfaction, and 
modes of drive reduction can be changed. In 
order to account for the ways in which mo- 
tivation is channeled, we must turn to social 
norms and values which determine (and in 
the long run are changed by) specific modes 
of behavior. The social structure, then, chan- 
nels and sustains motivation in specific ways. 


When a person internalizes a value, norm, goal, or 
behavior pattern, these become guides for future 
activity. Internalization means acceptance into the 
personal behavior systems, and ways of thinking. It 
means, literally, putting inside the social personality, 
modes of activities and thoughtways so they be- 
come, in the future, the basis for behavior and 
thought. These activities and thoughtways, in turn, 
have their origins, for any given person, in social 
experience [Dubin, 1961, pp. 51-52]. 


Job involvement is the internalization of 
values about the goodness of work or the 
importance of work in the worth of the person, 
and perhaps it thus measures the ease with 
which the person can be further socialized by 
an organization. Dubin (1961) goes on to 
point out that, 


In the work organization the adult learns the mo- 
tivation system that is specific to that institutional 
setting. There is real continuity between childhood 
experiences in the society and adult experiences in 
the work organization. The work organization builds 
its motivational systems on societal foundations. 
What happens at work, however, is that these social 
motivation patterns are made more specific. They 
are also made more appropriate to the work per- 
formed [p. 53]. 


The relevance of social norms and values 
to the understanding of industrial motivation 
is clear, but they have largely been ignored 
in the study of job attitudes until now. Much 
research needs to be done on the ways in 
which social-system variables influence and 
channel individual motivation in organiza- 
tions before a fuller understanding of organ- 
izational behavior can be reached. The present 
study was designed as a step in that direction: 
to provide one instrument for such research. 
The substantive results obtained so far are 
reasonably encouraging. 


JOB INVOLVEMENT 


Definitions 


For this work, job involvement was de- 
fined as the degree to which a person’s work 
performance affects his self-esteem. Elsewhere 
(Lodahl, 1964) it was hypothesized that its 
main determinant is a value-orientation to- 
ward work that is learned early in the social- 
ization process. In some ways it opera- 
tionalizes the “Protestant ethic” and because 
it is a result of the introjection of certain 
values about work into the self, it is probably 
resistant to changes in the person due to the 
nature of a particular job. 

Others have recognized job involvement 
and called it by other names, but defined the 
concept very similarly. In Allport’s (1947) 
treatment of the psychology of participation, 
ego involvement was defined as the situation 
in which the person “engages the status- 
seeking motive” in his work. (The person is 
of course seeking self-esteem as well as that 
of others.) For French and Kahn (1962), 
the centrality of an ability is the degree to 
which it affects self-esteem; if job perform- 
ance is central to the worker, then we have 
“ego-involved performance.” They remark 
that “this implies that his job performance 
will affect his self-esteem [p. 19].” One of 
Guion’s (1958) definitions of morale is rele- 
vant to job involvement. 


Morale is ego involvement in one’s job... . There 
is something to be said for the attitudinal frame of 
reference in which a man perceives his job to be so 
important to himself, to his company, and to 
society that his superiors’ “blunders” are not to be 
tolerated [p. 60]. 


These definitions have a common core of 
meaning in that they describe the job-involved 
person as one for whom work is a very im- 
portant part of life, and as one who is affected 
very much personally by his whole job situa- 
tion: the work itself, his co-workers, the 
company, etc. On the other hand, the non-job- 
involved worker does his living off the job. 
Work is not as important a part of his psy- 
chological life. His interests are elsewhere, 
and the core of his self-image, the essential 
part of his identity, is not greatly affected by 
the kind of work he does or how well he 
does it. It is important to note, with Guion, 
that the job-involved worker is not necessarily 


happy with his job; in fact, very angry people 
may be just as involved in their jobs as very 
happy ones. 


Previous Research 


The literature on job involvement is sparse. 
Wickert (1951) found that telephone oper- 
ators and service representatives who had 
quit were less ego involved in their work 
than those who were on force: the on-force 
personnel tended to feel that they had a 
chance to make decisions on the job, and 
that their contribution to the success of the 
company was “very important,” “quite im- 
portant,” or of “fair importance.” Of course, 
it is not possible to say when the disinvolve- 
ment of those who quit took place, since the 
questionnaire was administered sometime after 
they left; perhaps they were never involved 
at all or, as seems more likely, they disin- 
volved themselves after leaving the company. 

In a series of careful laboratory experi- 
ments, Lewis (1944) and Lewis and Franklin 
(1944) used the Zeigarnik effect to establish 
conditions under which ego involvement in 
work took place. To summarize, they found 
that people do become ego involved in work, 
even in laboratory tasks; that under “ego- 
involving” instructions, recall favors the com- 
pleted (i.e., successful) tasks; and that people 
working in a group of interdependent tasks 
show the same tension systems as those work- 
ing alone, i.e., that people also become ego 
involved in a group task. 

In his study of the “central life interests” 
of workers, Dubin (1955) used a 40-item 
questionnaire to sample total life experiences; 
the form of the questionnaire allowed workers 
to choose a job-oriented, non-job-oriented, or 
an indifferent response. Over all 40 items, 
Dubin found that only 24% of the respond- 
ents could be classed as “job oriented” (i.e., 
those who chose a work-related response on 
at least half the questions or whose answers 
were at least 70% job oriented and indiffer- 
ent). Only 9% of Dubin’s workers found their 
most significant informal group experiences 
in work, 15% gave work as the most common 
source of pleasure and satisfaction, and 61% 
gave job-oriented responses on 7 items dealing 
with formal organization life. Dubin con- 
cludes that it is not surprising that only 24% 
of workers are job oriented and the rest 
exhibit only ‘adequate’ social behavior, 
given the organizational conditions under 
which most people work. Unfortunately he 
gives no differential data allowing correlation 
of job orientation with skill level, age, etc. 

In previous unpublished research, Lodahl 
used rating methods to determine job involve- 
ment from interview protocols. Data on 21 
job attitudes obtained on women in precision 
electronics assembly work were intercorrelated 
and factor analyzed; job involvement emerged 
as a separate factor, related only to team 
involvement, product knowledge, and time on 
job. Variables dealing with satisfaction, mo- 
tivation, and frustration were factorially in- 
dependent of job involvement. It was also 
found in this study that while interrater 
agreement on job involvement was low, it 
nevertheless appeared to be the most stable 
of the 21 attitude variables over a 20-month 
period. This suggested that job involvement 
was relatively unaffected by changes in the 
work environment, since during the 20-month 
period many “improvements” were made in 
the jobs and in the organization immediately 
surrounding the operators. 

Using the same attitude data and adding 
technological variables, Hearn (1962) found 
that job involvement was related to the per- 
ceptual skill required of these women. He 
also found that team operators were more 
job involved than people working alone, but 
he ascribed this to the greater perceptual skill 
required on teams. Since these variables are 
tied together, it is impossible to untangle the 
causal sequence in this instance. 

The same content-analysis methods were 
used in a study of auto assembly-line workers 
in which Lodahl (1964) again found that 
job involvement emerged as an independent 
attitude factor, this time with the variables 
product involvement, company involvement, 
and number of men working near loaded on 
the involvement factor. Social variables thus 
appear in the factorial composition of job in- 
volvement in both samples, hinting at the 
sociocultural origin of this attitude and under- 
scoring the importance of work groups in 
maintaining stable orientations toward work. 

Summarizing the results of these interview 
studies, job involvement appears to be fac- 


torially independent of other job attitudes, 
relatively stable over time, relatively un- 
affected by changes in the work organization, 
and related to the social nearness of other 
workers (for what reason is not yet clear). 
We cannot be sure how far these conclusions 
will generalize, however. They are based on 
interview material not specifically collected 
for the purpose of studying job involvement, 
and the results of interview studies have been 
known to differ from those employing ques- 
tionnaires (cf. Ash, 1954; Berrien & Angoff, 
1960). For these reasons it seemed desirable 
to construct an attitude scale for measuring 
job involvement, and to relate it to other 
directly measured job attitude scores. 


METHODS AND RESULTS 
Initial Item Selection and Reduction 


The scale discrimination technique of Ed- 
wards and Kilpatrick (1948) was followed in 
constructing the scale. Initially 110 state- 
ments potentially related to job involvement 
were collected from interview protocols, exist- 
ing questionnaires, other researchers,2 or were 
merely invented. Elimination of duplications, 
etc., reduced the list to 87 items, which were 
then prepared for submission to judges. The 
judging booklet included a face sheet giving 
definitions and examples of job involvement. 
The instructions to the judges were as 
follows: 


The following items are comments people have made 
or might make about their work. We would like you 
to judge each statement as to the degree of job 
involvement it expresses by circling the appropriate 
number below each one. On this scale, a “1” rep- 
resents a very low degree of job involvement, and 
“11” represents very high job involvement, and “6” 
represents a medium degree of job involvement. 


The items were then submitted to “expert” 
judges: 11 psychologists, 3 sociologists, and 
8 second-year graduate students in a course in 
human relations. Means, medians, standard 
deviations, and Q values of their ratings were 
calculated for each of the 87 items. Forty- 
seven items were discarded using these statis- 
tics: the 40 items retained had low Q values 


2 The authors wish especially to thank Lawrence 
K. Williams for his help in constructing items. 
and tended to have medians more toward the 
ends of the distribution. 
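
The item statistics used in this screening step can be illustrated with a short Python sketch. It treats the Q value as the interquartile range of the judges' ratings, which is the usual meaning of Q in scale construction. The quartile interpolation method, the cutoff `max_q`, and the function names are assumptions made for illustration only; the text does not report the cutoffs actually used.

```python
from statistics import median, quantiles


def item_statistics(ratings):
    """Median and Q value (interquartile range) of judges' ratings for one item."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return {"median": median(ratings), "Q": q3 - q1}


def screen_items(judgments, max_q, midpoint=6):
    """Keep items on which judges agree, ordered by distance from the midpoint.

    judgments maps an item label to the list of ratings on the 11-point scale.
    max_q is a hypothetical cutoff: items with a larger Q are discarded
    because the judges disagreed about how much involvement they express.
    """
    stats = {item: item_statistics(r) for item, r in judgments.items()}
    kept = [item for item, s in stats.items() if s["Q"] <= max_q]
    # Items whose medians lie toward the ends of the scale come first.
    kept.sort(key=lambda item: abs(stats[item]["median"] - midpoint), reverse=True)
    return kept
```

An item rated 9 to 11 by nearly every judge has a small Q and a median near the high end of the scale, so it would survive this screen. An item whose ratings scatter from 1 to 11 has a large Q and would be dropped.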

A Likert-type item analysis was then per- 
formed. The 40 items were cast into the 
Likert format, with four categories of response 
(strongly agree, agree, disagree, strongly dis- 
agree; scored 1, 2, 3, and 4, respectively). 
The items were put in random order and 
administered to 137 nursing personnel (the 
entire staff except for those on leave, etc.) 
in a large general hospital. Total scores 
summed over the 40 items were obtained for 
each person, and the data from the 40 items 
plus the total job-involvement score were 
intercorrelated and factor analyzed.* Product- 
moment correlation coefficients were factored 
by the method of principal axes, using unities 
in the diagonal,5 and the results were rotated 
using Kaiser’s varimax criterion (see Harman, 
1960). 

A general factor accounting for 22% of 
the obtained communality emerged in the 
unrotated solution. The total job-involvement 
score had a loading of .96 on this general 
factor, accounting for about 91% of its 
variance. Altogether, 11 factors with eigen- 
values over 1.00 were obtained, but only the 
first 7 of these had loadings greater than .30 
on more than two variables. These 7 ac- 
counted for 77% of the obtained com- 
munality. Accordingly, the first 7 factors were 
rotated separately, as well as all 11. The 7- 
factor rotation was somewhat easier to inter- 
pret, and the results from it will be sum- 
marized here. 

In the following lists of variables, a positive 
loading means agreement with the item.6 The 
items with the four highest loadings will be 
presented for the first five factors (the sixth 


3 The authors wish to thank Ruth Anderson, who 
collected these data as part of a larger study of 
nursing attitudes. 

4 All computations were carried out at the Cornell 
Computing Center. 

5 At the time of the computations the only avail- 
able program for principal components factor analysis 
allowed only unities in the diagonal; thus the com- 
munality estimates are probably inflated. 

6 Although the scale was scored so that a low 
score indicated high involvement, signs of all cor- 
relation coefficients are here reversed to simplify in- 
terpretation. The means of Table 2, however, are 
presented as originally scored. 


and seventh factors have substantially zero 
loadings on total job-involvement score). 


Factor 1 
.71 I used to care a lot about my work, but now 
other things are more important to me. 

.64 I used to be more ambitious about my work 
than I am now. 

.57 I avoid taking on extra duties and responsi- 
bilities in my work. 

.50 Quite often I feel like staying home from 
work instead of coming in. 


These items (and the others on the factor) 
have a hopeless quality, as if the person who 
endorses them has given up caring much about 
work. They remind us of the “indifferent” 
response to work described by Dubin (1955) 
and Presthus (1962). This response might be 
made by a person who originally had great 
expectations about work; when these are 
blunted, the first reaction is alienation which 
then hardens into indifference and work be- 
comes a mere instrumentality for other 
pleasures. 


Factor 2 

-.62 Sometimes I lie awake at night thinking 
     ahead to the next day’s work. 
[loading illegible] The most important things that 
     happen to me involve my work. 
[loading illegible] I live, eat, and breathe my job. 
-.54 I feel depressed when I fail at something 
     connected with my job. 


These items all express very high job involve- 
ment, perhaps higher than the individual is 
normally permitted to express in our culture. 
Yet they are extremely important in discrimi- 
nating among degrees of job involvement: 
total job-involvement score was loaded highest 
on this factor in the rotated matrix. 


Factor 3 

 .67 I’ll stay overtime to finish a job, even if 
     I’m not paid for it. 
 .63 For me, mornings at work really fly by. 
-.58 How well I work does not affect the way I 
     feel about myself. 
 .54 Sometimes I’d like to kick myself for the 
     mistakes I make in my work. 


These items have high face validity for the 
concept of job involvement. They clearly ex- 
press high involvement and a high sense of 
duty toward work. 
Factor 4 

-.68 I usually show up for work a little early, to 
     get things ready. 
 .58 I’m late for work pretty often. 
 .43 I’m almost sure to think about unfinished 
     work problems at home. 
 .42 Quite often I feel like staying home from 
     work instead of coming in. 


These items seem to deal with tendencies to 
avoid coming to work and with guilt over un- 
finished work. They seem like a negative kind 
of involvement: feeling badly about poor per- 
formance on the job, one way of caring 
about it. 


Factor 5 

.73 I enjoy discussing my work with people out- 
    side the company. 
.72 I like to talk about my work with my friends. 
.49 I prefer a job where I can put my own ideas 
    to work. 
.44 I would like a chance to make important 
    decisions on my job. 


This factor seems to deal with pride in the 
organization, general ambition, and upward- 
mobility desires. It seems to come fairly close 
to the notion of “participation” as advanced 
by Allport (1947) and Wickert’s (1951) 
definition of “ego involvement.” 

The loadings of total job-involvement score 
for Factors 1-5 respectively were: 1, -.43; 
2, -.98; 3, .38; 4, -.37; and 5, .36. Together, 
these loadings account for 92% of the vari- 
ance in the total job-involvement score. Con- 
sidering that the sample of items is fairly 
broad and that all but 8% of the variance in 
total involvement score is accounted for, these 
could be considered “dimensions” of job in- 
volvement for nursing personnel. The question 
then arises as to the generality of these dimen- 
sions in different populations. 


Further Item Reduction and Cross-Validation 


The set of items was reduced to 20 by con- 
sidering the item-total correlations, the com- 
munality of an item, and the factorial clarity 
of the item. At the conclusion of this process, 
the 20 items included 6 each from Factors 
1 and 2, 5 from 3, and 3 from Factor 4. 
(None were included from Factor 5 because 
these items had substantially lower item-total 
correlations.) 


The items were then reordered and admin- 
istered to a group of engineers working in an 
advanced development laboratory⁷ as part of 
a larger attitude questionnaire which was 
distributed by the company and was returned 
by mail to the authors by each individual. 
Response to the survey was 69%; control 
data available on the entire population of 
engineers indicated minor distortion in the 
returns in the direction of greater participa- 
tion from those in higher position levels and 
from older men. Chi-square tests showed that 
the distortions were not significant, however. 

In order to compare the engineers’ and 
nurses’ data, the final 20 items were rescored 
for the nurses. The correlation between the 
original 40-item total and the final 20-item 
total was .88; since this is an uncorrected 
part-whole correlation, it indicates a fair 
amount of loss in the item reduction. Data 
from the 20 items plus the 20-item total job- 
involvement score for both the nurses and 
the engineers were intercorrelated and factor 
analyzed, using the same procedures as above. 
Examination of the correlation matrices re- 
vealed low interitem correlations (averaging 
about .17) and relatively high item-total cor- 
relations. The result of this was that, in both 
analyses, most of the variance in total job- 
involvement score appeared on the first (un- 
rotated) principal axis. For the nurses, the 
loading of the total score on the first factor 
was .99, and for the engineers, .96. These 
loadings indicate the presence of a general 
job-involvement factor over the 20 items, 
which is, of course, to be expected. For the 
nurses, however, only 8 of the items had their 
highest loading on this general factor, and for 
the engineers, only 11. Six of the items had 
highest loadings on the first principal axis 
in both samples, and these were formed into 
a short version of the scale, described below. 

Since other items had shown substantial 
item-total correlations, however, the rotated 
factor matrices were examined. In both 
samples four factors were extracted and 
rotated to the varimax criterion. For the 
nurses, three interpretable factors emerged 


7 The authors wish to thank the respondents, who 
participated in the survey on their own time, and to 
thank Frank Overstrom for his help in coordinating 
the project. 
TABLE 1 
FACTORIAL STRUCTURE OF 20-ITEM SCALE 

                                                                Factors
                                                     Nurses a           Engineers b
Item                                                  1    2    3       1    2    3    4
 1. I'll stay overtime to finish a job, even if
    I'm not paid for it.                            -04  -04   64     -23  -21  -41  -34
 2. You can measure a person pretty
    well by how good a job he does.                 -06   00   56     -20   15  -65  -11
 3. The major satisfaction in my life
    comes from my job.                              -52   14   31     -81  -12  -10   00
 4. For me, mornings at work really fly by.         -02   15   61     -13  [?]  [?]  -62
 5. I usually show up for work a little
    early, to get things ready.                     -58   17  [?]     -28   04   39  -28
 6. The most important things that
    happen to me involve my work.                   [?]   13   11     -83  -19   06  [?]
 7. Sometimes I lie awake at night
    thinking ahead to the next day's work.          -02  -18  -28     -13   00  -09  -58
 8. I'm really a perfectionist about my
    work.                                           -52   03   14     -40  -02  -32  -12
 9. I feel depressed when I fail at
    something connected with my job.                -47  -34   07      17   14   12  -51
10. I have other activities more important
    than my work.                                    35  -32   23      79   19  -07  -16
11. I live, eat, and breathe my job.                -63   07   11     -77   03   02  [?]
12. I would probably keep working even
    if I didn't need the money.                     -09   06   39      09  -12  -68   00
13. Quite often I feel like staying home
    from work instead of coming in.                  11  -62  -22      14   64  -30   15
14. To me, my work is only a small part
    of who I am.                                     26   22  -16      78   23   12  -07
15. I am very much involved personally
    in my work.                                     -51   26   15     -51  -39  -24  -28
16. I avoid taking on extra duties and
    responsibilities in my work.                     04  -61  -04      25   40   56  -04
17. I used to be more ambitious about
    my work than I am now.                           07  -72   09      00   82   22  -06
18. Most things in life are more
    important than work.                             33  -42   02      35   55   23  -42
19. I used to care more about my work,
    but now other things are more
    important to me.                                 12  -71   16      25   70   11  -14
20. Sometimes I'd like to kick myself for
    the mistakes I make in my work.                 -13  -25   65     -13  -42   36  -48
21. Total job involvement score
    (summed over all 20 items)                      -77   48   40     -71  -56  -26  -30

Note. Rotated factors; decimal points omitted; positive loading indicates 
endorsement of the item. 
a N = 137. 
b N = 70. 
TABLE 2 
SUMMARY DATA ON 20-ITEM SCALE 

                                          Split-half    Corrected
Norming group            M       SD           r         split-half r
Nurses (N = 137)       43.37    6.52         .56           .72
Engineers (N = 70)     42.62    7.83         .67           .80
Students (N = 46)      48.06    9.56         .80           .89

Note. High score indicates lower involvement. 


(the fourth was a doublet) and for the 
engineers, four. These are presented in 
Table 1. 

The first two factors appear rather similar 
in the two samples, except that the second 
and third factors have opposite patterns of 
signs for the engineers. (Loadings greater 
than .40 were considered in interpreting fac- 
tors.) Factor 1 in these two samples is the 
same as Factor 2 in the analysis of the 40-item 
scale: nonacceptance of items expressing very 
high job involvement. As in the earlier anal- 
ysis, this factor has the highest correlation 
with total job-involvement score. Factor 2 in 
the present analysis is the same as Factor 1 
in the 40-item analysis: the indifferent re- 
sponse to work. For the nurses, the third 
factor is the same as before: duty-bound posi- 
tive job involvement, a kind of arbeitsfreude. 
For the engineers, Factor 3 is opposite in sign 
to Factor 3 for the nurses, but seems to 
deal with the same kind of content: it is a 
rejection of extra duties and of the general 
notion of work as a measure of self (“you can 
measure a person pretty well by how good a 
job he does”). Factor 4 for the engineers 
seems to deal with boredom and the general 
unimportance of work. 
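
For orthogonally rotated factors, the share of a variable's variance carried by a set of factors is the sum of its squared loadings. The short sketch below applies that arithmetic to the total job-involvement row of Table 1, with decimal points restored; the function name is illustrative, and the figures are read from the printed table rather than recomputed from raw data.

```python
def variance_accounted(loadings):
    """Sum of squared loadings on orthogonal factors (the communality)."""
    return sum(loading * loading for loading in loadings)


# Total job-involvement score row of Table 1 (decimal points restored).
nurses_total = [-0.77, 0.48, 0.40]
engineers_total = [-0.71, -0.56, -0.26, -0.30]

print(round(variance_accounted(nurses_total), 2))     # 0.98
print(round(variance_accounted(engineers_total), 2))  # 0.98
```

In both samples the rotated factors shown in Table 1 carry nearly all of the variance in the 20-item total, which is expected because the total is the sum of the items being factored.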

Considering the similarity of factorial 
structure across these two samples it seems 
reasonable to conclude that job involvement, 
as measured by these 20 items, is multi- 
dimensional and probably has at least three 
dimensions. The fact that the 20-item set used 
is multidimensional indicates that it would 
not be advisable to attempt measuring job 
involvement with a single Guttman scale, 
although three might do fairly well. Since the 
aim of this research was a single scale, the 
scalogram analysis was not carried out for 
these data. 
Reliability and Validity 


Reliability. The split-half reliability of the 
20-item scale was computed by calculating 
product-moment correlation coefficients be- 
tween halves of the scale, using odd-even 
items as the split. Table 2 shows means, 
standard deviations, and split-half correla- 
tions for the nurses, engineers, and for a 
group of second-year graduate business-ad- 
ministration students who were asked to re- 
spond to the items as regards the “job” of 
students. Split-half correlations were corrected 
by means of the Spearman-Brown formula. 
Reliability of the 20-item scale is adequate 
but not extremely high. Possibly there is some 
effect of “hiding” the 20-item scale among the 
40 items for the nurses, but this was not 
operating for the engineers or for the students. 
It must be concluded that job involvement 
is not a very internally consistent attitude but 
perhaps this is reasonable in light of its 
multidimensionality and the low interitem 
correlations. 
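
The Spearman-Brown step mentioned above can be written out as a small sketch. The split-half values are those reported for the three groups in Table 2; the function name and the general `factor` argument are illustrative choices, with `factor=2` corresponding to stepping a half-length correlation up to full length.

```python
def spearman_brown(r_half, factor=2.0):
    """Predicted reliability when test length is multiplied by `factor`."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Odd-even split-half correlations reported in Table 2.
split_half = {"nurses": 0.56, "engineers": 0.67, "students": 0.80}

for group, r in split_half.items():
    print(f"{group}: {spearman_brown(r):.2f}")
```

The corrected values round to .72, .80, and .89 for nurses, engineers, and students respectively.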

In an attempt to shorten the job-involve- 
ment scale for use in space-cramped large 
questionnaires, the 6 items loading highest on 
the first (unrotated) principal component in 
both samples were rescored as a single scale 
for both the engineer and nurse samples: 
these were items 3, 6, 8, 11, 15, and 18 of 
Table 1. The split-half correlation (based on 
odd- versus even-numbered items) was .57; 
corrected with the Spearman-Brown formula, 
the reliability of the 6-item scale is estimated 
at .73. The correlation between the 6-item 
total and the 20-item total is .87. With about 
76% of the variance in the 20-item total 
accounted for in the 6 items, it would seem 
reasonable to substitute the 6-item scale where 
space is at a premium. 
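
The "about 76%" figure is the squared correlation between the two totals. A minimal sketch of that arithmetic follows; the function name is illustrative, and, as with the 40-item versus 20-item comparison earlier, these are uncorrected part-whole correlations, so the shared variance is somewhat overstated.

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their correlation."""
    return r * r


print(f"{shared_variance(0.87):.2f}")  # 6-item vs. 20-item total: 0.76
print(f"{shared_variance(0.88):.2f}")  # 20-item vs. 40-item total: 0.77
```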


Validity. One evidence of validity is the 
degree to which a measure discriminates 
among groups. Analysis of variance performed 
on the data of Table 2 indicates that the 3 
groups differ significantly from each other 
(F = 4.84, p < .01). The students have lower 
job involvement than either the nurses or the 
engineers, who do not differ from each other. 
The students had just been handed back a 
midterm exam in the class in which data were 
collected, and the instructor had remarked 
that his policy was to give low marks in all 
but the final. Considering this and the fact 
that this was their final semester in a graduate 
professional school, it is not too surprising to 
find their involvement as students lower than 
that of engineers or nurses, who were already 
embarked on their life careers. 

Another kind of evidence for the validity 
of a scale is its correlation with other (prefer- 
ably well-understood) variables. Four sets of 
data were available with which to assess the 
relation of job involvement to other variables: 
nursing personnel, head nurses, students, and 
engineers. 

For the 137 nursing personnel, consisting of 
head nurses, staff nurses (RNs), practical 
nurses, nurse aides, and orderlies, total job- 
involvement scores were correlated with years 
of college, years of experience, part time or 
full time, job status (six levels from head 
nurse to orderly), age, and marital status. 
The only significant correlation was between 
job-involvement score and age, .26, p < .01. 
The older nursing personnel tend to be more 
job involved. 

Anderson (1964) obtained data on a 
variety of measures for 25 head nurses in a 
large general hospital. Among these was an 
“activity preference scale” consisting of triads 
of items on activities of the head nurse, which 
measured the head nurse’s preference for 
actual nursing care activities (i.e., taking care 
of patients herself), for personnel activities 
(such as training her nurses, giving directions, 
etc.), and for external coordinating activities 
(checking schedules with participating doctors, 
etc.). Anderson also obtained data on the 
Ohio State Leader Behavior Description Ques- 
tionnaire (LBDQ) measuring “consideration” 
and “initiating structure” factors in head 
nurse behavior as seen by her subordinates. 
Using the 40-item total job-involvement score, 
Anderson found that job involvement was 
associated negatively with preference for 
nursing care activities, positively with prefer- 
ence for coordinating activities, and negatively 
with the consideration scale on the LBDQ. It 
would appear that job involvement is not a 
trait of the “friendly helper” sort of head 
nurse, who is highly considerate and most 
enjoys actual care of patients; rather it is a 
characteristic of those head nurses who like 
coordinating and administrative activities. 

Ghiselli’s Self-Description Inventory (1954) 
was administered to the student sample 
previously described, and scored on Ghi- 
selli’s empirically constructed scales for 
intelligence, supervisory qualities, initiative, 
self-assurance, occupational level, and de- 
cision-making approach. Using the 20-item 
total job-involvement score, the only sig- 
nificant correlation was between job involve- 
ment and supervisory qualities (the coefficient 
was .31, p < .05). Two others approach sig- 
nificance (p < .10): initiative and intelligence. 
Apparently the job-involved person is the 
sort who makes a good supervisor (as seen 
by his superiors), and who may be higher in 
initiative and intelligence than others. 
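
The significance levels quoted for the nurses (r = .26, N = 137) and the students (r = .31, N = 46) can be checked approximately with Fisher's z transformation. The text does not say which test the authors used, so the large-sample normal approximation below is an assumption made for illustration, and the function name is hypothetical.

```python
import math


def fisher_z_test(r, n):
    """Two-tailed test of H0: rho = 0 using Fisher's z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals the two-tailed area under the standard normal.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


for label, r, n in [("nurses, age", 0.26, 137), ("students, supervisory", 0.31, 46)]:
    z, p = fisher_z_test(r, n)
    print(f"{label}: z = {z:.2f}, p = {p:.4f}")
```

Under this approximation the first correlation falls below the .01 level and the second below the .05 level, in line with the values reported.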

The final evidence on validity of the job- 
involvement scale comes from the sample of 
engineers. Fifty variables were scored from 
the survey questionnaire. These variables dealt 
with the technological nature of the work 
itself (its variety, responsibility, etc.), job 
satisfaction as measured by Smith’s new Job 
Description Index (JDI) (Kendall, Smith, 
Hulin, & Locke, 1963), interferences and frus- 
trations in work, perceived technical pro- 
ficiency of self and others, performance as 
measured by data on percentage salary in- 
creases, and demographic data on education 
(self, father, and mother), age, position level, 
marital status, and where the person went to 
high school, among others. These variables 
were intercorrelated and factor analyzed. 

The 20-item job-involvement score was cor- 
related with 2 of the 13 job variables: the 
number of people contacted per day in the 
job (.30) and the interdependence of the job 
(necessity for working closely with others) 
(.34) are both associated with high involve- 
ment, at the .01 level. Four of the five satis- 
faction variables are associated with high job 
involvement: satisfaction with the work itself 
(.29), promotion (.38), supervision (.38), 
and people (.37). High involvement was also 
associated with the perceived technical pro- 
ficiency of the supervisor (.29) and with 
perceived chances of getting two or more 
future promotions (.34). Finally, on a “scale” 
for location of high school attended which 
runs east-south-midwest-west, people from 
east and south are less job-involved than 
others. The correlation coefficient is only .25, 
however, and the respondents themselves 
decided the definitions of location; it would 
probably be a mistake to attach much import 
to this relationship. 

In the factor analysis, 12 factors were ex- 
tracted from the 50-variable matrix, and 
varimax rotations were performed for 12, 7, 
and 5 factors. Later, the matrix was reduced 
to 36 variables by eliminating some over- 
lapping variables. Again, 12 factors were ex- 
tracted and rotations were done for 12, 7, 
and 5 factors. In every one of these rota- 
tions, the 20-item total job-involvement score 
appeared with its highest loading on the same 
factor as the job-satisfaction variables. Con- 
sidering these results and the zero-order cor- 
relations of job involvement with the satis- 
faction variables, it must be concluded that 
for these engineers, job involvement has 
roughly the same factorial content as job 
satisfaction. In the 12- and 7-factor rotations, 
the second highest loading for job involvement 
was on a factor termed “success,” which had 
other high loadings on total recent salary 
increases, years in college, and low age. Still, 
these factor loadings account for only about 
30% of the variance in job involvement. It 
would not be justified to conclude from these 
findings that job involvement is the same as 
job satisfaction, although for these engineers 
they appear to have some of the same 
determinants. 


DISCUSSION AND CONCLUSIONS 


The main findings of this study are as 
follows: (a) job involvement is a multi- 
dimensional attitude that can be scaled with 
adequate, but not high reliability; (b) the 
scale items seem to be general over different 
populations, in that roughly the same factorial 
structure appeared in groups of engineers and 
nurses; (c) the scale discriminates among 
groups and has plausible correlations with 
other variables; (d) the 20-item scale de- 
veloped here has about the same factorial con- 
tent as job satisfaction for a group of engi- 
neers. 

It would be premature to attempt to account 
for all of these findings, since obviously more 
data are needed, especially from lower-status 
occupations. However, some speculations on 
the nature of job involvement can be made 
from the data now available. 

Combining these data to make a composite 
profile of the hypothetical job-involved per- 
son, we find that highly job-involved people 
(a) are older (nursing personnel); (b) are 
less considerate as leaders, and prefer ad- 
ministrative or coordinating activities to nurs- 
ing care activities (head nurses); (c) describe 
themselves the same way as good supervisors 
do, and score higher on initiative and intel- 
ligence (students); (d) have more highly 
interdependent (team-type) jobs, see more 
people during the day, are more satisfied with 
their work itself, their promotional opportuni- 
ties, their supervisor and fellow workers, feel 
their supervisor is technically proficient and 
that they have a good chance of getting two or 
more promotions, and went to high school 
elsewhere than in the East (engineers). A 
good deal of ambition, upward mobility, 
and general social motivation seems to run 
through this description, recalling Allport’s 
(1947) description of the ego-involved person 
as “. . . engaging the status-seeking motive” 
in work. In fact, Allport predicted the present 
results rather well when he wrote that “when 
the individual is busily engaged in using his 
talents, understanding his work, and having 
pleasant social relations with foreman and 
fellow worker, then he is, as the saying goes, 
‘identified’ with his job.” Allport goes on 
to point out that the most indispensable 
condition for this is “friendly, unaffected 
social relations [p. 123].” Clearly, the in- 
volved engineer here feels he has a good 
chance for promotion and likes his job, 
fellow workers, and supervisor; he seems 
“organizationally involved” as well as job 
involved. 

But if job involvement is a readiness to 
be judged by one’s work, imparted during 
the socialization process, then why should job 
and organizational involvement be associated? 
One possible answer is that the work is judged 
by the organization; that is, it sets the norms 
for what is good work. If the criteria used are 
fuzzy and the person is kept dependent by 
well-meaning but arbitrary judgments from 
superiors about his work, perhaps he needs 
to identify with them (and the organization) 
in order to validate himself. The alternative, 
of course, is to protect himself from this con- 
flict by turning to “professional” standards 
not determined by the organization. Another 
possible answer is that the person who is job 
involved has been successfully socialized for 
today’s bureaucratic society. Most socializa- 
tion processes use social motivations for con- 
trol; that is, nonconformity is discouraged 
by social rejection, and conformity to norms 
is rewarded with social acceptance. The job- 
involved person may have begun with stronger 
affiliative needs than the maverick who can- 
not be taught the Apollonian virtues of work, 
and thus later shows up as the sort of person 
who needs to form friendly relationships 
and to identify with his superiors and the 
organization. 

Whichever is the case, the job-involved 
person is no lone wolf. In fact, the data from 
all three groups remind one of the “mana- 
gerial personality” as described by Henry 
(1949), especially the executive traits of high- 
achievement desire, mobility drive, activity 
and aggression, and detached relations with 
subordinates. The job-involved head nurses 
scored low on “consideration,” matching 
Henry’s description of the executive as one 
who “treats his subordinates in a detached 
and impersonal way, seeing them as ‘doers of 
work,’ rather than as people.” Perhaps it is 
reasonable that organizations would select 
job-involved people as executives: few others 
would be willing to make the personal sacri- 
fices necessary for success in today’s executive 
suite (cf. Warner & Abegglen, 1955). 

A limitation of this study is the rather 
narrow range of occupations sampled; all 
the above conclusions are thus limited to 
them. Identification of more specific con- 
comitants of job involvement must await 
further research. From the present results it 
seems clear that job involvement is affected 
by local organizational conditions (mainly 
social ones), as well as by value orientations 
learned early in the socialization process. The 
role of socialization in job involvement re- 
mains unclear because of the failure of social- 
class data to relate to job involvement. (The 
somewhat shaky finding that easterners are 
less involved than others is intriguing, but 
needs further investigation.) Perhaps the 
development of the scale to measure job 
involvement reported here will facilitate 
progress in the understanding of some of 
these problems. 
PREDICTIVE VALIDITY OF PSYCHOMETRIC EVALUATIONS 
OF SUPERVISORS ¹ 


CHARLES F. DICKEN ² AND JOHN D. BLACK 


Stanford University 


31 higher level employees in 1 firm and 26 in another were assessed by objective 
test batteries. Clinical interpretations of test data, test scores, and other pre- 
dictors were analyzed with reference to criterion personality ratings and 
management decisions at a follow-up point of 3½ yrs for the 1st sample and 
7 yrs for the 2nd. Predictive validity of test assessments was generally 
satisfactory in the 1st sample, although not pragmatically superior to that 
of certain objective data. Prediction was less satisfactory in the 2nd sample, 
but more unique to test data. A matching study indicated some correspondence 
of test reports and criterion personality sketches in the 2nd sample. Uninter- 
preted test scores were not generally valid except as measures of intelligence. 
Implications of the sample differences and of the method are discussed. 


The use by industry of psychological evalua- 
tions based on batteries of psychometric 
devices of varying length and complexity 
continues to be widespread despite both re- 
sponsible and irresponsible criticism of the 
activity and in the absence of much serious 
attention to evaluation. 

Clinicians in psychology and psychiatry are 
rarely required to make such long-range pre- 
dictions about complex patterns of human 
behavior as are personnel directors and in- 
dustrial managers, except perhaps in selecting 
partners or hiring a secretary. That industry 
should seek professional help in crucial de- 
cisions which must otherwise be based largely 
on intuition seems perfectly reasonable. 

Less understandable is the fact that so 
many psychologists are willing to engage 
in this enterprise without more precise knowl- 
edge of the validities they achieve. Even 
though many of us would not overestimate 
the probabilities attaching to our predictions, 
as scientists we are committed to the ethic 
that we should know them as precisely as 
possible. 


1 The authors gratefully acknowledge the coopera- 
tion of the two participating companies and the 
assistance of 16 executives and supervisors who to- 
gether contributed nearly 60 hours furnishing ratings 
and other data. We are also indebted to those 
professional colleagues who served as raters, judges, 
and consultants, particularly Julia S. Ferris, who 
handled most of the tabulating and computing. This 
research was assisted by National Science Foundation 
Grant NSF-GP 948 to the Stanford Computation 
Center. 

2 Now at San Diego State College. 

The present study is an attempt to explore 
the validity of clinical interpretations of an 
objective test battery in two different indus- 
trial settings. Narrative reports based on test 
data were available in the files of the Stanford 
Counseling and Testing Center for several 
hundred applicants and employees of more 
than a score of companies. The reports pre- 
pared for two companies were finally chosen 
for follow-up for five reasons: 

1. The testing had been done on already- 
employed persons so there was no loss of 
subjects (Ss) as there would be if applicants 
were used. 

2. The Ss for each company had been tested 
at the same time so follow-up periods for in- 
dividuals were identical within each company. 

3. Both companies expanded rapidly sub- 
sequent to the testing, providing more than 
average opportunity for change in status 
among employees. 

4. The companies were willing to cooperate. 

5. There were reasons to feel that the prob- 
lem of criterion contamination would be 
minimized in these concerns. This problem 
will be discussed more fully later. 

One of the firms in the study is a manu- 
facturer of klystron tubes. The Ss tested 
were mostly first-line supervisors considered 
by their superiors as possible candidates for 
future foremen’s jobs or management re- 
sponsibilities. Job titles included leadman, 
ater, inspector, foreman, technician, etc. A 
few were relatively recent employees but most 
had worked for the company 3 or 4 years and 
were between 25 and 35 years old. All Ss were 
male; most were high school graduates, but 
a few had less education and two or three 
had college training. The follow-up was con- 
ducted about 3½ years after testing, during 
which time the company had grown from 
00 employees to 2,000. This manufacturing 
firm is hereafter abbreviated Manuf. 

The other company was a carrier of work- 
man’s compensation insurance though other 
lines of insurance have since been added. The 
Ss were similarly chosen but their job titles 
were quite different: underwriter, actuary, 
auditor, district manager, claims supervisor, 
tabulating manager, etc. They were all male 
high school or college graduates; most had 
been employed for several years, and they 
spanned an age range from 22 to 59, most 
being under 35. The follow-up was conducted 
7 years after testing; the company had grown 
from 255 to 385. This firm is abbreviated Ins. 

The test battery for all Ss included the 
Strong Vocational Interest Blank (SVIB), the 
Minnesota Multiphasic Personality Inventory 
(MMPI), and the Otis Quick-Scoring Mental 
Ability (Gamma) Test. In addition, the Manuf 
employees took the numerical section of the 
General Clerical Test, the Bennett Mechan- 
ical Comprehension Test (Form BB), the 
Minnesota Paper Form Board, the Test of 
Practical Judgment, and the How Supervise? 
(Form M). The Ins employees took, in ad- 
dition to the three basic tests, the Minnesota 
Clerical Test and the Primary Business 
Interests Test. 

The test reports were all written by the 
same psychologist and consisted of an average 
of about 500 words offering a general ap- 
praisal of each employee’s abilities and limita- 
tions with considerable emphasis on those in- 
terest, personality, and motivational factors 
which might appropriately influence future 
placement or promotion within the company. 
Particular attention was paid to potentiality 
for supervisory, administrative, and executive 
positions. 

Because the reports were in narrative form, 
embodying clinical interpretations of the 
MMPI and SVIB, it was necessary to render 
them amenable to statistical treatment. The 
first method used was to devise a rating 
system. Seven “global” personality variables 
related to those used in the Office of Strategic 
Services assessment study (OSS Staff, 1948) 
were defined as follows: 

I. Effective Intelligence. Ability to under- 
stand and solve problems efficiently. Grasps 
problems clearly—perceives the issues. Learns 
readily. Emphasis on intellectual ability 
which can be applied in action. Versus: Slow- 
ness in learning new materials, routines. 
Cannot perceive the issue clearly. 

II. Personal Soundness. Maturity in per- 
sonal relationships both on the job and in 
family and personal life. Personal life satisfy- 
ing; is comfortable with himself. Free from 
emotional instability. Tolerates pressure with- 
out difficulties. Versus: Worry or stress causes 
symptoms like: trouble on the job, physical 
illness, frequent absence, wife trouble, drink- 
ing, and accidents. 

III. Drive and Ambition. Active and ener- 
getic in pursuing a course of action. Ready to 
assume initiative. Persistent in seeking goals; 
willing to extend extra effort to reach them. 
Is personally ambitious; demonstrates a desire 
to “get ahead.” Versus: Not energetic. Not 
persistent. Has little initiative or ambition. 

IV. Leadership and Dominance. Is poised 
and self-confident in dealing with others. Con- 
fident without arrogance. Maintains authority 
easily and comfortably. Has a matter-of-fact 
way of expecting his orders to be obeyed or 
his suggestions to be accepted. Is positive 
and decisive in making decisions and directing 
others. Versus: Lacks confidence; is ill at ease 
in position of authority. Cannot be decisive; 
is easily confused or uncertain about decisions.

V. Likeableness. Gets along well with other 
people. Is popular with his associates. Is tact- 
ful and adept in social contacts. Versus: 
Unlikeable, unpopular. Tactless, blunt. An- 
noying habits. Irritable; makes others un- 
comfortable. 

VI. Responsibility and Conscientiousness. 
Can give him a job and count on him to see 
it through properly. Thorough and prompt in 
carrying out assignments. Takes his duties 
seriously. His work is accurate. Versus: Needs 
to be checked or jogged frequently to get 
job done. Careless, sloppy work. 

VII. Ability to Cooperate. Works effectively
with other individuals and departments. Can 
discuss differences freely and effectively. Can 
compromise and admit own errors or those 
of own department without undue need to 
defend himself or his program. Is a good 
team worker. Tries to “get the job done” in 
the way which is best for the entire organiza- 
tion without regard for personal feelings. 
Versus: Defensive about own program if 
criticized or compromise is needed. Not a 
good team worker. 

A seven-step scale (poor—below average— 
probably below average—average—probably 
above average—above average—superior) was 
used in rating these variables. A summary 
global variable was defined separately for 
the two firms: 

VIII. Estimate of Potential Functioning 
Level. At what level of responsibility do you 
think this man will ultimately be capable of 
functioning in this or a similar organization? 
[Manufacturing Firm] 

1. He should ultimately be able to handle 
one of the half-dozen top management jobs. 

2. He should someday be able to handle 
responsibilities of wide scope just below the 
top management echelon. 

3. He will probably be capable of handling 
the job of a department head or supervisor 
which includes considerable independent re- 
sponsibility and judgment. 

4. He should make a fine foreman with 
supervision of as many as 25 to 50 production 
workers (or a smaller number of more highly 
trained employees). 

5. He will ultimately make a good leadman 
—handling perhaps 6 to 10 production workers 
or their equivalent. 

6. He is not suited to supervisory duties or 
independent responsibility. 


[Insurance Firm] 

1. He should be able to handle one of the 
top management jobs. 

2. He should some day be able to handle 
responsibilities of wide scope just below the 
top management echelon. 

3. He will probably be capable of handling 
the job of a department head or supervisor 
which includes considerable independent re- 
sponsibility and judgment. 

4. He should make a fine assistant to a
department head with responsibility for super- 
vising clerical or semiskilled personnel. 

5. He is not suited to supervisory duties 
or independent responsibility. 


PREDICTOR DATA 


Report Rating (RR). The S’s names and 
job titles were deleted from the test reports 
which were then read by four psychologists 
who independently rated the Ss on each of the 
eight variables described. Since the reports 
contained no test scores, the ratings are thus 
two interpretive steps removed from the 
original test data. The RR score of each S on 
each variable is the average of the ratings of 
the four judges. 

Adjective and Phrase Analysis. In a further 
attempt to objectify the reports, they were 
carefully scanned for recurring adjectives and 
short phrases descriptive of personality. Ad- 
jectives which appeared in at least four re- 
ports in a given sample were placed on a 
checklist for that sample. The Manuf list con- 
tained 19 adjectives (e.g., aggressive, ambi- 
tious, conscientious, depressed); the Ins list, 
28 (e.g., able, aggressive, ambitious, anxious). 
Minor grammatical differences were ignored
in tallying (e.g., unaggressive versus non- 
aggressive). The seven phrases which ap- 
peared most frequently in the test reports 
of the combined samples were placed in a 
second checklist. Similar phrases were arbi- 
trarily tallied together, and a single phrase 
used to represent the attribute on the check- 
list (e.g., needs approval from others; needs 
acceptance from others). All such determina- 
tions were made before any criterion data 
were obtained. The phrases adopted were: 
(a) Needs approval and acceptance from
others; (b) Needs emotional support from
superiors; (c) Is unusually effective in
interpersonal relationships; (d) Finds it difficult
to handle criticism; (e) Is sometimes resentful
of authority; (f) Makes an unusually
good impression; (g) May become physically
ill or suffer physical symptoms if pressure is
too great.

Test Scores. To establish validity of the 
psychometric data without the intervention 
of a psychologist, all test scores were analyzed 
as predictors in the Manuf sample, except that 
only selected SVIB scores were used. In the
Ins sample, the Otis, Minnesota Clerical,
and selected SVIB and MMPI scores were
analyzed.

Test Ratings (TR). For the Ins sample 
only, the eight variables described above were 
also rated directly from the test data by a 
psychologist with considerable experience in
vocational appraisal. This step was included 
to evaluate predictions with the influence of 
the original report writer removed. The TR 
score of each S on each variable is this single 
rating.

Matching Study. An attempt to match the 
original test reports with criterion personality
sketches of the Ins sample provided by the
personnel director will be described below.


CRITERION DATA 


Field Ratings (FR). The eight variables
described above had been defined in nontechnical
language so they could be used to
obtain criterion measures from raters in
the two companies. Because of the well-known
difficulties in obtaining reliable ratings, all
criterion ratings were made during interviews
with company raters by the authors. Each
interview began with assurance of confidentiality
and a discussion of the definitions
of the variables and of common rating errors.
The rater was told that approximately equal
numbers of ratings should fall on either side of
“average,” and that few, but some, ratings
should fall at the extremes. Company employees
of the general type tested were
specified as the base line for average.

After this orientation, the rater was asked
to consider each S in turn, first expressing his
impressions of S informally, then assigning
the scale positions on each trait. If a rating
appeared to the interviewer to be distinctly
inconsistent with the rater’s informal impression
or to be the result of a misunderstanding
of the rating system, there was
further discussion and clarification before the
rating was recorded. An average of 15 or 20
minutes was required to rate each S. The FR
score for each S on each variable is the
average of the scale positions assigned by
independent raters.

Manuf Sample Ratings. Twenty-three of the
31 Ss were employed at the time field ratings
were obtained. Five of the terminators
(median service since assessment, 11 months)
had resigned and were considered eligible for
rehiring. The other three (median service
since assessment, 14 months) had been fired
or had resigned under management disapproval
and were not considered eligible for
rehiring.

Thirteen raters were interviewed. In most 
cases the rater was higher in organizational 
rank than S, although a few ratings by S’s 
peers were used. Raters considered only those 
Ss they knew well. Four independent ratings 
were obtained for 14 of the Ss, three ratings 
for 15 of the Ss, and two ratings for each of 
the 2 remaining Ss. The Ss who had termi- 
nated were rated on performance up to the 
time of termination. 
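
The rater counts just given fix the average number of field raters per S, the quantity k that enters the reliability estimates for the composite field ratings. A minimal arithmetic check follows; the function and variable names are illustrative only and do not come from the study.

```python
def mean_raters_per_subject(counts):
    """counts maps (number of ratings per S) -> (number of Ss rated that often)."""
    total_ratings = sum(k * n for k, n in counts.items())
    total_subjects = sum(counts.values())
    return total_ratings / total_subjects, total_subjects


# Manuf sample: 4 ratings for 14 Ss, 3 ratings for 15 Ss, 2 ratings for 2 Ss.
manuf_counts = {4: 14, 3: 15, 2: 2}
k_bar, n_subjects = mean_raters_per_subject(manuf_counts)
print(n_subjects, round(k_bar, 1))
```

The counts sum to the 31 Ss of the sample, and the mean rounds to the k of 3.4 cited for the field-rating composites.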

Ins Sample Ratings. Sixteen of the 26 men 
were still employed at the insurance company 
at the time of follow-up. Three of the termi- 
nators had resigned on a voluntary basis 
(median service after assessment, 63 months) ; 
the other 7 were discharged or resigned in- 
voluntarily (median service, 23 months). 

All 26 Ss were rated by a retired personnel 
director who had served the company during 
most of the follow-up period. The current 
personnel director rated the 21 Ss he knew, 
and 14 ratings were obtained from a vice 
president who was also an S. There were three 
ratings for 10 of the Ss, two ratings for 15 Ss, 
and a single rating for 1 S. 

Adjective and Phrase Analysis. For the 
Manuf sample, each S was checked on the ad- 
jectives and phrases by the one rater who 
knew him best. The six raters used were 
instructed to check between 4 and 8 of the 
19 adjectives and any number of phrases for 
each S. The only variation in the Ins sample 
was the instruction to check between 5 and 
12 of the 28 adjectives on the list and the 
fact that some of the Ss were checked by two 
raters. The checklists required that the field 
rater adopt the linguistic frame of reference 
provided by the test report writer. If corre- 
spondences in attribution exceed correspond- 
ences expected on the basis of the frequency 
of an item in the reports and on the com- 
pleted checklists, validity of the test reports 
is supported. 

TABLE 1

MEANS, STANDARD DEVIATIONS, AND RELIABILITY
ESTIMATES FOR COMPOSITE REPORT AND FIELD
RATINGS, MANUF SAMPLE

                        Report ratingsᵃ                     Field ratingsᵇ
Variable                M      SD           r_kk            M            SD           r_kk
I    Intelligence       4.29   1.86         .98             4.58         (illegible)  (illegible)
II   Soundness          3.94   1.24         .92             4.33         (illegible)  (illegible)
III  Drive              4.50   1.19         .94             4.60         1.07         .71
IV   Leadership         4.04   1.27         .90             3.99         (illegible)  (illegible)
V    Likeableness       4.37   (illegible)  (illegible)     4.03         1.01         .62
VI   Responsibility     4.42   .97          .85             (illegible)  (illegible)  (illegible)
VII  Cooperativeness    4.47   .97          .86             4.47         1.00         .53
VIII Potential          4.43   1.27         .93             4.10         1.17         .81
Average                                     .92                                       .70

ᵃ k = 4.
ᵇ k = 3.4.

Objective Criteria. Involuntary termina- 
tion, current or final salary, and change in 
job level were used as objective criteria. The 
latter was rated by the respective personnel 
directors. 

One obstacle to validation studies of indus- 
trial testing is that test reports may influ- 
ence the judgments used as criteria. Such 
contamination is probably minimal in this 
study, though it cannot be ruled out entirely. 
While two raters in each company had had 
direct access to the test reports, none had 
read them in recent years. In neither com- 
pany was psychological testing ever a sys- 
tematic and accepted management practice. 
Although the reports may have initially con- 
firmed or cast doubt upon opinions held about 
Ss, the long time span since testing and the 
intimate day-to-day contact of the raters and 
Ss argue against criteria reflecting much from 
the test reports not well confirmed by direct 
observation. 


RESULTS AND DISCUSSION, MANUFACTURING
SAMPLE


Ratings. The report readers achieved very 
high reliabilities in their ratings; the relia- 
bilities of the field ratings are lower but 
generally satisfactory (Table 1). The means 
and standard deviations in the table sug- 
gest that raters adhered well to instructions 
regarding the distribution of ratings. 

Ebel’s (1951, p. 412) analysis of variance 
technique was used to estimate the reliability 
of the composite ratings. Error due to interrater
variance (constant rater differences) is
excluded in the RR estimates, since a com- 
posite from a fixed rater pool does not reflect 
such variance. Rater variance is treated as 
error in the FR estimates, determined for the 
case where the source of ratings is unidenti- 
fied. This latter estimate pertains to a 
composite where k, the average number of
raters, = 3.4. Averaged correlations here and
subsequently are via z transformation.
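
The two treatments of rater variance described above can be sketched with the standard analysis-of-variance (intraclass) formulas for the reliability of a k-rater composite. This is an illustration under stated assumptions, not a reproduction of Ebel's worked procedure: it assumes a complete Ss × raters table (the field ratings here were incomplete, averaging 3.4 raters per S), and the small ratings matrix is hypothetical.

```python
def composite_reliability(ratings, rater_variance_is_error):
    """Reliability of the mean of k ratings per S from a complete n x k table.

    rater_variance_is_error=False: constant rater differences are excluded
    from error (fixed rater pool, as for the report ratings).
    rater_variance_is_error=True: rater differences count as error
    (source of ratings unidentified, as for the field ratings).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_residual = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    if rater_variance_is_error:
        ms_error = (ss_raters + ss_residual) / (n * (k - 1))
    else:
        ms_error = ss_residual / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / ms_subjects


# Hypothetical data: raters 2 and 4 are uniformly one scale step more lenient.
demo = [[5, 6, 5, 6], [3, 4, 3, 4], [4, 5, 4, 5], [2, 3, 2, 3], [6, 7, 6, 7]]
r_fixed_pool = composite_reliability(demo, rater_variance_is_error=False)
r_unidentified = composite_reliability(demo, rater_variance_is_error=True)
print(round(r_fixed_pool, 3), round(r_unidentified, 3))
```

In the hypothetical data the raters differ only by a constant, so the fixed-pool estimate is perfect while the estimate that charges rater differences to error is lower. This is the direction of the difference between the RR and FR estimates described in the text.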

Predictive Validity of RR Variables and
Initial Salary. Table 2 shows the validity of
the relevant³ RR variable, the irrelevant⁴
variables, and initial salary; validities
corrected for criterion unreliability are also
shown. Correlations significant at the .05 level
are italicized and those significant at the .10
level are bracketed here and subsequently
(two-tailed tests).

All the report rating variables are positively
associated with their field rating
counterparts, and all but one are significant
at least at the .10 level. The global predictor
(VIII) accounts for a little more than 25%
of the variance of the global field rating and
for about a third of the variance of Ss’ final
salary status, although it predicts job level
increase only insignificantly.


Column 2 provides some indication of the
discriminant validity (Campbell & Fiske,
1959) of the RR variables. The “irrelevant”
(more appropriately “less relevant”) predictors
do tend to correlate with the criterion
but consistently less so than the relevant predictors.
RR VIII is not treated as irrelevant
because of its explicitly global character, nor
does the correlation of .38 between RRs I
through VII and FR VIII seem inappropriate.

The average intercorrelation of the RRs 
was found to be .48, and of the FRs, .57. 
The intercorrelation patterns in these mono- 
method matrices suggest the same two clusters 
in each: Potential, Intelligence, Leadership 


³ RR VIII (Potential) is used as “relevant” in
predicting the objective criteria. This variable has
been reflected here and subsequently so that high
scores are favorable, as in the case of the other
variables.

⁴ Since RR VIII is an “over-all” rating, it is not
treated as irrelevant. Thus, for variables I through
VII, the “irrelevant” variables are the remaining six,
VIII excluded; for RR VIII, the remaining seven
were called irrelevant.

TABLE 2

VALIDITIES OF COMPOSITE REPORT RATINGS (RR) AND INITIAL SALARY, MANUF SAMPLE

                                        Predictors                                      Corrected for unreliability
                        (1)          (2)          (3)                                   of criterion
                        Relevant     Mean other   Initial
Criteria                RR           RRs          salary       (1.3)        (1+3)        (1)          (1+3)
Field rating
I    Intelligence       .40          (illegible)  (illegible)  (illegible)  .60          (illegible)  .78
II   Soundness          (illegible)  (illegible)  (illegible)  .26          .45          .42          (illegible)
III  Drive              [.30]        .16          [.34]        .16          (illegible)  .36          .44
IV   Leadership         .41          (illegible)  (illegible)  .16          .64          .46          (illegible)
V    Likeableness       [.30]        .14          (illegible)  .26          .31          .38          [.40]
VI   Responsibility     (illegible)  .09          (illegible)  .29          (illegible)  .44          (illegible)
VII  Cooperativeness    .20          .24          (illegible)  (illegible)  .35          [.28]        .49
VIII Potential          .51          .38          .69          (illegible)  (illegible)  (illegible)  .79
Average                 .36          .23          .41          .23          .50          .43          .60
Objective
Final salary            .58          .36          .93          .23          .93
Job level increase      .20          (illegible)  .42          −.04         .49

Note.—N = 31.



(average r = .76 in the RR matrix; .70 for
the FRs) and Likeableness, Cooperativeness,
and Soundness (average r = .75, RR; .77,
FR). A formal factor analysis does not seem
justified in view of the small number of
variables and Ss.
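
Average correlations such as those just cited are taken via the z transformation rather than as plain arithmetic means. A brief sketch of that averaging follows, assuming the usual Fisher transformation (z = arctanh r, averaged, then transformed back); the function name is illustrative. The input values are the eight report-rating reliabilities listed for the Ins sample in Table 5.

```python
import math


def average_r_via_z(correlations):
    """Average correlations by Fisher z: transform, take the mean, back-transform."""
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


# Report-rating reliabilities for variables I through VIII, Ins sample (Table 5).
ins_rr_reliabilities = [.99, .93, .68, .83, .88, .84, .86, .87]
z_average = average_r_via_z(ins_rr_reliabilities)
plain_average = sum(ins_rr_reliabilities) / len(ins_rr_reliabilities)
print(round(z_average, 2), round(plain_average, 2))
```

The z-based average agrees with the .89 printed in the Average row of Table 5, whereas the arithmetic mean of the same eight values is lower, because the z scale gives more weight to coefficients near 1.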

Column 3 shows the predictive validity of
salary at the time of testing. This variable
is a significantly better predictor of the
follow-up criteria than are the report ratings
in four cases and negligibly different in the
other six. The high correlation of initial and
final salary is partly an artifact produced by
a rapidly rising wage scale in the electronics
industry; i.e., proportionately less salary
money is available for purely merit raises
when increases must be broadly distributed
to retain technical personnel in short supply.
The partial correlations in Column 4 suggest
the test assessments may make a predictive
contribution independently of initial salary
in some instances. Replication is required,
however, since at this sample size, only
one partial correlation is statistically significant.
The multiple correlations shown in the
fifth column also suggest the same possibility,
although in no instance is R significantly
greater than the validity of initial salary
alone.
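
The partial correlations of Column 4 and the multiple correlations of Column 5 follow from three zero-order correlations: criterion with report rating, criterion with initial salary, and report rating with initial salary. The sketch below uses the textbook two-predictor formulas. The third correlation is not given in the surviving text, so the value of .50 used here is purely an assumed figure for illustration; the other two inputs are the Drive entries of Table 2.

```python
import math


def partial_r(r_y1, r_y3, r_13):
    """Correlation of criterion y with predictor 1, holding predictor 3 constant."""
    return (r_y1 - r_y3 * r_13) / math.sqrt((1 - r_y3 ** 2) * (1 - r_13 ** 2))


def multiple_r(r_y1, r_y3, r_13):
    """Multiple correlation of criterion y with predictors 1 and 3 combined."""
    r_squared = (r_y1 ** 2 + r_y3 ** 2 - 2 * r_y1 * r_y3 * r_13) / (1 - r_13 ** 2)
    return math.sqrt(r_squared)


r_drive_rr = 0.30        # FR Drive with RR Drive (Table 2, Column 1)
r_drive_salary = 0.34    # FR Drive with initial salary (Table 2, Column 3)
r_rr_salary = 0.50       # ASSUMED: RR Drive with initial salary (not reported)

drive_partial = partial_r(r_drive_rr, r_drive_salary, r_rr_salary)
drive_multiple = multiple_r(r_drive_rr, r_drive_salary, r_rr_salary)
print(round(drive_partial, 2), round(drive_multiple, 2))
```

With the assumed predictor intercorrelation, the partial correlation comes out near the .16 shown for Drive in Column 4. A different assumed value would give a different result, so the figure should not be read as a reconstruction of the authors' data.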

It should be pointed out that the “initial” 
salary figure includes not only management’s 
original evaluation of the employee’s value 
when hired, but subsequent adjustments 
based on a period of on-the-job observation 
which ranged from 1 month to 6½ years and
averaged nearly 3 years. Only 4 of the 31 Ss
had been employed less than 6 months at the 
time of testing. Therefore, partialling out the 
initial salary is not entirely justified and cer- 
tainly does not reflect upon the concurrent 
validity of the test assessments. A high degree 
of overlap between test and objective predic- 
tors may be cited as evidence of test validity. 
It is also possible, of course, that both initial 
salary and the follow-up criteria share non- 
merit “halo” variance of which test assess- 
ments may be relatively free. In such a situa- 
tion, equivalent or superior predictive power 
of the nontest predictor does not preclude 
greater construct validity of the test assess- 
ments (Cronbach & Meehl, 1955). Another 
possible basis for a spuriously high relation- 
ship of initial salary and the criteria is that 
the image of a new man created by the salary 
level set at the time of hiring may have
“colored” subsequent advancements and the 
criteria. The test reports did not affect initial 
salary since the latter had already been set 
at the time of the assessments. 

From the standpoint of management, of
course, the high correlation between salary
at the time of assessment and subsequent
criteria of the employee’s worth raises the
question of the practical contribution made
by test assessments of employees whom management
has already had reasonable opportunity
to observe. The present study does
not bear on the possible advantage of testing
at the point of employment, since actual
starting salary was not used as a predictor.
Nor can we say whether test assessments
would make a practical contribution in other,
initially status-homogeneous samples.

Columns 6 and 7 report validities corrected
for criterion unreliability. When predictability
of Ss’ actual characteristics or
merit, as opposed to characteristics as estimated
by a fallible criterion, is of interest,
adjustments for criterion unreliability are
appropriate. This adjustment may help determine
the practical value of test assessments,
since management is ultimately more interested
in actual merit than in fallible judgments
of it, even if the former is not immediately
quantifiable. The corrected validities
are almost all significant and in some
instances quite substantial. The test rating
and initial salary combined would account
for almost two-thirds of the variance of a
perfectly reliable measure of the global
follow-up variable (VIII).
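
The correction for criterion unreliability can be illustrated with the usual correction for attenuation applied to the criterion only: the observed validity is divided by the square root of the criterion reliability. The report does not spell out its formula, so this form is an assumption, although it reproduces the Drive entry of Table 2 from the Table 1 reliability. The function name is illustrative.

```python
import math


def correct_for_criterion_unreliability(validity, criterion_reliability):
    """Estimated validity against a perfectly reliable criterion."""
    return validity / math.sqrt(criterion_reliability)


# Drive: observed validity .30 (Table 2), field-rating reliability .71 (Table 1).
drive_corrected = correct_for_criterion_unreliability(0.30, 0.71)

# Share of variance in a perfectly reliable global criterion (VIII) implied by
# the corrected multiple correlation of .79 in Table 2.
potential_variance_share = 0.79 ** 2

print(round(drive_corrected, 2), round(potential_variance_share, 2))
```

The first value matches the corrected Drive validity in Column 6, and the squared corrected multiple correlation for Potential is the “almost two-thirds” of variance mentioned in the text.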

Validity of Test Scores. Since the report
ratings necessarily drew heavily on the
clinical interpretations of the MMPI in the
reports, it is interesting to note that individual
MMPI scores do not perform well
as predictors of any of the 10 criteria. Only
5 of 140 correlations are significant at the
.10 level. Used in a “naive actuarial”
fashion, then, MMPI scores make no contribution,
although ratings based on a clinician’s
interpretation of them do.

The validities of other individual test scores
for the Manuf sample are given in Table 3.
Most of the scores were not expected to correlate
with the FR variables, except for Intelligence
and Potential, and this expectation is
confirmed. Five scores do correlate with final
salary, and the Otis and General Clerical
tests predict several personality variables at
the .05 level. High SVIB scores in the technical
areas appear to be associated with low
managerial success as reflected in the ratings.
Except for the Otis’s prediction of Intelligence
ratings, no psychometric score achieves as
high validity as the report ratings.


TABLE 4

DIFFERENCES BETWEEN COMPOSITE REPORT RATING
MEANS OF TOTAL GROUP (N = 31) AND INVOLUNTARY
TERMINATOR GROUP (N = 3),
MANUF SAMPLE

Report rating           D            CR           pᵃ
I    Intelligence       (illegible)  (illegible)  (illegible)
II   Soundness          1.01         1.83         .07
III  Drive              .83          1.28         .20
IV   Leadership         .78          1.13         .26
V    Likeableness       .54          1.06         .29
VI   Responsibility     1.00         1.88         .06
VII  Cooperativeness    .88          1.67         .10
VIII Potential          (illegible)  1.66         .10

ᵃ Two-tailed.

Validity of Adjectives and Phrases. Each 
checklist item was tallied separately for its 
presence and absence in the test reports and 
on the checklists completed by company 
raters. There were thus as many 2 × 2 tables
as items, with N for each table being the
number of Ss. Expected frequencies were
determined for each table by the usual chi- 
square procedure. Since these were typically 
too small for a chi-square analysis, a count 
was made of the number of items for which 
the observed present-present frequency ex- 
ceeded the expected. For 15 of 19 adjectives 
this was the case (p= .02, binomial test, 
two tails, Siegel, 1956, p. 69). Four of the 
six phrases showed this excess (p = .34), and 
19 of 25 items of both types (p = .01). This 
finding for the adjectives and phrases pro- 
vides further evidence of the validity of 
the MMPI interpretations from which the 
adjectives were primarily drawn. 
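
The counting procedure for the checklist items can be sketched in two steps: the chance-expected present-present frequency for one 2 × 2 table, and an exact binomial (sign) test on the number of items whose observed frequency exceeds expectation. The marginal counts in the first call are hypothetical, since item-level tallies are not reported; the item counts in the test are those given in the text. Function names are illustrative.

```python
from math import comb


def expected_present_present(n_in_reports, n_on_checklists, n_subjects):
    """Chance-expected count of Ss for whom an item appears in both sources."""
    return n_in_reports * n_on_checklists / n_subjects


def binomial_upper_tail(successes, trials, p=0.5):
    """Exact probability of at least `successes` hits in `trials` trials."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


def two_tailed(successes, trials):
    """Two-tailed sign-test probability (upper tail doubled, capped at 1)."""
    return min(1.0, 2 * binomial_upper_tail(successes, trials))


# Hypothetical item: used in 6 reports and checked for 8 Ss, out of 31 Ss.
example_expected = expected_present_present(6, 8, 31)

p_adjectives = two_tailed(15, 19)   # 15 of 19 adjectives exceeded expectation
p_all_items = two_tailed(19, 25)    # 19 of 25 items of both types
print(round(example_expected, 2), round(p_adjectives, 2), round(p_all_items, 2))
```

Under these assumptions the two-tailed probabilities round to the .02 and .01 reported for the adjectives and for all items combined.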

Discharged Ss. Table 4 compares the report
ratings of the three involuntarily terminated
cases with those of the total sample.
Critical ratios are based on subsample-total
sample comparison (McNemar, 1962, p. 94).
The p values for Soundness, Responsibility,
Cooperativeness, and Potential are remarkably
low considering the extremely small
subsample size. All the differences are in the
expected direction and almost all are fairly
large relative to the variability of the ratings
(Table 1). The findings suggest test assessments
of this type may have considerable
potential in predicting gross managerial
failure.


RESULTS AND DISCUSSION, INSURANCE
SAMPLE


TABLE 5

MEANS AND STANDARD DEVIATIONS OF RATINGS, AND RELIABILITY ESTIMATES FOR COMPOSITE
REPORT AND FIELD RATINGS, INS SAMPLE

                        Report ratingsᵃ          Field ratingsᵇ                  Test ratingsᶜ
Variable                M      SD     r_kk       M      SD           r_kk       M      SD
I    Intelligence       4.64   1.66   .99        4.51   1.54         .97        4.15   1.70
II   Soundness          4.03   1.43   .93        3.60   1.64         .87        3.96   1.48
III  Drive              4.59   .81    .68        4.09   1.37         .88        4.15   1.35
IV   Leadership         4.06   .97    .83        3.64   (illegible)  .80        3.85   1.38
V    Likeableness       4.46   1.14   .88        4.63   1.28         .64        4.15   1.10
VI   Responsibility     4.51   1.03   .84        4.43   1.44         .81        4.27   1.16
VII  Cooperativeness    4.28   1.19   .86        4.63   1.29         .64        4.31   1.23
VIII Potential          2.97   .84    .87        3.19   1.29         .90        3.73   .94
Average                               .89                            .85

Note.—N = 26.
ᵃ k = 4.
ᵇ k = 2.4.
ᶜ k = 1.


Ratings. Table 5 shows means, sigmas, and
reliability estimates for the composite RR and
FR ratings and means and sigmas of the test
ratings (TR). Reliabilities were estimated as





TABLE 6

VALIDITIES OF REPORT RATINGS (RR), TEST RATINGS (TR) AND INITIAL SALARY, INS SAMPLE

                                        Predictors                              Corrected for unreliability
                        (1)          (2)          (3)          (4)              of criterion
                        Relevant     Mean other                Initial
Criteria                RR           RRs          TR           salary           (1)          (3)
Field ratings
I    Intelligence       (illegible)  .06          .63          (illegible)      .66          .64
II   Soundness          .29          (illegible)  .16          .00              (illegible)  (illegible)
III  Drive              (illegible)  −.10         .43          .50              .18          .46
IV   Leadership         .19          .07          .29          .09              .21          (illegible)
V    Likeableness       .40          .16          [.36]        [−.38]           (illegible)  .45
VI   Responsibility     .03          −.04         .19          .10              .04          .24
VII  Cooperativeness    .21          .20          .40          −.29             .26          .50
VIII Potential          .31          .19          .18          .05              .39          (illegible)
Average                 .29          .10          [.34]        .01              [.33]        .39
Objective
Final salary            .28          .18          (illegible)  .50
Supervisory increase    .10          .06          .09          .24

Note.—N = 26.

TABLE 8

DIFFERENCES BETWEEN COMPOSITE REPORT RATING
MEANS OF TOTAL GROUP (N = 26) AND INVOLUNTARY
TERMINATOR GROUP (N = 7),
INS SAMPLE

Report rating           D            CR           pᵃ
I    Intelligence       .83          1.54         .12
II   Soundness          .99          (illegible)  .03
III  Drive              .30          1.16         (illegible)
IV   Leadership         .24          (illegible)  (illegible)
V    Likeableness       .28          (illegible)  .44
VI   Responsibility     .47          1.42         (illegible)
VII  Cooperativeness    (illegible)  .82          .40
VIII Potential          .60          (illegible)  .03

ᵃ Two-tailed.


before. The field ratings are generally more 
reliable than in the Manuf sample, although 
based on a smaller average number of raters. 
The Ins raters were, generally speaking, 
better trained and experienced in personnel 
work and had longer and probably more 
intensive contact with Ss, factors which may 
account for the superiority. 

Predictive Validity of RR, TR, and Initial 
Status Variables. Table 6 shows validities of 
the same variables as in Table 2, plus the 
validities of the TR. Multiple and partial 
correlations are not shown because of lack 
of association of initial salary and the criteria. 

The average validities of the RR variables 
are lower than in the Manuf sample, and since 
N is also smaller, fewer correlations are sig- 
nificant. Intelligence is more predictable than 
in the Manuf sample, and this is only partly 
due to a more reliable criterion. Likeableness, 
one of the least predictable criteria in the 
Manuf sample, is the only other variable which 
is significantly predicted in the Ins sample. 
The psychologist who made predictions di- 
rectly from the test data achieved slightly 
higher validities than the RRs, especially on 
Drive and Cooperativeness (Column 3). 
Neither the global criterion rating nor the ob- 
jective criteria can be said to be satisfactorily 
predictable in this sample. 

Insofar as any convergent validity is pres- 
ent, it appears also discriminant: irrelevant 
RR variables are uniformly negligibly pre- 
dictive. Similarly, if the correlations in Col- 
umns 1 and 3 indicate any validity for the 
test assessments, this appears a contribution
by the tests which is independent of any 
association with initial status; initial salary 
fails as a predictor of 8 of 10 criteria in 
this sample. 

Correcting for unreliability of the criteria 
improves the pattern of validity somewhat: 
the average validity coefficient for predicting 
the eight criteria becomes significant at the 
.10 level. 

Validity of Test Scores. Table 7 shows 
the validities of all test scores except the 
MMPI. Except for the validity of Otis and 
clerical scores for the intelligence rating, 
there is little evidence of validity of indi- 
vidual ability test scores for any of the cri- 
teria. The SVIB scores yield a few interesting 
correlations and are negatively associated with 
the criteria about two-thirds of the time. 
Exactly 10% of the correlations of the six 
MMPI scores and the criteria are significant 
at the 10% level. 

Adjectives and Phrases. Comparison of 
expected and observed correspondences was 
made as in the Manuf sample. When E was less
than 1.0, values were rounded to zero or 
unity. One rater checked the 21 Ss he knew. 
A second rater checked all 26 Ss. The hy- 
pothesis of greater than stereotype corre- 
spondence of descriptions by report writer 
and field rater was unsubstantiated by the 
data from Rater 1 and only suggestively sub- 
stantiated by the data for Rater 2. A sign 
test of positive and negative instances for all 
items and both raters (32/51) gives a two- 
tailed probability of .16. 

Discharged Ss. Table 8 compares the RR 
ratings of the seven involuntary terminators 
with the total sample. All the differences are 
in the expected direction, and the differences 
for Soundness and Potential are significant. 
These two variables correlate .42 and .43 with 
the terminator-nonterminator criterion (point
biserial r). Since r_pb is conservative, this is a
somewhat more favorable validity than that 
found for these variables in Table 6. 
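
The critical ratios of Tables 4 and 8 and the point-biserial values just cited can be checked from the tabled differences and standard deviations. The sketch below assumes the standard error of a subsample-total difference is SD × sqrt((N − n) / (N × n)), with SD the total-sample standard deviation, and that p comes from the normal curve. The text cites McNemar (1962) without stating the formula, so this form is an assumption, although it reproduces several of the legible table entries. Function names are illustrative.

```python
import math


def critical_ratio(diff, sd_total, n_total, n_sub):
    """Difference between total and subsample means over its standard error."""
    se = sd_total * math.sqrt((n_total - n_sub) / (n_total * n_sub))
    return diff / se


def two_tailed_p(z):
    """Two-tailed normal-curve probability for a critical ratio."""
    return math.erfc(abs(z) / math.sqrt(2))


def point_biserial(diff, sd_total, n_total, n_sub):
    """Point-biserial r from the total-minus-subsample mean difference."""
    # Convert total-vs-subsample gap to the gap between the two groups.
    group_gap = diff * n_total / (n_total - n_sub)
    p = n_sub / n_total
    return group_gap / sd_total * math.sqrt(p * (1 - p))


# Ins sample, N = 26 with 7 involuntary terminators; SDs from Table 5.
cr_intelligence = critical_ratio(0.83, 1.66, 26, 7)
p_intelligence = two_tailed_p(cr_intelligence)
rpb_soundness = point_biserial(0.99, 1.43, 26, 7)
rpb_potential = point_biserial(0.60, 0.84, 26, 7)
print(round(cr_intelligence, 2), round(p_intelligence, 2),
      round(rpb_soundness, 2), round(rpb_potential, 2))
```

Under these assumptions the Intelligence row of Table 8 is recovered to within rounding of the tabled difference, and the two point-biserial values agree with the .42 and .43 given in the text.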
Matching Study. Before furnishing his 
ratings, the Ins personnel director wrote per- 
sonality sketches for 24 of the Ss. The 
longest were of paragraph length; some were 
only two or three sentences. These criterion 
sketches and the test reports were deployed
in a modification of Cronbach’s (1948)
matching design. The Ss were randomly as- 
signed to eight triads. Three sets of eight 
folders were prepared, each folder contain- 
ing the sketches from three Ss and one test 
report from that triad. Each judge received 
a complete set of folders with instructions to 
match the test report in each folder with one 
of the sketches in that folder. Each of the 
three sets of folders contained the same 
series of eight triads of sketches, but the test 
reports included for matching were different 
in each set. Table 9 shows the schema for the 
first two triads. The design precludes match- 
ing by elimination, since no datum is seen 
twice by any judge. Chance matching accuracy
for each judge is 2.67 (1/3 for each folder,
8 folders).
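
The chance level for a single judge follows from treating each of the eight folders as an independent one-in-three guess, so that the number of correct matches is binomial. A short sketch under that independence assumption (constant and function names are illustrative):

```python
from math import comb

FOLDERS = 8          # one test report to be matched per folder
P_CORRECT = 1 / 3    # three candidate sketches per folder


def prob_at_least(k, n=FOLDERS, p=P_CORRECT):
    """Chance probability that a judge makes k or more correct matches."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


expected_correct = FOLDERS * P_CORRECT
p_six_or_more = prob_at_least(6)
print(round(expected_correct, 2), round(p_six_or_more, 3))
```

The expectation is the 2.67 quoted above. The tail function shows how unlikely a high individual score would be under guessing, which complements the sign test applied across judges.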

Twelve judges were used; all were psy- 
chologists, 11 were PhDs, and 8 were experi- 
enced in clinical or industrial psychology. 
Nine judges matched from the full test reports 
and three judges used only the personality 
section of the reports. Table 10 shows the 
number of correct matches by each judge. Ten 
of 12 judges matched better than chance 
(p < .01, sign test). The mean number of 
correct matches is 3.9. All the experienced 
judges exceeded chance (M = 4.5). The num- 
ber of judges does not permit determination 
of differences in difficulty of the sets, but the 
brevity of some of the sketches appeared to 
reduce the discriminability of some of the 
triads. The findings indicate a moderate de- 
gree of correspondence between the assess- 
ments reflected in the test reports and the 
personality descriptions by the personnel 
director. 


TABLE 9

PARTIAL SCHEMA FOR MATCHING STUDY

                 Set 1               Set 2               Set 3
             Report  Sketch      Report  Sketch      Report  Sketch
Triad I        A     (remaining entries illegible)
Triad II       D     (remaining entries illegible)

Note.—Letters represent different Ss.


TABLE 10

NUMBER OF CORRECT MATCHES OF PERSONALITY
DESCRIPTIONS BY EACH OF 12 JUDGES,
INS SAMPLE

Set 1     Set 2            Set 3
6ᵃ        2                5ᵃ
5ᵃ        3                4
2         3ᵃ               (illegible)
3ᵇ        (illegible)ᵃᵇ    5ᵃᵇ

ᵃ Experienced judge.
ᵇ These judges matched personality section only and did not
see the complete report.


COMPARISON OF THE SAMPLES 


The relatively small Ns make comparisons
somewhat tenuous, but several factors are 
plausible as explanations of the greater ob- 
tained validities in the Manuf sample. The 
time span over which predictions were made 
was twice as great in the Ins sample. Studies 
of intelligence (Bayley, 1949) and personality 
development (Kagan & Moss, 1962) indicate 
greater success in short- over long-term pre- 
diction. Personality traits may have been 
more variable in the Manuf sample, composed 
of Ss with more diversified roles, and thus 
more discriminable to the clinical analyst. 
The standard deviations of the test variables 
common to both samples (Table 11) confirm 
this in the case of ability and personality 
measures. Finally, there is some evidence that 
nepotism, cronyism, and other idiosyncratic 
factors played a relatively greater role in 
personnel decisions in the insurance company, 
both for historical reasons and because of its 
smaller size. 


RELATED STUDIES 


Comparison of these findings and those of 
similar studies is necessarily crude because of 
differences in Ss and data types. Hilton, 
Bolin, Parker, Taylor, & Walker (1955) re- 
port an average validity of .28 for psy- 
chologists’ predictive ratings of interview and 
test data, presumably of the paper and pencil 
variety. Supervisors’ ratings of five person- 
ality traits of higher level personnel and job 
classifications at differing follow-up points 
were the criteria. The average validity in- 
creased to .35 when corrected for criterion 
unreliability, compared with corrected average 

TABLE 11

STANDARD DEVIATIONS OF TEST SCORES
COMMON TO SAMPLES MANUF AND INS

Score                       Manufacturing   Insurance
Otis IQ                     12.74           9.65
K                           (illegible)     (illegible)
Hs                          3.29            1.84**
Pd                          3.92            3.36
Pt                          4.31            3.26
Ma                          3.65            2.96
Si                          8.27            4.58**
Production manager          8.44            8.62
President, manufacturing    7.83            8.54
OL                          7.84            5.04*

* pr < .05.
** pr < .001.


validities of .43 and .33 in the present anal- 
ysis. More predictive data and greater S 
heterogeneity tend to favor higher validities 
in the Hilton study; less systematic collection 
of criterion ratings and greater heterogeneity 
of raters and circumstances tend to limit the 
Hilton validities. 
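
The correction for criterion unreliability cited for the Hilton et al. figures can be illustrated with a short sketch. It assumes the standard correction for attenuation, in which the observed validity is divided by the square root of the criterion reliability; the text does not say which formula Hilton et al. applied. The reliability of .64 below is not a reported figure; it is only the value implied by the reported pair .28 and .35 under that assumption. The function names are hypothetical.

```python
import math


def correct_for_criterion_unreliability(r_xy: float, r_yy: float) -> float:
    """Observed validity divided by the square root of criterion reliability.

    Assumption: the standard correction for attenuation, applied to the
    criterion only (the predictor is left uncorrected).
    """
    if not 0.0 < r_yy <= 1.0:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return r_xy / math.sqrt(r_yy)


def implied_criterion_reliability(r_xy: float, r_corrected: float) -> float:
    """Reliability that would turn r_xy into r_corrected under the same formula."""
    return (r_xy / r_corrected) ** 2


# Reported values: average validity .28, rising to .35 after correction.
implied = implied_criterion_reliability(0.28, 0.35)
corrected = correct_for_criterion_unreliability(0.28, implied)
print(f"implied criterion reliability: {implied:.2f}")
print(f"corrected validity: {corrected:.2f}")
```

Under this reading, a criterion reliability of roughly .64 would account for the reported increase; a lower reliability would produce a larger correction.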

Huse (1962) found an average validity of 
.28 for predictions of single criterion ratings 
of eight traits by a single psychologist’s rating 
of objective test data. The criteria were 
heterogeneous as in the Hilton et al. (1955) 
study. Adding projective and interview data 
did not increase validity. The reliability of 
the predictor and criterion is not estimable, 
but almost certainly poorer than the com- 
posite ratings used in the present study and 
by Hilton et al. 

Kelly and Fiske (1951, pp. 168-169) pre- 
dicted clinical competence ratings of psy- 
chology trainees at a 3-year follow-up with a 
validity of .37 by a single psychologist’s rating 
of credential and objective test data. This com- 
pares with predictions of global Field Rating 
VIII in the present study with validities .31 
(Ins, RR), .18 (Ins, TR), and .51 (Manuf, 
RR). Again, reliabilities cannot be compared. 

The Kelly-Fiske (1951) predictions were 
based on somewhat more information than 
those of the present study. Probably more 
importantly, they involved an intimate 
familiarity with the criterion on the part of 
the predictive rater, a clinical psychologist. 


Reported validities, in excess of those found 
in the industrial studies cited, typically in- 
volve both these factors. Handyside and 
Duncan (1954) obtained validity of .65 for 
prediction of supervisory performance criteria 
at a 4½-year follow-up. Ability tests, recommendations, 
interviews, and situation tests 
were predictively rated by a panel which in- 
cluded psychologists and foremen from the 
setting in which the criteria were later de- 
termined. Holt (1958) trained two judges 
intensively in the nature of a criterion of 
psychiatric competence. Validity of ratings 
of the judges from very extensive predictor 
data exceeded .50 in three of four instances. 
Stern, Stein, and Bloom (1956) obtained 
extremely high validities in very small 
samples following a major study of the at- 
titudes and values of supervisory personnel 
who were to make the criterion ratings. 


LIMITATIONS AND IMPLICATIONS 


Preselection of the samples by management 
tends to reduce predictability, although 
validity of assessments for placement or 
management of current employees necessarily 
involves a preselected sample. Heterogeneity 
of the job positions and roles is a more 
serious limitation in studies of this type. The 
present study ameliorates the problem some- 
what in that the samples came from single 
firms. However, imposing a common rating 
framework on individuals with functions as 
diverse as subforeman and factory manager, 
sales supervisor, and auditor does not result 
in an ideal criterion. As pointed out, hetero- 
geneity of the sample may also in some in- 
stances tend to increase predictable variance. 

Lack of a broad basis of predictively as- 
sessing personality traits is a serious limita- 
tion of the present and similar studies. Espe- 
cially in the case of criteria involving complex 
interpersonal skills and liabilities, measures 
of ability and interest cannot be expected to 
make fine discriminations. Personality pre- 
dictions were based entirely on the MMPI, 
an instrument not designed for discriminating 
among normal or superior personalities, al- 
though success has been reported in this 
application (Dahlstrom & Welsh, 1960). 
There are probably no instruments demon- 
strably superior for the purpose; adding more 
or “richer” personality tests to an assessment 
program is no guarantee of increased validity 
(Horowitz, 1962; Huse, 1962; Kelly & Fiske, 
1951; Sines, 1959). It seems likely, however, 
that there is room for improvement in de- 
veloping personality measures designed by 
detailed study of the criteria at hand. Unless 
this kind of improvement can be achieved, 
the range .40 to .50 may be the upper limit 
for validity. 

Holt’s (1958) conclusion that well-oriented 
clinicians can better actuarial predictors is 
confirmed in the present data. It must be 
pointed out, however, that the linear regres- 
sion method used here (and by Holt) is 
“naive,” in contrast to the more “sophisti- 
cated” clinical procedures. More elaborate, 
pattern-analytic schemes of weighting and 
combining data (Meehl & Dahlstrom, 1960) 
might vindicate the actuary. 

Industrial consulting firms using the clinical 
method in personnel assessment might profit- 
ably explore the possibility that different 
psychologists using the same test data might 
be more successful rating some personality 
factors than others. A study of Table 6 re- 
veals, for example, that the test rater esti- 
mated Drive successfully while the ratings 
based on the original report did not, whereas 
the reverse was true for Potential. 

The two clusters suggested by the inter- 
correlations of both report and field ratings 
of the eight variables merit further investiga- 
tion. They seem close to two of the three 
factors reported by Bass (1962) in his 
assessment of industrial personnel: job-orien- 
tation and interaction-orientation. 

The very satisfactory reliabilities obtained 
in rating the personality variables from psy- 
chological test reports in this study suggest 
that this problem need not be a serious 
obstacle to studies of predictive validity of 
psychological assessments which were not 
originally prepared for statistical treatment. 
Adjective and phrase counts, matching stud- 
ies, and Q-sort methods also offer promising 
methods of overcoming this initial obstacle 
to follow-up evaluations. 

The satisfactory reliabilities obtained on 
the criterion ratings would seem to constitute 
an endorsement of the careful selection and 
definition of variables to be rated in terms 
that are meaningful to the industrial raters 


and the value of the interview method for 
obtaining ratings from nonprofessional raters. 
This study investigated the rating attitudes of supervisors and subordinates 
and their reactions during public- and private-performance evaluations. 
84 women nursing administrators were assigned roles as supervisors and 
subordinates and given instructions for subordinate-evaluation interviews. 
24 interviews were conducted, 6 in private and 18 in public, with observers 
randomly selected from the nursing administrators. The results clearly 
indicated that supervisors were more negative in their initial 
subordinate-appraisal ratings than subordinates. In addition, observers were 
more negative in their ratings than participants. Finally, when evaluative 
interviews were conducted publicly, the subordinates experienced a number of 
negative reactions that were not evident in private interviews. The concepts 
of psychological distance and role stereotypes were discussed in explaining 
these results. Additional research should determine whether generalization is 
possible from these role-playing interactions to work situation dynamics. 


The present research is one of a series of 
investigations (Hanson, Morton, & Rothaus, 
1963; Morton, Rothaus, & Hanson, 1961; 
Rothaus, Hanson, & Oglesby, 1962) exploring 
the relationships and reactions between super- 
visors and subordinates emerging from two 
distinct methods of performance appraisal. 
The two performance-appraisal techniques 
under discussion are the traditional traits 
rating approach and the newly formulated 
goals approach (Blake & Mouton, 1961; 
Kelley, 1958; McGregor, 1960). Under the 
traits approach the supervisor assumes the 
responsibility of making judgments about the 
assets, liabilities, and personality traits of 
the subordinate, and decides how the sub- 
ordinate should change. In the goals ap- 
proach, on the other hand, both supervisor and 
subordinate share the responsibility for 
creating performance goals and planning con- 
crete actions for accomplishing these goals. 
Through the use of role-playing techniques 
(Hanson et al., 1963; Morton et al., 1961; 
Rothaus et al., 1962), it was demonstrated 
that under a traits approach supervisors 
tended to be more negative in rating a sub- 

1Now with Management Development, Liquid 


Rocket Plant, Aerojet-General Corporation, Sacra- 
mento, California. 




ordinate’s performance than the subordinates’ 
rating of themselves. When discussion of the 
supervisor’s rating ensued, the subordinates 
sensed the supervisor’s critical attitude and 
reacted with feelings of dissatisfaction and 
conflict, as well as with resentment and 
defensiveness. 

Hanson et al. (1963) hypothesized that 
the initial discrepancies in the performance 
ratings of supervisors were “due to stereo- 
typed attitudes inherent in the role of super- 
visor and subordinate.” One might argue, 
however, that the disparity in the ratings 
did not depend solely on role stereotypes, but 
also on the expectations of the supervisors 
and subordinates of mutual confrontation 
and defense of their respective ratings. That 
is, supervisors and subordinates may develop 
disparate rating attitudes only when they 
expect to discuss their ratings. Without the 
expectation of discussion, however, a dispar- 
ity in rating attitudes may not evolve. In 
other words, if a supervisor and subordinate 
assess the subordinate’s performance, each to 
himself, expecting the assessment to go no 
further, their assessments may be quite simi- 
lar. Yet, as soon as they expect to discuss 
each other’s ratings, their assessments may 
change. Both may “get ready for fight” and 
become “defensive” in the ratings. The expectancy 
of future interaction causes them 
to develop disparate rating attitudes. 

To control for the expectancy of future 
interaction, the current study included 
nonparticipating observers who were assigned 
roles as supervisors and subordinates and 
also asked to rate the quality of the 
subordinate’s performance. Prediction was made 
that there would be a disparity in the initial 
performance ratings of these observing 
supervisors and observing subordinates, even 
though they had no expectancy for mutual 
interaction. In other words, prediction was 
made that contrary biases would exist in 
supervisors and subordinates as a result of 
adopting evaluative attitudes, regardless of 
whether or not they anticipated later 
discussion and interaction. 

A second research prediction was also made 
that peers of subordinates and peers of 
supervisors would be more negatively critical than 
their counterparts in their initial traits 
ratings. This hypothesis emerged from the 
studies of leadership and effectiveness of work 
groups by Fiedler (1955, 1960) and Cleven 
and Fiedler (1956). These studies suggest 
that the psychologically more distant 
supervisor is more effective in raising the 
productivity of task groups, but tends to be more 
critical and rejecting in his work relationships 
with subordinates, particularly those he 
considers poor workers. Psychological distance 
was measured as a personality trait of 
supervisors and related to other trait behavior 
such as criticalness of subordinates. 
Specifically, psychological distance was measured 
by having supervisors rate both good and 
poor workers for similarity to themselves on 
a number of traits. The supervisors who 
perceived the greatest differences in traits 
between themselves and the less effective 
employees (greatest psychological distance) 
were also most critical in appraising and 
correcting work performance. 

In the present study an attempt was made 
to create psychological distance 
experimentally, and note whether more critical rating 
attitudes develop. Psychological distance was 
generated through role assignment. For 
example, it was assumed that subjects (Ss) 


assigned roles as peers of a subordinate would 
feel more remote in viewing the subordinates’ 
work than the subordinate himself. Similarly, 
it was expected that Ss assigned roles as peers 
to the subordinates’ supervisor would feel 
more remote from the subordinate than would 
the subordinates’ own supervisor. Accord- 
ingly, prediction was made that evaluative 
ratings of the peers of supervisors and sub- 
ordinates would be more negatively critical 
than the ratings of supervisors and subordi- 
nates themselves. In other words, it was pre- 
dicted that the psychological distance gen- 
erated by peer role assignment would produce 
more critical appraisal attitudes. 

Finally, the current study explores the 
dynamics of supervisor-subordinate interaction 
in a traits rating performance-appraisal 
interview under observed and nonobserved 
conditions. Previous research (Blake & 
Mouton, 1961) has indicated that subordi- 
nates are distressed by a traits rating inter- 
action. Prediction is made that when a traits 
rating interaction is conducted publicly the 
stress on a subordinate and his negative 
reactions to his supervisor will increase 
markedly. 

In summary, the present research attempts 
to answer three questions: (a) Will super- 
visors and subordinates observing but not 
actively involved in performance-appraisal 
interviews show a similar discrepancy in their 
initial appraisal ratings of a hypothetical sub- 
ordinate as do supervisors and subordinates 
who expect to discuss their ratings? (b) Will 
observing supervisors and subordinates rate 
the performance description of a subordinate 
more negatively as a consequence of greater 
psychological distance from the subordinate? 
(c) Will a subordinate who is publicly evalu- 
ated react more negatively than one who is 
appraised in private? 


METHOD 


Subjects 


Eighty-four women nursing administrators were 
Ss in the present research. The nurses were attend- 
ing a training workshop at the Institute of Nursing 
Service Administration held in the Rice Hotel, 
Houston, Texas, in February 1961. 
Role Assignments 


The group was divided into 36 observers and 48 
participants.2 Participants and observers were then 
subdivided equally into roles as supervisors and 
subordinates. Participants were told they would 
enact performance-appraisal interviews. Observers 
were told they would observe the participants as 
they interacted. When an observer was assigned the 
role as supervisor he was told to imagine he was 
the co-worker and peer of the supervisor conducting 
the performance appraisal. Observers given roles of 
subordinates were asked to imagine they were co- 
workers and peers of the subordinate being 
appraised. 


Traits Rating 


Both observer and participant supervisors and 
subordinates studied two descriptions of the per- 
formance of a hypothetical subordinate, “Fred 
Winters.” 2 One description was written in terms 
the subordinate might use to evaluate himself. The 
second description of the subordinate’s performance 
was formulated from the perspective of a hypotheti- 
cal supervisor, “Bob Hayes.” After supervisors and 
subordinates read both descriptions, they filled out 
a Traits Rating Scale “objectively” evaluating the 
performance of the subordinate, Fred Winters. The 
Traits Rating Scale consists of 10 personal and 
work areas in which the subordinate was rated on a 
10-point scale ranging from Excellent to Poor. The 
scaled variables were: knowledge, quantity of work, 
quality of work, punctuality, adaptability, coopera- 
tion, ingenuity, judgment, economy, and supervision. 


Traits Rating Interviews 


Each participant supervisor met with an assigned 
subordinate to discuss her traits rating as Fred 
Winters, the subordinate. The supervisor’s instruc- 
tions for the interview were as follows: “During 
your interview with your subordinate go over the 
ratings you have made. Discuss whatever problems 
that arise as you would in any performance rating 
interview.” 

Eighteen of the 24 traits rating interviews were 
observed by two spectators, one in the role of peer 
supervisor and the other in the role of peer subordi- 
nate. The remaining 6 interviews were unobserved. 
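
The allocation of Ss described above can be checked arithmetically. The following sketch assumes that each interview pairs one participant supervisor with one participant subordinate and that each observed interview has exactly two spectators, as the text states; the function and key names are illustrative only.

```python
def design_counts(n_subjects=84, n_participants=48,
                  n_observed=18, spectators_per_interview=2):
    """Derive the remaining design counts from the figures given in the text."""
    n_observers = n_subjects - n_participants          # Ss left to observe
    n_interviews = n_participants // 2                 # one supervisor + one subordinate
    n_unobserved = n_interviews - n_observed           # private interviews
    spectators_needed = n_observed * spectators_per_interview
    return {
        "observers": n_observers,
        "interviews": n_interviews,
        "unobserved": n_unobserved,
        "spectators_needed": spectators_needed,
    }


counts = design_counts()
print(counts)
```

The number of spectators needed for the observed interviews equals the number of observers, so under these assumptions each observer watched a single interview.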


Interview Behavior Ratings 


After the appraisal interview, which lasted about 
20 minutes, each participant, working alone, reported 
her reactions on a 10-item Behavior Rating Scale 
(Morton et al., 1961). Each scale consisted of nine 
intervals. For example, item one asks, “How much 
satisfaction did you derive from your participation?” 
The reactions on this item ranged from (1) “Com- 


2This case was developed by Douglas McGregor. 
The traits and goals methods, as well as the rating 
scales, were developed by Robert R. Blake and 
Jane Mouton, University of Texas. 

3 Data from 36 of these subjects are reported in 
Hanson et al. (1963). 


pletely dissatisfying experience” to (9) “Completely 
satisfying experience.” The subordinates, in addition, 
rated their willingness to change their work behavior 
and the clarity of their perception of how change 
was to be accomplished. 

Responses on the Traits Rating Scales, the initial 
evaluations of the hypothetical subordinate, were 
converted to a Traits Rating Score (TRS), the 
average scale rating made by an S of Fred Winters, 
and compared in a 2 × 2 analysis of variance 
(Supervisor-Subordinate × Participant-Observer). 
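
The Traits Rating Score can be sketched as a simple average over the 10 scaled variables. The ratings below are invented for illustration and are not data from the study. The numeric coding (10 as the most favorable rating) is also an assumption, chosen because higher mean TRS values are described as more favorable in the results.

```python
from statistics import mean
from typing import Mapping

# The 10 personal and work areas named in the Traits Rating Scale.
TRAITS = (
    "knowledge", "quantity of work", "quality of work", "punctuality",
    "adaptability", "cooperation", "ingenuity", "judgment", "economy",
    "supervision",
)


def traits_rating_score(ratings: Mapping[str, int]) -> float:
    """Average scale rating made by one S across the 10 traits."""
    missing = [t for t in TRAITS if t not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for trait in TRAITS:
        if not 1 <= ratings[trait] <= 10:
            raise ValueError(f"rating for {trait!r} is outside the 10-point scale")
    return mean(ratings[t] for t in TRAITS)


# Hypothetical ratings by a single S (assumed coding: 10 = Excellent).
example = dict(zip(TRAITS, (8, 7, 8, 9, 7, 8, 6, 7, 8, 7)))
trs = traits_rating_score(example)
print(trs)
```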

The reactions of supervisors and subordinates to 
observed and unobserved appraisal interviews were 
analyzed by comparing their scores, item by item, 
on the Behavior Rating Scales (BRS). The comparisons 
were made by the use of t tests. 
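
A t test of this kind can be reproduced from group means, standard deviations, and Ns alone. The sketch below assumes a pooled-variance (Student) t test for independent groups; the text does not state which form of the t test was used. The input values are read from the legible "Adequacy of partner" row of Table 2 (18 public and 6 private interviews), and the function name is hypothetical.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student t for two independent groups from summary statistics.

    Returns (t, df), with t signed as group 2 minus group 1.
    Assumption: equal population variances (pooled estimate).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / standard_error, df


# Subordinates' ratings of partner adequacy: public (N = 18) vs. private (N = 6).
t_sub, df_sub = pooled_t(6.33, 1.91, 18, 6.50, 1.52, 6)
# Supervisors' ratings of partner adequacy, same group sizes.
t_sup, df_sup = pooled_t(7.44, 1.20, 18, 8.50, 0.84, 6)

print(f"subordinates: t({df_sub}) = {t_sub:.2f}")
print(f"supervisors:  t({df_sup}) = {t_sup:.2f}")
```

The two results round to .20 and 1.99, against tabled values of .20 and 2.00, which suggests, without establishing, that a pooled-variance test was used. With 22 degrees of freedom the two-tailed .05 critical value is 2.074.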


RESULTS 
Performance Appraisal Ratings 


There were marked differences in the 
initial evaluative ratings made of the sub- 
ordinate as a function of who made the 
ratings. As indicated by Table 1, both the 
supervisor-subordinate variables and participant-observer 
variables were highly significant (p < .001 and p < .025, 
respectively). The most favorable ratings (mean 
TRS = 7.83) were made by the subordinates 
appraising their own performance. The ob- 
serving (peer) subordinates were less favor- 
able, with a mean TRS of 7.26. The par- 
ticipating supervisors were still less favorable 
(mean TRS = 6.70), and the least favorable 
ratings were made by observing supervisors 
(mean TRS = 6.31). 
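
The F ratios in Table 1 can be recomputed from its mean squares, since each F is the effect mean square divided by the error mean square. The sketch below uses the mean squares as printed in the table; the dictionary keys and function name are illustrative.

```python
# Mean squares as printed in Table 1 (each effect has 1 df; error has 80 df).
MEAN_SQUARES = {
    "Role": 1.0816,          # supervisors vs. subordinates
    "Style": 0.2304,         # observers vs. participants
    "Style x Role": 0.0081,  # interaction
}
MS_ERROR = 0.0396


def f_ratios(mean_squares, ms_error):
    """Return the F ratio (effect MS / error MS) for each effect."""
    return {name: ms / ms_error for name, ms in mean_squares.items()}


ratios = f_ratios(MEAN_SQUARES, MS_ERROR)
for name, f_value in ratios.items():
    print(f"{name}: F(1, 80) = {f_value:.2f}")
```

The recomputed values (about 27.31, 5.82, and .20) agree with the tabled F ratios to within rounding.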

In view of the fact that observing super- 
visors and subordinates demonstrate a dis- 
parity in their ratings of the subordinate’s 
performance similar to that of the partici- 
pants, it seems reasonable to conclude that 
the disparity in ratings by supervisor and 


TABLE 1


ANALYSIS OF VARIANCE OF INITIAL TRAITS RATING 
SCORES OF THE HYPOTHETICAL SUBORDINATE BY 
OBSERVING AND PARTICIPATING SUPERVISORS AND 
SUBORDINATES 


Source                                   MS      df      F
Role (Supervisors vs. subordinates)     1.0816    1    27.30**
Style (Observers vs. participants)       .2304    1     5.82*
Style × Role                             .0081    1      .20
Error                                    .0396   80
Total                                            83

* p < .025. 
** p < .001. 
TABLE 2 


REACTIONS OF SUPERVISORS AND SUBORDINATES DURING PUBLIC- AND PRIVATE- 
PERFORMANCE-APPRAISAL INTERVIEWS 


                      Subordinates’ interview          Supervisors’ interview 
Behavior rating       Publicᵃ    Privateᵇ              Publicᵃ    Privateᵇ 
scale reaction        M   SD     M   SD     t test     M   SD     M   SD     t test 

Satisfaction SOU) aye 6.83 2.40 Don O:35mn 88 6s 2.54 70 
Agreement S| Lee OS OO /muen 2225 .96 301, 1.82 6.83 1.47 1.48 
Leadership 3.61 1.69 4.17 2.40 .63 O20 iit O67 at St 18 
Adequacy of partner 6.33 1.91 6.50 1.52 .20 7.44 1.20 8.50 .84 2.00 
Hostilityᶜ O22 2239 es iar 1.21 IRE INS Teh Pinel. 8S 05 
Resistanceᶜ 5.61 1.38 Me l7; 2 2.26* 6.39 1.85 eli 1.83 90 
Tensionᶜ 5 04m 17.0 T0703 2.18* 6.00 1.81 CoML 40 
Responsibility Som eee eld AL SSE 6:56 1.62 [e535 1.03 
Teamness 4.61 2.12 Soom nal 1.14 Goat 74 6.83 1.47 70 
Comfort 5.440 2-77 jesouen 133 2.02 AUVs 9st 5502.07 1e5 4 

ᵃ N = 18. 
ᵇ N = 6. 
ᶜ The higher the score the more positive the reaction on these three scales, i.e., less hostility, resistance, and tension. 
* p < .05. 
subordinate participants is not due to their 
expectation of interview interaction. Rather, 
role assignment as supervisor and subordinate 
seems to imply and usher in stereotyped and 
conflicting rating attitudes. 

As predicted, the initial ratings of observers 
were more negative than those of participants. 
The concept of psychological distance seems 
to embrace these findings quite well; the 
further removed one becomes through role 
assignment from the subordinate who is being 
appraised, the more negative the appraisal 
becomes. 


Performance-Appraisal Interviews 


The results for supervisors and subordinates 
who participated in observed (public) 
and unobserved (private) performance-appraisal 
interviews are compared in Table 2. 
For subordinates, there were sharp differences 
in their feelings and reactions in public- and 
private-appraisal interviews. When their traits 
ratings were discussed publicly, subordinates 
reported less satisfaction in the interview and 
greater resistance toward their supervisors. 
Subordinates also felt greater tension and 
much less responsibility for the interaction 
which took place in the interview. In addition, 
they reported greater feelings of discomfort 
in the public interview, although the 
difference between the public and private 


interview did not quite reach significance at 
the 5% level of confidence. The results for 
the supervisors are in sharp contrast. On none 
of the 10 scales did comparisons reveal any 
significant differences in supervisors’ reactions 
in the public and private interviews. The 
only mean difference which approached sig- 
nificance was the evaluative rating which 
the supervisors made of the subordinates, i.e., 
supervisors who conducted public-appraisal 
ratings tended to perceive their subordinates 
as somewhat less adequate than those who 
conducted their interviews in private. It is 
possible that the higher tension, lack of 
satisfaction, and lack of initiative, which 
subordinates felt when appraised publicly 
contributed to the ratings of less adequacy 
they received from their supervisors. There 
was no evidence, however, that the super- 
visors conducting public appraisals of sub- 
ordinates felt any of the tension, discomfort, 
or dissatisfaction demonstrated by the sub- 
ordinates. The emotional and disturbing 
effects of an audience upon a traits interaction 
seem limited to subordinates themselves. 


DISCUSSION 


As mentioned earlier, research studies have 
suggested that the roles of supervisor and 
subordinate involve stereotyped appraisal at- 
titudes, supervisors showing the more critical 
and negative appraisal attitude than subordinates 
(when the performance of the 
subordinate is being assessed). This disparity 
in the rating attitudes of supervisors and 
subordinates appears to figure importantly 
in the outcome of a traits interaction; the 
greater the disparity between the appraisal 
ratings of supervisor and subordinate, the 
more negative the outcome of the traits- 
performance-appraisal interaction. By con- 
trast, the goals approach has been shown to 
be effective in circumventing and attenuating 
the conflict inherent in the disparate appraisal 
ratings of supervisors and subordinates. 


Stereotyped Attitudes and Psychological Dis- 
tance 


The present research findings confirm 
aspects of earlier research. In support of 
earlier hypotheses concerning stereotyped at- 
titudes, Ss assigned roles as supervisors and 
subordinates, and serving merely as observers, 
demonstrated the same rating disparity as 
supervisors and subordinates who expected 
to interact and discuss the ratings. An addi- 
tional finding, showing that observers rated 
more negatively than participants, suggested 
that as the role a person assumes places him 
at a greater psychological distance from a 
person to be appraised, his ratings become 
more negative and critical. In this respect, 
psychological distance and stereotyped at- 
titudes appear to supplement each other as 
explanatory concepts in describing the dis- 
parity in rating attitudes between supervisor 
and subordinate and between observer and 
participant peers. 


Traits Appraisal and Subordinate Vulner- 
ability 

A final insight into the dynamics of the 
traits interaction was obtained by comparing 
public and private traits-performance-ap- 
praisal interactions. Supervisors reported no 
noticeable emotional effects as a function of 
being observed. By contrast, subordinates ap- 
peared to have felt under considerable ten- 
sion and discomfort as a result of being ob- 
served and reacted with greater resistance to 
their supervisors. Apparently, then, in a 
traits interaction, it is the appraised subordi- 


nate who feels exposed, sensitive, and vulner- 
able, while the appraising supervisor, com- 
fortably fortified by his superior position, is 
relatively safe, secure, and insensitive. 

One might argue, of course, that the role- 
playing techniques used in the present paper 
do not produce results from which we can 
generalize to real-life situations. The “stakes” 
are smaller during role-playing experiments; 
accordingly, the feelings which are aroused 
are much milder. In the present study we 
suspect that one could make far more dramatic 
generalizations to the work situation 
than we have dared. For example, if during 
the course of a role-playing traits interaction 
a subordinate reports that he felt somewhat 
hostile, we suspect that during an identical 
interview in an actual life situation he might 
have become bitter, extremely tense, red 
faced, and deeply threatened in a personal 
way. 

In summary, then, the authors feel that 
the differences found in these role-playing 
experiments capture the essential shape and 
form of what would transpire in an actual 
work situation, but understate the intensity. 
Nevertheless, actual experiments in real-life 
situations with supervisors and subordinates 
with actual work relationships are a needed 
extension of this literature. 


Traits and Goals Techniques and Psychological Distance 


Previous role-playing comparisons of the 
traits and goals performance appraisal methods 
have indicated that both supervisors and 
subordinates react more favorably in the 
goals-oriented performance-appraisal interview. 
Not only did the goals method seem 
more informative than the traits method, but 
it appears to create greater motivation for 
change in the subordinate (Rothaus et al., 
1962). Yet, the literature summarized by 
Fiedler (1960) raises questions as to the 
relative effectiveness of these two appraisal 
techniques in actual practice. Each technique’s 
effectiveness may depend upon the 
specific work situation and upon the definition 
of “effectiveness.” Psychologically distant 
supervisors (traits method oriented) are 
apparently more effective in producing and 
maintaining groups at high productivity 
presumably because they are able to criticize 
low production subordinates. To maintain 
high output, however, the supervisor must 
“have sufficient power for his behavior ‘to 
make a difference’ to the group [Cartwright 
& Zander, 1960, p. 505].” Likert’s research 
(1962), however, indicates that such critical 
supervisory techniques may only be successful 
for short-range objectives. Moreover, 
indirect costs of such supervisory styles can 
occur in an organization due to industrial 
sabotage, absenteeism, large turnover, scrap 
loss, and breakdown of organizational 
effectiveness in the absence of the supervisors 
(Likert, 1956; Kahn & Katz, 1960). 

Kahn and Katz (1960), on the other hand, 
summarizing field studies from civilian, 
military, and industrial agencies, found that 
supervisors who had highly productive groups 
with high morale allowed their subordinates 
greater freedom in the planning and 
formulation of their jobs, spent more of their time 
in the leadership role, especially the 
interpersonal aspects of the job, performed more 
supportive functions such as training, 
promoting, transferring, and communicating, 
and had groups with greater cohesiveness. 
The subordinates of these high-producing 
groups saw their supervisors as taking a 
personal interest in the men and their off-job 
problems. In other words, high production 
was related to a leadership role characterized 
by low psychological distance. 

Fiedler (1960) has attempted to explain 
the inconsistent results from psychologically 
distant and nondistant supervisory styles by 
suggesting that psychologically distant leaders 
may not be effective in all work situations. He 
suggests that reduced psychological distance 
between supervisors and subordinates (such 
as occurs in the goals-oriented techniques) 
appears necessary where the work to be 
accomplished is a creative endeavor involving 
the bringing forth of ideas, planning, and 
decision-making, as well as the formulation 
of policies. In addition, a study by Carp, 
Vitola, and McLanathan (1963) found that 
there is an optimal psychological distance 
that is characterized by the most effective 
leaders. These leaders “score in the middle 
range, being psychologically close enough to 
group members to maintain contact and far 
enough to allow objective reactions to poor 
performance.” 


Group-Centered Performance Appraisal 


One possible problem with the previous 
comparisons that have been made of the traits 
and goals approaches, however, is a social- 
psychological one. The problem resides in the 
fact that when actual supervisor-subordinate 
relationships in a traits appraisal occur in 
the context of a social and work group, they 
cannot be understood in terms of two-person 
dynamics. The rating which a subordinate 
receives from his supervisor is not an absolute, 
but rather a relative rating. Although the 
subordinate may see his own rating in rela- 
tion to the range of the rating scale, his re- 
action will depend, in part, on ratings received 
by his peers from that same supervisor, and 
by his own evaluation of his performance as 
compared to his peers. The meaning of a low 
or a high rating depends upon the scores and 
ratings others are receiving. In many work 
situations performance appraisal is made by 
the supervisor of all his subordinates at about 
the same time, and as a consequence members 
of the work group desire to know where they 
stand in comparison to the other subordinates. 
It seems inevitable and entirely natural that 
subordinates should desire an understanding 
of where they stand in the organization and 
in respect to their peers. 

The goals approach, as traditionally de- 
scribed (Blake & Mouton, 1961; Kelley, 
1958; McGregor, 1960), does not mitigate 
the difficulties described above. If a super- 
visor conducts private dyadic goals-oriented 
performance appraisals with each of his sub- 
ordinates, the problem of where the subordi- 
nate stands in reference to his peers still re- 
mains. What needs to be studied in far 
greater depth is the possibility of conducting 
goals-oriented performance reviews as a 
group function rather than as a supervisory 
responsibility, as suggested by Likert (1959). 
In other words, it should be possible to con- 
struct periodic performance appraisals in 
which the work group (as a group) works 
with each individual group member to establish 
work targets and goals, and to review 
the extent to which members have succeeded 
in accomplishing these goals. Such an ap- 
proach seems particularly reasonable when 
subordinates have functional relationships 
with each other and are mutually interdependent.
Previous discussions of the goals
technique have emphasized the supervisor’s 
role as one of formulating organizational 
goals and integrating these organizational 
goals with the individual goals of the sub- 
ordinates in a joint effort. The supervisor 
may still serve the same function in a group 
goals-oriented appraisal discussion, allowing 
the subordinates to work out for themselves 
their areas of responsibility, present level of 
performance, functional relationship to the 
work goals, and ways and means to obtain 
these goals. 


SUMMARY 


Through the use of role playing, the 
present study investigated the effect of psy- 
chological distance upon the rating attitudes 
of supervisors and subordinates. The study 
also examined the reactions of supervisors and
subordinates participating in a public- and 
private-traits-appraisal interaction. 

Eighty-four women nursing administrators
were assigned the roles of supervisor and 
subordinate and divided into two groups: 
36 observers and 48 participants. Participants 
were given appropriate role instructions for 
the appraisal interviews, and observers were 
told to assume that they were role peers of 
the participants and to observe the inter- 
action. Observers did not participate in any 
way themselves. Six of the 24 interviews 
were conducted in private, i.e., no observers 
were present. Both groups made initial traits 
ratings of the hypothetical subordinate to be 
appraised in the interview. Following the 
interaction, the participant supervisors and 
subordinates rated their reactions. 

The results clearly indicated that in the 
role-playing situation supervisors of both 
observer and participant groups were more 
negative in their initial traits ratings than 
were subordinates. In addition, observers were 
more negative in their ratings than partici- 
pants. The concepts of psychological distance 
and role stereotypes were used in explaining 
these results. When the traits rating inter- 
views were conducted publicly, the subordi- 
nates experienced a number of negative re- 
actions that were not evident in the private 
interviews. Supervisors conducting the appraisals
reported no differences in their reactions
between the public and private interviews.
Discussion of the results was related
to previous research on supervisory practices, 
psychological distance, and effectiveness of 
work groups. 
This study assessed the instructional effectiveness of simple and complex forms
of typographical cuing in both conventional and programed texts. A total of 
118, pretested, 8th-grade students read an 8th-grade history lesson and were 
later retested. Analysis of gain scores revealed that: (a) simple typographical 
cuing distinguishing core from enrichment content enhances the ratio of impor- 
tant to unimportant content learned without affecting the total amount learned; 
(b) complex typographical cuing distinguishing 5 categories of lesson content 
fails to increase learning of either core or enrichment content; (c) the pro- 
gramed or quizzed text is more effective than the conventional text; and (d) 
the effects of simple typographical cuing and programed quizzing appear independent
and additive.


This study examined the instructional effectiveness
of simple and of complex forms
of typographical cuing in both programed
texts (requiring student responding) and
conventional texts. Typographical cuing here
refers to the use of heterogeneous typography
to differentiate several categories of lesson
content differing in importance, or differing
in estimated difficulty in recall, or both.

The various portions of a single lesson
neither require nor deserve the uniformly
concentrated attention of the reader. Certain
passages may require rote memorization while
others may deserve only a cursory reading.
The reader is obliged to alter his style of
reading accordingly. This is not an easy task.
Even persons who score high on standardized
reading tests frequently have difficulty in
adapting their reading in this way (Moe &
Nania, 1959).

Typographical cuing is intended to help
the reader identify and distinguish various
categories of lesson content, thereby allowing
him to adjust his style of reading to the
importance and difficulty of each. In turn,


1 This investigation was conducted by the American
Institute for Research under Contract Nonr-
077(00) with the Office of Naval Research, Leslie
J. Briggs, principal investigator. The study was conducted
in cooperation with the Menlo Park, California,
City School District, Franklyn O. White,
Superintendent. The authors are grateful to the participating
students and school personnel and particularly
thank William W. Fisher, Director of Instruction.

2 Now at Northern Illinois University.
the reader who so adjusts his style may be
expected to learn proportionately more of 
what he reads. Klare, Mabry, and Gustafson 
(1955) have found limited support for this 
hypothesis. They investigated the effects of 
underlining important words in a training 
manual on reciprocating engines. They found 
that underlining assisted airmen of high 
mechanical aptitude, but hindered airmen of 
low mechanical aptitude. On the other hand, 
Hershberger (1963b) found complex typo- 
graphical cuing schemes distinguishing four 
and five categories of lesson content to be 
ineffective for both high- and low-ability 
fifth-grade readers. In order to insure un- 
familiarity with the reading material, Hersh- 
berger used eighth-grade lessons with his 
fifth-grade students. It is possible that, had
he matched the rated grade levels of his
materials to those of his subjects (Ss), his results
would have favored typographical cuing. In
the present study the rated grade level of the 
lesson was matched to the grade level of the 
student. 

The purpose of the present study was to 
assess the effects of simple typographical 
cuing (distinguishing only two categories of 
lesson content), and complex typographical 
cuing (distinguishing five categories of lesson 
content) upon the reading of eighth-grade 
school materials by eighth-grade students. 
These effects were assessed using both conventional
and programed texts. The latter
incorporated self-evaluational test items allowing
the reader to assess periodically his
knowledge of the core or essential content
of the lesson and to remedy deficiencies.


TABLE 1

CUING SCHEMES

                                          Typographical formats
Content category                  5c                        2c                  1c
Core content:
  New and unfamiliar key words    Full caps, red            Lower case, red     Lower case, black
  Familiar key words, key         Lower case, red with      Lower case, red     Lower case, black
    statements, dates, etc.         red underlining
  Basic core statements           Lower case, red           Lower case, red     Lower case, black
  Key examples and rephrasing     Lower case, black with    Lower case, red     Lower case, black
    of key statements               red underlining
Enrichment content:
  Nonessential statements,        Lower case, black         Lower case, black   Lower case, black
    examples, interesting
    sidelights, etc.


METHOD 
Subjects 


A total of 126 eighth-grade students from five 
different classrooms in two Menlo Park, California, 
Elementary Schools participated in the study. During 
the course of the study, which spanned several days, 
a number of Ss were lost due to absences from 
school. Results are reported for a total of 118 stu- 
dents for whom complete test data were obtained. 
The mean Reading Grade Placement (RGP) score 
(California Achievement Test) for this group of 
118 Ss was 9.39 years with a standard deviation 
of 1.48.³

Ignoring classroom affiliation, the Ss were divided 
on the basis of their RGP scores into six experi- 
mental groups having comparable mean RGP scores. 
The RGP score means for the six groups ranged 
from 9.22 to 9.53 years. The six groups are listed 
and coded in Table 2 in terms of the type of 
experimental materials administered to each group. 
Table 2 also shows the size of each group. 


Materials 


The Lesson. A discursively written lesson (i.e.,
incorporating both core and enrichment content) on 


3 These RGP scores were 1-year-old at the time 
of the experiment; hence, the figures given above 
underestimate the subjects’ actual reading potential
by approximately 1 year. RGP scores were un- 
available for 20 subjects. In these cases, teacher 
estimates were used. 


the history of Texas was excerpted from New Ways 
in the New World (Todd & Cooper, 1956), an 
eighth-grade text approved by the State of Cali- 
fornia. Six parallel versions of the lesson were 
prepared using three typographical formats in each 
of two types of text. 

Type of Text. The lesson was prepared as a conventional
text (T) and as a programed text (P).
The programed text contained quiz sheets (approximately
one per one and one-half pages of text)
which required the reader to evaluate periodically
his learning of the essential, core content of the
lesson, and allowed him to rectify deficiencies by
selective rereading. Each quiz sheet contained several
(approximately seven on the average) multiple-choice
questions or incomplete sentences which S was
instructed to answer correctly. The test items were not
followed by formal confirmation or correction of response.
If S did not know the answer he was instructed
to turn back and reread the pertinent material until
he found the correct answer. This arrangement may 
be considered to constitute a special form of 
“adjunct program” (Pressey, 1960). These self-test 
items covered only the essential or core content of 
the lesson; there were no such items on the 
enrichment content of the lesson. 

Typographical Format. The lesson content was 
divided into five categories, each differing in level 
of importance or presumed difficulty. Four of the 
categories comprised the core, or essential content 
of the lesson. The fifth type of content, which 
comprised about two-thirds of the lesson, was called 
enrichment material. 

Each version of text (programed and conven- 
tional) was prepared in three parallel typographical 
formats: 5c, 2c, and 1c (c=categories). The three 
formats differed only in the number of lesson-content 
categories distinguished through the use of heterogeneous
typography. Underlining, variation in size
of type, and variation in color of ink were used in
Format 5c to distinguish all five categories of lesson
content. Variation in color of ink was used in
Format 2c to distinguish the essential from the
enrichment content. Homogeneous typography was
used in Format 1c, a control format, to group all
content categories into one.

Table 1 briefly describes the five lesson categories
and shows for each category the style of typography
used in each of the three typographical formats.

The 2c cuing scheme was explained in the reading
instructions prefacing the 2c versions of the lesson.
Only half the reading instructions for the 5c versions
explained the 5c cuing scheme in order to determine
whether instructions would help the Ss handle the
more complex 5c version more effectively. However,
since there were no differences in results between
the Ss for whom the cuing scheme was explained
and those for whom it was not explained, this
difference in procedure may be ignored.

Test. An objective test containing 44 multiple-choice
and completion items was constructed. Thirty-three
items covered the core content of the lesson;
the remaining items covered the enrichment content.
The test was designed so that each independent and
unitary bit of information tested and correctly
answered was given a score of one. The test was
not scored for spelling.


Procedure


Each of the six groups described above read a
different version of the lesson. Three groups read
programed versions, P, and three read conventional
text versions, T. For each of these sets of three,
one group read the 5c typographical format, another
the 2c format, and a third the 1c format. With the
exception noted above, each of these six types of
lesson booklets was prefaced by reading instructions
which explained in detail the proper use of any
self-test items or typographical cuing appearing in
the booklet itself.

Each S was tested twice, once 7 days before 
reading the lesson (pretest) and again 1 day after 
reading the lesson (posttest). The same test instru- 
ment was used in both testing sessions. 

The experiment was conducted concurrently in 
all classrooms. Regular classroom teachers adminis- 
tered the experimental materials and recorded lesson- 
reading times for the Ss. 


RESULTS 


Separate core- and enrichment-gain scores 
were obtained for each S by subtracting his 
score on the core and enrichment items of 
the pretest from his corresponding two scores 
on the posttest. 
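
The gain-score computation just described can be sketched in a few lines. This is only an illustration: the `Scores` class, the `gain` function, and the sample pretest and posttest values are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Scores on the two item sets of the same test instrument."""
    core: int        # points earned on core-content items
    enrichment: int  # points earned on enrichment-content items


def gain(pretest: Scores, posttest: Scores) -> Scores:
    """Gain = posttest score minus pretest score, separately per item set."""
    return Scores(
        core=posttest.core - pretest.core,
        enrichment=posttest.enrichment - pretest.enrichment,
    )


# Hypothetical subject: values chosen for illustration only.
pre = Scores(core=5, enrichment=2)
post = Scores(core=24, enrichment=5)
g = gain(pre, post)
print(g.core, g.enrichment)
```

Keeping the two gains separate, rather than summing them, is what allows content type to enter the later analysis as a within-subject factor.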

Table 2 shows the mean core-gain scores, 
the mean enrichment-gain scores, and the 
mean lesson-reading times for each of the six 
experimental groups. (As a result of an administrative
oversight on the part of two
teachers, lesson-reading times were obtained
for only a portion, n, of the Ss in each experimental
group. Table 2 shows the value of
n as well as the total N for each group.)

Figure 1 shows the total (core + enrich- 
ment) gain-score means for each group plotted 
against mean reading time. 

Table 3 summarizes a three-way (Typographical
Format × Type of Text × Type of
Content, core versus enrichment) analysis of
variance (Lindquist, 1953) of the gain scores
for the six groups. Two main effects (Text and
Content) and two interactions (Text × Content,
and Format × Content) are statistically

TABLE 2

MEAN GAIN SCORES AND READING TIMES FOR EACH EXPERIMENTAL GROUP

                                   Mean gain scores      Reading times (minutes)
Group               Code     N     Core    Enrichment        M        nᵃ
Programed text
  Format 1c         P-1c    15     23.87      3.47         29.00       7
  Format 2c         P-2c    16     26.06      1.56         24.57      if
  Format 5c         P-5c    30     19.67      2.47         24.43      16
Conventional text
  Format 1c         T-1c    13     15.62      3.69         14.80      10
  Format 2c         T-2c    16     16.56      2.13         13.58      12
  Format 5c         T-5c    28     15.50      3.25         16.10      20

ᵃ As a result of an administrative oversight, reading times were recorded for only a number, n, of subjects in each group.
Fig. 1. Total gain score means for the six
experimental groups plotted against mean reading
times.


significant, p < .01. The means yielding the
first three effects are shown in Table 4.
The core-gain scores are greater than the
enrichment-gain scores;⁴ the programed text
is superior to the conventional text; and the
self-evaluational test items of the programed
text increase the core-gain scores but have a
negligible effect upon the enrichment-gain
scores.
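
The Table 4 means for each type of text can be recovered as N-weighted averages of the group means in Table 2, as sketched below. The group values are transcribed from Table 2 as best they can be read (the T-5c entries in particular are an interpretation of a poorly legible row), and the names `TABLE_2` and `pooled_mean` are illustrative only. Under these assumptions the programed-text means and the conventional-text enrichment mean reproduce the printed values to two decimals, while the conventional-text core mean comes out slightly below the printed 15.86, which may reflect rounding or a misread digit.

```python
# Group code -> (N, mean core gain, mean enrichment gain), as read from Table 2.
TABLE_2 = {
    "P-1c": (15, 23.87, 3.47),
    "P-2c": (16, 26.06, 1.56),
    "P-5c": (30, 19.67, 2.47),
    "T-1c": (13, 15.62, 3.69),
    "T-2c": (16, 16.56, 2.13),
    "T-5c": (28, 15.50, 3.25),
}

CORE, ENRICHMENT = 1, 2  # column positions within each tuple


def pooled_mean(text_type: str, column: int) -> float:
    """N-weighted mean of one gain column over the groups of one text type."""
    rows = [v for code, v in TABLE_2.items() if code.startswith(text_type)]
    total_n = sum(row[0] for row in rows)
    return sum(row[0] * row[column] for row in rows) / total_n


for text_type in ("P", "T"):
    print(text_type,
          round(pooled_mean(text_type, CORE), 2),
          round(pooled_mean(text_type, ENRICHMENT), 2))
```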

The means yielding the significant Format
× Content effect are plotted in Figure 2. The
interaction is such that, in comparison to 


4 The reason for this is simply that there was
much less enrichment material to be learned: the 
criterion test contained only 11 enrichment items 
as compared with 33 core items. 


TABLE 3

THREE-WAY (FORMAT × TEXT × CONTENT) ANALYSIS
OF VARIANCE OF GAIN SCORES FOR ALL SIX EXPERIMENTAL
GROUPS

Source                      df          MS           F
Between Ss                 117
  Format (F)                 2        59.90        2.08
  Text (T)                   1       522.94       18.14**
  F × T                      2        50.15        1.74
  Error (between Ss)       112        28.83
Within Ss                  118
  Content (C)                1    16,029.77      701.21**
  C × F                      2       110.24        4.82*
  C × T                      1       737.90       32.28**
  C × F × T                  2        44.84        1.96
  Error (within Ss)        112        22.86

* p < .01.
** p < .001.
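
As a consistency check on Table 3, each F ratio can be recomputed as the effect mean square divided by the error mean square of its own stratum (between-Ss effects against the between-Ss error, within-Ss effects against the within-Ss error). The sketch below assumes the mean squares as read from the table; the dictionary and function names are illustrative. On these figures the Content ratio works out to roughly 701.

```python
MS_ERROR_BETWEEN = 28.83  # Error (between Ss)
MS_ERROR_WITHIN = 22.86   # Error (within Ss)

# Effect mean squares as read from Table 3.
BETWEEN_EFFECTS = {"Format": 59.90, "Text": 522.94, "Format x Text": 50.15}
WITHIN_EFFECTS = {
    "Content": 16029.77,
    "Content x Format": 110.24,
    "Content x Text": 737.90,
    "Content x Format x Text": 44.84,
}


def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F is the ratio of an effect mean square to its error mean square."""
    return ms_effect / ms_error


f_values = {name: f_ratio(ms, MS_ERROR_BETWEEN) for name, ms in BETWEEN_EFFECTS.items()}
f_values.update({name: f_ratio(ms, MS_ERROR_WITHIN) for name, ms in WITHIN_EFFECTS.items()})

for name, f in f_values.items():
    print(f"{name:<26s} F = {f:.2f}")
```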


TABLE 4

GAIN SCORE MEANS YIELDING THREE SIGNIFICANT
EFFECTS IN ANOVA SUMMARIZED IN TABLE 3:
TEXT, CONTENT, AND CONTENT × TEXT

                               Content
Text                     Core    Enrichment    Total
Programed text          22.38       2.48       24.86
Conventional text       15.86       3.04       18.90
Combined                19.23       2.75














Formats 1c and 5c, Format 2c, which incorporates
a simple form of typographical cuing,
helps the student learn proportionately more
of the important core content, as opposed to 
the relatively unimportant enrichment con- 
tent. This interaction effect was shown by 
low-ability readers (having RGP scores less 
than the group mean of 9.39) as well as high- 
ability readers (having RGP scores greater 
than 9.39). 
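
One rough way to see the Format × Content interaction is to pool the Table 2 group means by typographical format and express the core gain as a share of the total gain. The sketch below is an illustration under stated assumptions, not the authors' analysis: the values are as read from Table 2 (the T-5c row is an interpretation of a poorly legible entry), and `core_share` is a hypothetical helper.

```python
# (text type, format) -> (N, mean core gain, mean enrichment gain), from Table 2.
TABLE_2 = {
    ("P", "1c"): (15, 23.87, 3.47),
    ("P", "2c"): (16, 26.06, 1.56),
    ("P", "5c"): (30, 19.67, 2.47),
    ("T", "1c"): (13, 15.62, 3.69),
    ("T", "2c"): (16, 16.56, 2.13),
    ("T", "5c"): (28, 15.50, 3.25),
}


def core_share(fmt: str) -> float:
    """Core gain as a fraction of total gain, pooled over both text types."""
    rows = [v for (_text, f), v in TABLE_2.items() if f == fmt]
    total_n = sum(n for n, _core, _enr in rows)
    core = sum(n * c for n, c, _enr in rows) / total_n
    enrichment = sum(n * e for n, _core, e in rows) / total_n
    return core / (core + enrichment)


shares = {fmt: core_share(fmt) for fmt in ("1c", "2c", "5c")}
for fmt, share in shares.items():
    print(fmt, round(share, 2))
```

On these figures Format 2c yields the largest core share of the three formats, in line with the direction of the interaction described above.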

Analysis of Reading Time. The mean 
reading times for the six experimental groups 
are shown in Table 2 and plotted in Figure 1. 
A three-way (Typographical Format × Type of
Text × Type of Content) analysis of variance
yielded only one significant effect. The programed
text takes longer to read than the
conventional text, F = 57.16, df = 1/69, p
< .001.


Fig. 2. Mean gain scores for all subjects as a
function of typographical format and type of lesson
content tested.
[Figure: mean gain score (ordinate) against typographical format, 1c, 2c, and 5c (abscissa), with separate curves for core content and enrichment content.]

Learning Scores and Error Rates. The quiz
sheets in the three programed versions of the
lesson (P-1c, P-2c, and P-5c) were scored
for errors and the results analyzed. A simple
analysis of variance yielded no significant differences
among the three groups, confirming
the general notion that errors made while
studying the program are not necessarily
predictive of criterion-test scores.

Combining all three versions of the programed
text, the error rate for the lesson
was determined by dividing the mean number
of errors per S by the total number of errors
possible per S. This rate was .10.
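
The error-rate definition amounts to a single division, sketched here. The counts used are hypothetical, chosen only so that the result matches the reported rate of .10; the paper does not give the mean number of errors or the number of errors possible per S.

```python
def error_rate(mean_errors_per_s: float, possible_errors_per_s: float) -> float:
    """Mean errors per subject as a fraction of the errors possible per subject."""
    if possible_errors_per_s <= 0:
        raise ValueError("possible_errors_per_s must be positive")
    return mean_errors_per_s / possible_errors_per_s


# Hypothetical counts for illustration only.
rate = error_rate(mean_errors_per_s=7.0, possible_errors_per_s=70.0)
print(f"{rate:.2f}")
```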


DISCUSSION 


Simple typographical cuing, distinguishing
core- from enrichment-lesson content, significantly
enhances the ratio of important to
unimportant content learned, without reducing
the total amount learned, in reading
both programed and conventional texts. On
the other hand, complex typographical cuing
distinguishing five categories of lesson content
does not appear to benefit the reader in the
least. In the latter case, it seems likely that
the complexity of the typography may befuddle
the reader sufficiently to offset any
advantage derived from the cuing. Tinker
and Paterson (1946) have found that mixed
type forms as complex as the complex cuing
scheme used in the present study are more
difficult to read than the common homogeneous
typographical format.

It is possible that, with practice, the reader
might eventually come to benefit from complex
typographical cuing. However, Hershberger
(1963a, 1963b) has repeatedly found
no evidence for such a “learning to learn”
trend in the performance of school children
reading series of two or three lessons incorporating
complex typographical cuing.

The present results demonstrate the marked
superiority of the programed or quizzed text
over the conventional or nonquizzed text. The
programed text, as defined operationally by
the experimental procedures used here (see
also Hershberger, 1963b), is not the same as
conventional self-instructional programs, in 
which, incidentally, self-test items have typi- 
cally failed to facilitate learning beyond that 
produced by reading (Alter & Silverman, 
1963). There are several differences between 
conventional programs and the type of pro- 
gram used here. First, in contrast to conven- 
tional programs in which the testing of an 
item of information occurs concurrently with 
its instruction, the self-evaluational test items 
in the programed text used here were delayed 
test items which evaluated the retention of 
information the reader had been exposed to 
earlier in the lesson. Secondly, the  self- 
evaluational test items used in this study 
were not followed by formal confirmation or 
correction of response. Thirdly, if the reader 
could not properly respond, he was instructed 
to reread the appropriate section to find the 
answer, the response item being, therefore, 
primarily self-evaluational. The programed 
text used here more closely resembles, both in 
format and effectiveness, Pressey’s (1960) 
type of “adjunct program”—i.e., teaching by 
testing—, than Skinner’s (1958) type of 
linear program or Crowder’s (1960) type of 
intrinsic program. 

Adding self-test items has a much greater 
absolute effect upon the amount learned than 
does the addition of simple (2c) typographi- 
cal cuing. However, self-test items increase 
reading time whereas typographical cuing 
does not. In practical situations, therefore, 
the use of self-test items
requires a trade-off between learning
level and study time.

Since no interaction was observed between 
Type of Text and Typographical Format, the 
effects of simple typographical cuing and self- 
evaluational quizzing can be assumed to be 
independent and additive. That is, the ef- 
fectiveness of either technique does not 
depend upon the other. Each can be used 
effectively alone, or they may be used together 
to obtain the beneficial effects of both. 
In November 1963, the Life Insurance Agency Management Association completed
a study of the effectiveness of a 1625-frame, self-instructional text on life
insurance fundamentals, developed in cooperation with the American Institute 
for Research. The 2 X 2 X 3 factorial design involved: use of a review book, 
availability of confirmation for each response, and response mode. Over 1500 
people in 7 companies in classroom and field settings participated as well as 
home office employees in an 8th company to study 4- and 16-week retention.
The use of a comprehensive review program resulted in slightly and reliably 
greater achievement on both immediate testing and retention tests 4 and 16 
weeks later. Trainees receiving confirmation of the accuracy of their responses 
were generally, and sometimes significantly, inferior in achievement to trainees 
not receiving confirmation. Trainees did not differ in achievement or attitude 
toward the program depending on writing, saying their answers, or choosing 
either of these response modes; those who said their answers took less time 
than the others. Pretest individual differences were not reduced by exposure
to the programed texts.


The purposes of this study were twofold:
(a) to investigate the applicability of the
programed instruction technique to one phase
of training in the life insurance industry, and
(b) to study certain variables, of both theoretical
and practical interest, considered
relevant to the technique.

Much of the research in programed instruction
has been done in centralized training
situations utilizing ‘captive’ trainees.
Educational institutions, military training
centers and, in the case of industrial studies,
company schools have constituted the typical
training environments in which programed
instruction has been studied (see, for example,
Lumsdaine & Glaser, 1960). Such environments
are atypical of those found in the life
insurance industry. Decentralization is the
rule with life insurance training, which typically
takes place in the agency on a one- or
two-man-at-a-time basis. Such decentralization
minimizes the control that can be
exercised over both trainees and trainers.

Any eventual application of programed
instruction in such an environment would


1 A version of this paper was read at the 1964
meetings of the Eastern Psychological Association
and the International Association of Applied Psychology.

2 The authors are indebted to S. Rains Wallace
for his many contributions in the origination, design,
and execution of this research, as well as for his
editorial suggestions.
seem to require a demonstration of its ability
to operate where training administration is 
minimal. Further, such an evaluation of pro- 
gramed instruction to determine its potential 
on an industrywide basis seems to require a 
realistic test situation in terms of the amount 
of training materials involved. The task con- 
fronting the trainee should be comparable, at 
least in scope, to a task confronting most 
agent trainees, namely, the acquisition of 
fundamental life insurance knowledge. 

In the present study three major independent
variables were examined in a 2 × 2 × 3
factorial design: response mode, the effects
of confirmation, and the use of an overall 
review sequence. These were selected as the 
optimal combinations of variables to investi- 
gate, not only in terms of practical considera- 
tions relating to the adoption of programed 
instruction in life insurance training, but also 
in terms of theoretical considerations relating 
to both learning efficiency and programing 
technology. 

The first major variable studied involved 
the use of a final review sequence. Lumsdaine 
(1960), Gilbert (1958a), and Reynolds and 
Glaser (1963) discuss the utility of review. 
One question raised by the present study is 
whether the addition of an overall program 
review to a low-error rate program utilizing 
extensive repetition and seven major internal 
review sequences will result in superior
acquisition and retention.

The second major variable studied was
confirmation. Confirmation is viewed here in
the sense of feedback (Carr, 1960; Gilbert,
1958b; and Porter, 1960) rather than as re-
inforcement (Skinner, 1958).

Research efforts examining confirmation
have met with some difficulty in demon-
strating the primacy of this variable. Studies
by Holland (1960), Krumboltz and Weisman
(1962a), and Fiks (1964) have failed to
show any advantage of confirmation over no
confirmation on posttest performance. Cysani,
Glaser, and Reynolds (1962) have suggested
employing prompting as a crucial operation
which accounts for the learning associated
with program sequences. In support of this
approach, investigations by Evans, Glaser,
and Homme (1962) and Hessert (1961) have
demonstrated that subjects (Ss) learned as
well from sequences containing highly
prompted frames and no confirmation as they
did from the usual combined prompt and
confirmation frames.

The present research is an attempt to
examine confirmation in relation to a specific
type of program—that is, a low-error rate
(highly prompted), linear program. The ques-
tion being asked here could be best phrased
as follows: What are the effects of immedi-
ate, highly discriminable, additional confir-
mation upon the criteria, compared to the
effects of the absence of such confirmation?

Of the three major variables investigated
in this study, a review of previous research
seemed to indicate that greater effort had
been expended by other investigators in
studies of the response in programed instruc-
tion. The evidence generated by studies uti-
lizing various combinations of overt respond-
ing, covert responding, and reading programed
material in statement form with both im-
mediate and delayed posttest criteria has
been equivocal (Alter & Silverman, 1962;
Evans, et al., 1962; Goldbeck & Campbell,
1962; Hartman, Morrison & Carlson, 1963;
Holland, 1960; Keislar & McNeil, 1962;
Krumboltz & Weisman, 1962b; Silberman,
Melaragno, Coulson & Estavan, 1961).

It has been suggested that the use of brief
programs, small samples, and inadequate criteria
have contributed to this dilemma
(Hartman, et al., 1963; Keislar & McNeil,
1962; Krumboltz & Weisman, 1962b).


P. WELSH, J. A. ANTOINETTI, AND P. W. THAYER

With respect to response mode, the present 
research attempts to address itself to two 
questions. The first question deals with the 
relative efficacy of several overt response 
modes—written, oral, and optional. From a 
theoretical standpoint, the question is: Does 
the greater activity involved in writing re- 
sponses result in greater achievement? From 
a practical standpoint, with a long program 
the written response might well result in 
greater consumption of time and training 
materials. The optional response mode was
employed to determine whether the optimal
response mode for any trainee would be a
function of the unique learning skills he
brings into the learning situation. That is,
the best judge concerning the employment
of these skills might be the learner.

The second question deals not so much
with how much is learned, as it does with
what is learned. Are differences between
program and conventional trainees attributable
to differences in content acquisition or
do they, at least to some extent, reflect differences
in amount of practice in making a
particular kind of response? Lefkowith
(1955) showed that there is an interaction
between the testing method and teaching
method. As the stimuli become more similar,
the test scores increase. It is suggested here
that responses, as well as stimuli, may
produce the same relationship.

A study by Alter and Silverman (1962),
which included both written and oral constructed
response groups, utilized both constructed
responses and multiple-choice items
in the posttest. Using an 87-frame program,
they report no differences between the two
groups on the criteria. However, Williams
(1963) has shown superior performance by
a constructed response group over a multiple-choice
group and two covert response groups
on test items requiring S to respond with
a technical term introduced by the program.
The present research is an attempt to clarify
these discrepant findings.
Typically, previous research involving the
three variables of interest here has studied
their effects separately. Most factorial studies
that have been reported (e.g., Alter & Silverman,
1962; Goldbeck & Campbell, 1962)
have been concerned with one of these variables
in combination with a variable not
considered here. Fiks (1964) combined two
of the variables, response mode and confirmation,
in a factorial study with a different
third variable, subject matter. He reports no
significant treatment or interaction effects on
the two common variables on a posttest. For
the purposes of the present study, it was
felt necessary to examine interaction effects
as well as treatment effects. Consequently, a
2 × 2 × 3 factorial design was employed
involving the following: (a) two levels of
review, with and without a final review
sequence; (b) two levels of confirmation,
with and without confirmation; and (c) three
levels of response mode: written, oral, and
optional.
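
The 2 × 2 × 3 design just described yields twelve treatment combinations, which can be enumerated as in the sketch below. The level labels and variable names are illustrative wording, not terms taken from the study materials.

```python
from itertools import product

# Factor levels as described in the text; label wording is illustrative.
REVIEW = ("final review", "no final review")
CONFIRMATION = ("confirmation", "no confirmation")
RESPONSE_MODE = ("written", "oral", "optional")

# Every combination of one level per factor is one cell of the design.
cells = list(product(REVIEW, CONFIRMATION, RESPONSE_MODE))

for number, (review, confirmation, mode) in enumerate(cells, start=1):
    print(f"{number:2d}. {review}; {confirmation}; {mode}")
```

Crossing all three factors, rather than varying one at a time, is what makes it possible to estimate interaction effects as well as treatment effects.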


METHOD


Materials. The program used was a 1,625-frame
linear, constructed response program on fundamental
life insurance knowledge, Steps Into Life Insurance.
It consists of a 1,389-frame basic program, plus a
236-frame final review sequence. The program was
written by the American Institute for Research for,
and with the assistance of, the Life Insurance Agency
Management Association. Complete details on the
program’s development and a review of the developmental
procedures have been reported elsewhere
(Klaus, Shettel, Clapp & Welsh, 1961; Programed
Instruction, 1962). Four successive groups of college
students, totaling 14, were used as trial Ss during
program construction. The error rate after the third
revision was less than 1%.

Presentation Format. The program was prepared
in textbook form and was divided into three books
of 692 frames, 697 frames, and the overall review
sequence of 236 frames. Two editions of the programed
text were produced, one containing confirming
responses for each frame and the other devoid
of any such confirmation. The frame sequence
proceeded from page to page horizontally, rather
than down the page. Such an arrangement means
that the frame, response and confirmation are not
visible while S views the next frame. In the confirmation
edition, the confirming response for any
given frame was located immediately to the right
of the frame, concealed under a shield attached
to the cover of the book. Instructions were provided
directing the Ss first to respond to the frame
and then to slide the page out from under the
shield for confirmation. For written responses, this
format permits S to view the frame, his response,
and the confirming response simultaneously during
confirmation. Such a format does not permit the
sequence of frames to be arranged vertically on the
page and makes horizontal progression mandatory.

Criteria. Four major criteria were utilized: (1) use, 
(2) achievement, (3) time, and (4) attitude. The 
use criterion simply asked the question of accepta- 
bility defined in terms of completion of the texts. 

Achievement was measured by a four-part exami- 
nation which consisted of 30 short-answer essay 
questions and 80 multiple-choice questions. The four 
subtests differed along three basic dimensions: 
(a) the nature of the response required, (b) the 
extent to which the language and terminology of 
the items were similar to those of the program, 
and (c) similarity of content to that of the pro- 
gramed text. Part 1 contained the 30 essay items. 
These were based directly on the program and 
closely resembled the “criterion frames” in a 
program sequence. Thus, Subtest 1 was similar to 
the program along all three dimensions. These 30 
items yielded a maximum score of 75 points. 

The items on Subtests 2, 3, and 4 were not 
constructed by the experimenters. These items were 
borrowed from a variety of existing tests of funda- 
mental life insurance knowledge. Since they were 
all multiple-choice questions, the response require- 
ment in the examination differed from that required 
using the program. The items on Subtests 2, 3, and 4 
differed from one another along the other two 
dimensions; that is, similarity in content and form 
with what was presented in the program. 

Subtest 2 consisted of 30 items covering material 
judged to have been covered by the program and 
written in language judged to be similar to the 
program language. 

Subtest 3 consisted of 25 questions covering con- 
tent judged to have been covered in the program 
but no regard was given to language or terminology. 
Subtest 4 consisted of 25 questions covering ma- 
terial the content of which was judged not to have 
been directly taught by the program and with no 
regard given to language or terminology. This group 
of items might be considered as “transfer” items 
of a sort; it was assumed that complete mastery 
of the programed material would permit a trainee 
to deduce the correct answers to these questions. 

The time criterion consisted of measures of both 
trainee and trainer time. This information was 
obtained through the use of special time logs re- 
quiring frequent entries relating not only to time 
spent on the program but also the time spent on 
related materials. In addition, the trainer was asked 
to estimate the number of hours he would normally 
spend with a man covering the material found 
in the program. Failure of many trainers to complete 
time logs forced reliance on their estimates in 
comparing conventional and programed training. 

Attitudinal information from both trainee and 
trainer was obtained through the use of short, 
opinion questionnaires which were completed subse- 
quent to completion of the program but prior to 
taking the achievement test. Both questionnaires 
consisted of both Likert-type items and free response 
items. The scaled items on the trainee’s questionnaire
yielded a maximum possible score of 55 and
a minimum of 11; the trainer’s questionnaire had 
a maximum score of 20 and a minimum of 4. The 
free response items on both questionnaires asked 
about likes and dislikes of the technique. These 
responses were classified and coded according to 
content. 
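
The reported score ranges are what summed 5-point items would produce. The sketch below illustrates this under an assumption that is not stated in the source: that each scaled item was scored from 1 to 5 and the items were simply added, which would imply 11 trainee items and 4 trainer items. The function name is hypothetical.

```python
def likert_score_range(n_items, low=1, high=5):
    """Return (minimum, maximum) total for n_items summed Likert items.

    Assumption (not from the source): every item is scored low..high
    and the total is an unweighted sum.
    """
    return n_items * low, n_items * high


# Item counts inferred from the reported ranges, not reported directly.
trainee_range = likert_score_range(11)  # compare with reported 11 to 55
trainer_range = likert_score_range(4)   # compare with reported 4 to 20
print(trainee_range, trainer_range)
```

If the questionnaires used a different scoring scheme, the same ranges could arise from other item counts, so this is only a consistency check.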

In addition to these criteria of achievement, time 
and attitude, a 20-item pretest was administered to 
all experimental trainees to determine level of 
knowledge prior to studying the program. This 
pretest typically could not be used with the reference 
group because in most cases they had already com- 
pleted their knowledge training. For all Ss, both 
experimental and reference, data were obtained on 
age, education, general aptitude when available (in 
three companies), experience, and other personal 
history characteristics. 

Subjects. Eight life insurance companies located 
in the United States and Canada participated in 
the study. In seven of these companies, new, in- 
experienced agents studied the program; in the 
eighth company, clerical and other employees in 
the home office who normally did not receive life 
insurance education studied the program. Among the 
seven agent companies, the number of program Ss 
varied from a minimum of 92 to a maximum of 261, 
yielding a total experimental group of 908 out of an 
original group of 1,064. This loss of 15% of the 
original agent sample resulted from 35 agents who 
failed to follow response instructions, 20 whose 
materials were lost or not returned, and 101 who 
did not complete the texts or supply necessary 
criterion data. Most of the last group were men 
who terminated contracts or did not accept contracts 
despite their managers’ expectations that they would. 
In the one home office study company 162 Ss par- 
ticipated. Thus, the grand total of program Ss being 
reported upon here is 1,070. 
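
The sample accounting in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts given above; the variable and function names are illustrative and not part of the study.

```python
# Counts taken from the Subjects paragraph.
ORIGINAL_AGENTS = 1064
AGENT_LOSSES = {
    "failed to follow response instructions": 35,
    "materials lost or not returned": 20,
    "texts or criterion data incomplete": 101,
}
HOME_OFFICE_SS = 162


def summarize_sample(original, losses, home_office):
    """Recompute the retained agent sample, loss percentage, and grand total."""
    lost = sum(losses.values())
    retained = original - lost
    return {
        "lost": lost,
        "retained_agents": retained,
        "loss_pct": 100.0 * lost / original,
        "grand_total": retained + home_office,
    }


summary = summarize_sample(ORIGINAL_AGENTS, AGENT_LOSSES, HOME_OFFICE_SS)
print(summary)
```

The recomputed figures agree with the text: 156 agents lost (about 15% of 1,064), 908 retained, and 1,070 program Ss in all.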

It was impractical to develop a control sample 
in the precise sense of the term. Instead, 517 new 
agents in the seven agent companies who had re- 
ceived standard company training with respect 
to fundamental life insurance knowledge were subse- 
quently administered the achievement test and 
designated as a reference group. These agents typi- 
cally had more time under contract and in training 
than those in the experimental group. Subjects in 
the experimental group were eliminated from the 
study if they had 60 or more days of training; 
Ss in the reference group with up to 6 months of 
training were considered acceptable. These differ- 
ences were designed to make it more difficult for 
the program to show its effectiveness. The reference 
group varied by company from a low of 45 to a 
high of 113. 

Procedure. Within each company participating in 
the study, all 12 conditions specified by the 2 X 2 X 3
factorial design were employed. Experimental Ss 
were assigned to a study condition in a random 
fashion. In six companies the study was conducted 
under field conditions with agents studying at home 
and/or in their agencies, the amount of material 
covered in a given study session being left up to
the agent. In five of these six companies a man
became eligible at or shortly after contract. In the 
sixth company, he became eligible as soon as his 
manager believed he would be coming under con- 
tract or had been appointed. In all participating 
agencies the managers were informed as to the 
nature and purpose of the study and were provided 
with a short description of programed instruction. 
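
As a rough illustration of the design, the 12 conditions are the full crossing of the three factors. The sketch below enumerates them and assigns hypothetical subject IDs at random. The source says only that assignment was made "in a random fashion", so the simple independent draw used here is an assumption, as are all names in the code.

```python
import itertools
import random

REVIEW = ("review", "no review")
CONFIRMATION = ("confirmation", "no confirmation")
RESPONSE_MODE = ("written", "oral", "optional")


def design_cells():
    """Return all cells of the 2 X 2 X 3 factorial design."""
    return list(itertools.product(REVIEW, CONFIRMATION, RESPONSE_MODE))


def assign_conditions(subject_ids, seed=0):
    """Assign each subject to one cell by an independent random draw.

    Assumption: the study's actual randomization scheme is not described,
    so this is only one plausible way to do it.
    """
    rng = random.Random(seed)
    cells = design_cells()
    return {sid: rng.choice(cells) for sid in subject_ids}


cells = design_cells()
assignments = assign_conditions(range(30), seed=1)
print(len(cells), assignments[0])
```

An independent draw does not guarantee equal cell sizes, which is consistent with the later use of an analysis for unequal cell Ns.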

In the seventh company the study was conducted 
in a home office school. Enrollment in the school 
as an inexperienced agent determined eligibility. In 
this situation agents studied in classrooms under 
a proctor’s supervision for assigned 2-hour periods, 
approximately 4 hours a day for 6 days. In the 
eighth company, using home office personnel, lack 
of formal training, permission of one’s supervisor, 
and availability determined eligibility. In this com- 
pany Ss studied in monitored classrooms for up to 
2 hours a week for 11 or 12 weeks, depending upon 
the study condition. In these latter two companies, 
a given number of frames was assigned and Ss 
could leave the classroom when the days’ or periods’ 
assignments were completed. The numbers of frames 
assigned in these sessions were based on a very 
conservative estimate of rate of progression so 
as to allow all trainees to complete the assignment.
Failure to complete the assigned frames in the time 
allowed was extremely rare. 

In the six field companies, distribution of all
materials (textbooks, achievement tests, time logs,
attitude questionnaires, etc.) was handled through
the home office. Procedures were established to
assure that the programed textbooks, time logs, and
attitude measures had been completed and mailed
to the home office before the achievement test was
given to the agent. In this way, time between
completion of the program and the administration of
the achievement test could be determined for each
agent; it averaged 4 days. In the two classroom
companies this interval could be controlled. For the
home office agent school the elapsed time was 2
days, and for the home office personnel group it was
3 days.

Procedural Variation. In the two companies in
which use of the program took place in a classroom,
Ss in the oral response condition were instructed
not to say their answers aloud, but to say the
answers to themselves and to “mouth” the words.
This variation was required to avoid chaos in the
classroom.

In the company in which home office personnel
were Ss, it was feasible to do a retention study.
This home office sample was composed of two
groups of approximately equal size that studied the
program over successive 12-week periods. Four
weeks after the second group had completed the
program and had been tested, the entire sample,
completely unforewarned, was retested using the same
achievement test. Thus, both 4-week and 16-week
retention data were collected. Any Ss who had
engaged in knowledge study either in formal study
courses or informal readings were eliminated from
the analysis. Out of the 162 original Ss, 136 are
represented in the retention study, 66 in the 16-week
group, and 70 in the 4-week group. Only 1 was lost
because of intervening study, while 25 were lost
because of illness, vacation, or termination of
employment.


TABLE 1

COMPARISON OF THE STEPS TRAINEES WITH THE REFERENCE TRAINEES ON NUMBER OF DAYS OF
TRAINING AND ON NUMBER OF HOURS OF RELEVANT INSURANCE EDUCATION AT THE
TIME THEY TOOK THE CRITERION EXAMINATION

                                   Number of days          Number of hours of relevant
                                    of training                insurance education
                                 Steps      Reference        Steps      Reference
Company  Statistic              trainees    trainees        trainees    trainees

A        N                        260          86             242          75
         M                      [illeg.]      24.2            44.6       [illeg.]
         σ mean difference               1.38                       5.92
         t                               6.60*                      5.54*

B        N                        126          79             123          78
         M                        15.7       106.2            36.2         66.9
         σ mean difference             [illeg.]                     6.64
         t                              27.33*                      4.61*

C        N                         93          29              90          14
         M                        16.7        24.9            36.6        148.4
         σ mean difference               2.74                       3.64
         t                               2.99*                     30.76*

D        N                      [illeg.]    [illeg.]        [illeg.]       76
         M                      [illeg.]     116.6            45.2         94.5
         σ mean difference             [illeg.]                     1.02
         t                             [illeg.]                    48.21*

E        N                      [illeg.]      107              88         105
         M                        22.9        20.2            40.3         50.0
         σ mean difference               2.23                       6.31
         t                             [illeg.]                   [illeg.]

F        N                         98          49              94          49
         M                      [illeg.]     142.4            45.6        101.4
         σ mean difference               5.62                       8.55
         t                              20.81*                    [illeg.]

G        N                        110       [illeg.]          111          59
         M                        51.0        44.2            32.2         24.2
         σ mean difference               4.98                       2.84
         t                               1.36                       2.80*

* p < .05.


RESULTS 


Experimental versus Reference Trainees.
As noted previously, agent trainees studying
conventional materials were available for
testing in seven of the eight companies.


In comparing the reference trainees with
the Steps trainees, the Steps trainees were
combined disregarding the differences among
them with respect to the experimental treatment
received. Within each company, t tests
were computed to test the significance of the
differences between the Steps trainees and the
reference trainees with respect to age, education,
and in three companies, intelligence.
Since only 1 of the 17 t tests (that 1 dealt
with age) was significant beyond the 5%
confidence level, it was concluded that the
groups were comparable with regard to these
variables.


TABLE 2

MEAN SCORES FOR THE STEPS AND REFERENCE GROUPS ON THE BASIS OF
TOTAL SCORE AND FOR THE FOUR PARTS OF THE FINAL EXAMINATION

                                                              Criterion exam
Company             Statistic         Total final      Part I           Part II          Part III         Part IV
                                      (Parts I-IV)

A                   Steps: M (σ)      109.5 (16.5)     53.0 (9.2)       22.9 (3.5)       17.8 ([illeg.])  [illeg.]
N for Steps = 261   Ref.: M (σ)       102.3 (15.8)     45.7 (8.2)       22.6 (3.6)       17.3 ([illeg.])  [illeg.]
N for Ref. = 86     Mean difference   +7.2             +7.3             +0.3             +0.5             -1.0
                    t                 3.53*            6.57*            0.67             1.30             2.23*

B                   Steps: M (σ)      101.4 (20.2)     49.0 (11.5)      21.6 (4.1)       16.7 ([illeg.])  [illeg.]
N for Steps = 129   Ref.: M (σ)       88.1 (18.5)      37.2 (10.0)      21.0 (3.7)       14.7 (3.6)       [illeg.]
N for Ref. = 82     Mean difference   +13.3            +11.8            +0.6             +2.0             -0.9
                    t                 4.83*            7.59*            0.95             3.95*            1.69

C                   Steps: M (σ)      115.3 (16.9)     54.7 (8.8)       24.2 (3.6)       19.1 (3.2)       17.3 (3.5)
N for Steps = 94    Ref.: M (σ)       107.5 (12.4)     48.5 ([illeg.])  22.8 ([illeg.])  18.3 (3.0)       17.8 ([illeg.])
N for Ref. = 45     Mean difference   +7.8             +6.2             +1.4             +0.8             -0.5
                    t                 2.76*            4.25*            2.28*            1.42             0.93

D                   Steps: M (σ)      102.6 (21.7)     48.3 (12.6)      22.0 (4.6)       17.2 (3.5)       15.1 ([illeg.])
N for Steps = 115   Ref.: M (σ)       91.7 (17.8)      39.3 (9.1)       21.1 (3.9)       16.1 (3.7)       [illeg.]
N for Ref. = 80     Mean difference   +10.9            +9.0             +0.9             +1.1             -0.2
                    t                 3.69*            5.50*            1.45             2.22*            0.38

E                   Steps: M (σ)      118.8 (15.3)     56.9 (7.7)       24.2 (3.7)       19.7 ([illeg.])  [illeg.]
N for Steps = 92    Ref.: M (σ)       102.3 (16.5)     44.0 (7.7)       22.5 (4.2)       18.1 (3.3)       [illeg.]
N for Ref. = 113    Mean difference   +16.5            +12.9            +1.7             +1.6             +0.4
                    t                 7.37*            11.97*           2.99*            3.30*            0.84

F                   Steps: M (σ)      112.5 (14.9)     55.0 (8.3)       23.2 (3.6)       18.9 (2.8)       15.4 (3.2)
N for Steps = 102   Ref.: M (σ)       99.2 (13.6)      42.4 (6.7)       21.6 (3.2)       16.7 (3.2)       18.5 ([illeg.])
N for Ref. = 51     Mean difference   +13.3            +12.6            +1.6             +2.2             -3.1
                    t                 5.34*            9.40*            2.68*            4.23*            5.97*

G                   Steps: M (σ)      118.4 (13.3)     57.8 ([illeg.])  23.9 ([illeg.])  19.2 ([illeg.])  17.5 (3.0)
N for Steps = 115   Ref.: M (σ)       95.8 (12.7)      42.0 (6.5)       20.8 (3.0)       16.2 (3.0)       16.8 (3.5)
N for Ref. = 60     Mean difference   +22.6            +15.8            +3.1             +3.0             +0.7
                    t                 10.84*           13.76*           6.80*            6.53*            1.39

Note. Ref. = reference.
* p < .05.

Table 1 shows the difference between the 
Steps trainees and the reference trainees with 
respect to number of days of training and 
hours of relevant insurance education at the 
time they took the final criterion examina- 
tion. In Companies A, B, C, D, and F the 
reference trainees had significantly more days 
of training and more hours of relevant in- 
surance education than the Steps trainees. 
In Company G, one of the two remaining 
companies, the Steps trainees had significantly
more hours of study than the refer-
ence trainees. Only in this company were pre- 
test (quiz) scores available for the reference 
trainees as well as for the Steps trainees. 


Here, the reference trainees had a slightly 
higher pretest score than the Steps trainees. 

Thus, although the Steps and reference 
trainees differed little as to age, education, 
and intelligence (where scores were available),
the number of hours and days of
training of the reference groups generally 
exceeded that of the Steps groups, a condition 
which should favor the reference groups on 
the criterion examination. 
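
The t values in Table 1 appear to be the difference between the two group means divided by the tabled standard error of the mean difference. The source does not state the formula, so the sketch below treats it as an assumption and checks it against three legible rows of Table 1. The function and variable names are illustrative.

```python
def t_from_means(mean_a, mean_b, se_diff):
    """t statistic as |difference of means| / standard error of the difference."""
    return abs(mean_a - mean_b) / se_diff


# (label, Steps mean, reference mean, sigma of mean difference, reported t)
TABLE_1_ROWS = [
    ("Company C, days of training", 16.7, 24.9, 2.74, 2.99),
    ("Company C, hours of education", 36.6, 148.4, 3.64, 30.76),
    ("Company G, days of training", 51.0, 44.2, 4.98, 1.36),
]

for label, steps, ref, se, reported in TABLE_1_ROWS:
    computed = t_from_means(steps, ref, se)
    print(f"{label}: computed t = {computed:.2f}, reported t = {reported:.2f}")
```

The recomputed values differ from the printed ones only in the second decimal place, which is what rounding of the tabled means and standard errors would produce.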

Table 2 shows the mean scores for the 
Steps and reference groups on the basis of 
total score as well as for the four parts of 
the final examination. Two points may be 
noted. First, the superiority of the experi- 
mental groups over the reference groups is 
attributable in large part to scores on the 
short-answer essay part of the examination, 
Part I, the part resembling criterion frames 
of a program sequence. The differences become
smaller as the similarity of examination
questions and program material and format
becomes less. Second, the variances of the test
scores of the experimental groups are generally
as large as or larger than those of the reference
groups. Thus, the typically expected range
restriction for program trainees did not occur
in any part of the examination, even in the
short-answer essay section. In addition, the
variances of program trainees were significantly
greater (p < .05) than those of the
reference trainees on total final exam, Part I,
and Part II for Company C; on total final
exam and Part I for Company D; and on
Part I for Company F. Interpretations of
this finding might be ambiguous because the
greater amount of training typically received
by the reference groups might lead to greater
homogeneity. However, an examination of
the same distributions of Steps and reference
groups shows no departure from normality
except for some negative skewness on the
essay part of the posttest for Steps Ss. The
authors conclude, therefore, that there was
no practical restriction of range or reduction
of variance as a result of exposure to the
programed texts.
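
One way to see the size of these variance differences is to square the ratio of the tabled standard deviations. The source does not say which test of variances was used, so the sketch below only computes variance ratios from Table 2 and makes no claim about significance. The names are illustrative.

```python
def variance_ratio(sd_steps, sd_ref):
    """Ratio of Steps variance to reference variance, from standard deviations."""
    return (sd_steps / sd_ref) ** 2


# Standard deviations read from Table 2: (Steps SD, reference SD).
COMPARISONS = {
    "Company C, total final exam": (16.9, 12.4),
    "Company D, total final exam": (21.7, 17.8),
    "Company D, Part I": (12.6, 9.1),
    "Company F, Part I": (8.3, 6.7),
}

ratios = {label: variance_ratio(*sds) for label, sds in COMPARISONS.items()}
for label, ratio in ratios.items():
    print(f"{label}: variance ratio = {ratio:.2f}")
```

In each of these comparisons the Steps variance is roughly 1.5 to 1.9 times the reference variance, which runs opposite to the range restriction usually expected for programed instruction.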

As measured by our attitude questionnaires
administered only to the Steps groups, the
attitudes of trainers and trainees were
extremely favorable, although such data may
be heavily loaded by a Hawthorne effect.
(Hedberg* reports such a situation at The
Prudential Insurance Company of America
in which a new conventional course and
a scrambled program were both enthusiastically
received.) Both trainers and trainees
were also given the opportunity to express
their dislikes of the new programed texts.
No single dislike was mentioned by more
than 9% of the trainees and 42% mentioned
none.

Forty-six percent of the trainers also
expressed no dislikes. However, 19% commented
on the loss of contact with the trainee,
not knowing what he had learned or where he
was in his learning. Further indication of the
wide and enthusiastic acceptance of these
programed texts is evidenced by the fact that
89% of the men who used Steps completed
the books.

* R. D. Hedberg, personal communication, 1963.

It was impossible to get reliable estimates 
of trainee time data for conventional study. 
Trainee data for the various experimental 
conditions will be discussed in a later section. 
Trainer times for conventional and programed 
study are based on estimates taken from the 
trainer opinion questionnaires since approxi- 
mately 40% failed to complete time logs, 
whereas 86% completed opinion question- 
naires. A substantial saving of trainer time 
is possible, from 8 to 13 hours on the average, 
depending on company and trainer practices. 
The amount of time spent by the trainer on 
Steps alone was approximately 2.5 hours. 

Experimental Comparisons. Plans to ana- 
lyze achievement test scores with covariance 
analysis using pretest scores were abandoned 
because the relationship between these two 
variables was somewhat curvilinear and 
because of a lack of homogeneity of achieve- 
ment test score variance within various 
levels of pretest scores. Analyses were 
conducted, however, which showed no sig- 
nificant differences in pretest scores related 
to the three main variables under study. The 
statistical analysis finally employed was 
Winer’s (1962) N-dimensional analysis of variance
for cells of unequal Ns. This
procedure weights all cells equally. Because 
the actual number of cell entries within a 
company did not vary greatly, this equally 
weighted means analysis gave results which 
were quite similar to supplementary analyses 
in which cell means were weighted according 
to the actual number of cases in the cell. 
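
The distinction between equally weighted and case-weighted means can be shown with a small numerical sketch. The cell means and cell sizes below are invented for illustration and are not data from the study; the function names are likewise hypothetical.

```python
def unweighted_mean(cells):
    """Average the cell means, giving every cell the same weight."""
    return sum(mean for mean, _n in cells) / len(cells)


def weighted_mean(cells):
    """Average the cell means, weighting each by its number of cases."""
    total_n = sum(n for _mean, n in cells)
    return sum(mean * n for mean, n in cells) / total_n


# Hypothetical (cell mean, cell n) pairs with nearly equal cell sizes.
example_cells = [(100.0, 20), (110.0, 18), (105.0, 22)]

print(unweighted_mean(example_cells))
print(round(weighted_mean(example_cells), 2))
```

When cell sizes are close, as in this made-up example, the two averages differ only slightly, which is the situation the paragraph above describes.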

The equally weighted means analysis of 
variance was employed for several criteria 
including achievement subtests. Also, pretest 
scores (trichotomized) and company were 
each used as a fourth variable in the analysis 
of variance. 

Tests of homogeneity of variance performed 
for each company indicated some that were 
significantly heterogeneous. However, where 
this occurred it usually was the result of 
one cell with a large variance, and this one 
large cell variance resulted from one or two 
extremely low scores. As a further check on
the validity of the analyses performed where
there was a lack of homogeneity of variance,
scores were converted to stanines in several
instances where the variances were not
homogeneous. The analysis of variance results and
conclusions based on such stanines were the
same as with the untransformed scores.
Therefore, only the analyses based on
untransformed raw scores are reported.


TABLE 3

Experimental Groups Unweighted Means
(Companies A, B, D, E, G)

Pretest score 0-9
N = 194
                                               No                  No
                                    Rev.      rev.      Conf.     conf.     Write     Say       Choose
Total final exam                    102.7     99.0      99.3      102.4     102.7     98.3      101.6
Part I                              50.1      47.0      48.3      48.8      49.7      46.6      49.2
Part II                             [illeg.]  [illeg.]  21.0      [illeg.]  [illeg.]  21.0      21.4
Part III                            16.9      16.6      16.4      [illeg.]  [illeg.]  16.5      16.7
Part IV                             14.4      14.3      13.6      [illeg.]  14.6      14.2      14.3
Multiple-Choice Parts II-IV         52.7      52.0      51.0      53.7      53.0      [illeg.]  52.3
Pretest                             [illeg.]  6.9       [illeg.]  6.9       7.0       7.0       [illeg.]
Agent attitude                      50.6      48.4      50.7      48.3      49.0      49.3      50.2
Trainer evaluation                  18.0      17.8      17.9      17.9      17.6      [illeg.]  18.3
Agent time: Books I & II            [illeg.]  21.4      [illeg.]  [illeg.]  24.8      16.8      [illeg.]
Agent time: Books I, II, & review   23.9      21.4      22.8      22.4      26.6      17.9      23.4

Agent attitude interaction                    Trainer evaluation interaction
                        No                                            No
             Conf.     conf.                               Conf.     conf.
Rev.         50.6      50.7                   Rev.         17.4      18.5
No rev.      50.9      46.0                   No rev.      18.3      17.4

Pretest score 10-13
N = 299
                                               No                  No
                                    Rev.      rev.      Conf.     conf.     Write     Say       Choose
Total final exam                    [illeg.]  107.4     108.9     109.6     108.4     108.9     110.5
Part I                              54.6      [illeg.]  52.9      52.8      52.9      52.1      53.6
Part II                             23.0      22.6      [illeg.]  22.8      [illeg.]  22.8      22.8
Part III                            17.6      18.0      [illeg.]  17.9      [illeg.]  17.9      18.1
Part IV                             15.9      15.8      15.6      16.1      15.5      16.1      15.9
Multiple-Choice Parts II-IV         56.5      56.3      55.9      56.8      55.5      56.8      56.9
Pretest                             11.5      11.5      11.4      11.6      11.4      11.6      [illeg.]
Agent attitude                      48.6      49.9      49.9      48.7      50.7      48.0      49.2
Trainer evaluation                  17.8      [illeg.]  17.9      17.6      18.3      17.9      [illeg.]
Agent time: Books I & II            18.2      19.5      18.5      19.2      22.1      15.0      [illeg.]
Agent time: Books I, II, & review   20.5      19.5      19.8      20.1      23.3      15.6      21.0

Agent time Books I & II interaction
                        No
             Conf.     conf.
Write        20.5      23.7
Say          15.6      14.5
Choose       19.5      19.3

Pretest score 14-20
N = [illeg.]
                                               No                  No
                                    Rev.      rev.      Conf.     conf.     Write     Say       Choose
Total final exam                    121.8     118.7     117.6     122.9     122.6     117.8     120.4
Part I                              58.4      56.7      56.1      59.0      59.1      55.9      [illeg.]
Part II                             [illeg.]  24.7      24.6      25.1      [illeg.]  24.7      24.9
Part III                            20.0      19.6      19.5      20.1      20.1      19.4      19.9
Part IV                             18.4      [illeg.]  17.4      18.7      18.3      17.9      17.9
Multiple-Choice Parts II-IV         63.5      62.0      61.6      63.9      63.6      61.9      62.7
Pretest                             15.6      15.6      15.6      15.6      15.6      15.4      15.8
Agent attitude                      49.8      47.9      49.1      48.7      48.5      47.1      51.0
Trainer evaluation                  17.3      17.7      17.6      17.4      [illeg.]  17.3      18.0
Agent time: Books I & II            17.5      17.0      16.6      17.9      19.6      13.6      18.7
Agent time: Books I, II, & review   20.3      17.0      18.0      19.3      21.2      14.5      20.1

Note. The single underscore indicates that the obtained F ratio is significant beyond the 0.05 level; and the double underscore
indicates that the obtained F ratio is significant beyond the 0.01 level.

Several analyses of variance were conducted:
separate 2 X 2 X 3 analyses for each
of the eight companies, combined 2 X 2 X 3
analyses for the five United States field
companies, combined 2 X 2 X 3 X 6 analyses using
company as the fourth dimension, 2 X 2 X 3
X 3 analyses for five United States field
companies using trichotomized pretest scores as
the fourth variable, and separate 2 X 2 X 3
analyses for each of the three pretest score
levels. All analyses lead to similar conclusions
so that only one set of tables will be
presented, that involving the means of the
several criteria for major experimental
conditions for each of the three pretest score
levels. (Data for only five companies are
presented because they represent fairly
uniform administration conditions and the total
number of items scored in the final examination
is the same across all companies.)

Table 3 shows that the use of the review
book following completion of the basic pro-
gram generally resulted in superior perform- 
ance on all parts of the examination. Major 
differences appear, however, only in the means 
for Part I (the essay part) of the exam. The 
superiority of review is statistically reliable 
only for the middle pretest group, and is prac- 
tically small. The superiority of review is 
also consistent for each of the eight companies 
(not shown). 

As expected, the use of the review book 
adds trainee study time (see Agent Time: 
Books I, II, and Review) of about 2.5 hours 
in exchange for the slight gain in examina- 
tion scores. It is important to note, however, 
that this knowledge gain was maintained by 
the review group on the retest which occurred 
4 or 16 weeks later in the company using 
home office employees in the retention study 
(t = 2.34, N = 136, significant at the 0.03
level). The test-retest reliability of the total 
final examination score for the 70 home office 
employees retested after 4 weeks was 0.94 
and that for the 66 employees retested after 
16 weeks was 0.91. 

TABLE 4

Experimental Groups Unweighted Means

                                             Pretest scores
                                    Low        Middle     High       p
Total final exam                    100.9      109.2      120.3      <.001
Part I                              48.5       52.9       [illeg.]   <.001
Part II                             [illeg.]   22.8       24.9       <.001
Part III                            16.8       17.8       19.8       <.001
Part IV                             14.4       15.8       18.1       <.001
Multiple choice (Parts II-IV)       52.3       56.4       62.7       <.001
Pretest                             7.0        [illeg.]   15.6       <.001
Agent attitude                      49.5       49.3       48.9       not signf.
Trainer evaluation                  17.9       17.8       17.5       not signf.
Agent time: Books I & II            [illeg.]   18.8       [illeg.]   <.001
Agent time: Books I, II, & review   22.6       20.0       18.6       <.001


The case for confirmation is less clear.
An examination of the means shows that no
confirmation usually makes no difference or
is superior to confirmation. For both low and
high pretest groups, the differences in scores 
on Part IV of the posttest are statistically 
reliable, although practically small. More 
interesting is the fact that for high pretest 
Ss, all final examination differences favor the 
no confirmation condition and the total and 
essay part of the examination yield reliable 
superiority for that condition. Agents and 
trainers show only slight preferences for the 
confirmation condition, although the prefer- 
ence of agents for confirmation is reliable in 
the low pretest group. In addition, there are 
significant agent attitude and trainer evalua- 
tion interactions with regard to confirmation 
and review for low pretest men (see bottom 
of Table 3). Confirmation is generally favored 
by these trainees, but no confirmation is 
acceptable as long as there is a review book. 
The combination of no review book and no 
confirmation is regarded much less favorably 
than the other three combinations. Trainers
of low pretest men on the other hand, favor 
one condition or the other, review and no 
confirmation or no review and confirmation, 
although the difference in preference is slight. 
The time involved in studying under confirmation
versus no confirmation yields no
consistent or reliable differences. For the
middle pretest Ss, however, the interaction
for confirmation and response mode for agent
time on Books I and II is statistically significant.
It appears that men writing their
answers take longer without confirmation than
they do with it (see bottom, center of
Table 3).

There are no consistent or reliable differences
with regard to final examination scores
among response modes. Attitudinal differences
are also inconsistent, although high pretest
trainees prefer the optional mode, then
writing, then saying, while trainers of middle
pretest trainees prefer writing, saying, and
the optional mode in that order. These
differences are too inconsistent to make much
sense to the authors. As expected, writing
took longest, then the optional mode, and
then saying. These differences occur at all
pretest levels, both for the two basic books
and all three books combined. Under field
conditions, when the trainees with optional 
instructions did not know of the other 
response modes, most trainees chose to write. 
In the classroom, where all trainees knew of 
all response modes, those with optional 
instructions usually said their answers, as 
this permitted them to leave the classroom 
sooner. 

The reader is reminded that the results in 
Table 3 refer to three separate analyses of 
variance, one for each pretest score level. 
These separate analyses were conducted to 
detect effects at a given level which might 
be masked in a combined analysis. The over- 
all means presented in Table 4 can serve as 
a basis for discussing differences among pretest
score levels. A four-dimensional analysis
of variance was performed using the trichotomized
pretest scores as the fourth variable to
test the significance of these differences. With
regard to the final examination, differences
among the means of all these groups for total
and all part scores are statistically significant,
indicating the preexisting differences
were not erased by exposure to the
self-instructional texts. In addition, the examination
means increase with increasing pretest
score level in an orderly fashion. The agent
attitude and trainer evaluation score differences,
on the other hand, are not statistically
significant.
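
Because the means are equally weighted and the design is fully crossed, the marginal means for each factor in Table 3 should average to roughly the overall mean in Table 4. The sketch below checks this for the total final examination at the low and high pretest levels, using values read from the two tables. The dictionary layout and function names are illustrative only.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


# Total final exam means from Table 3, with the Table 4 overall mean.
TOTAL_FINAL_EXAM = {
    "low": {
        "review": (102.7, 99.0),
        "confirmation": (99.3, 102.4),
        "response mode": (102.7, 98.3, 101.6),
        "table_4": 100.9,
    },
    "high": {
        "review": (121.8, 118.7),
        "confirmation": (117.6, 122.9),
        "response mode": (122.6, 117.8, 120.4),
        "table_4": 120.3,
    },
}


def max_gap(level):
    """Largest gap between a factor's averaged margins and the Table 4 mean."""
    row = TOTAL_FINAL_EXAM[level]
    factors = ("review", "confirmation", "response mode")
    return max(abs(mean(row[f]) - row["table_4"]) for f in factors)


for level in TOTAL_FINAL_EXAM:
    print(level, round(max_gap(level), 2))
```

The gaps are within rounding error of the tabled values, which suggests the two tables are internally consistent for this criterion.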

Finally, the orderly increase in posttest
scores with higher pretest scores is accompanied
by an orderly and statistically significant
decrease in mean agent time. In brief, high
pretest Ss score higher on the posttest and
take less time to study the texts than average
Ss, who do similarly with regard to low
pretest Ss.


DISCUSSION 


The data presented illustrate a number of
interesting points. As many studies have
indicated (Hughes & McNamara, 1961; Holland,
1959; Porter, 1959) programed learning
can teach as well as or better than
conventional procedures with a substantial time
saving, in this case the trainer’s time. Lack
of demonstrable trainee time saving is
probably genuine even though there were wide
differences within and between company
trainee time estimates.

Of major interest are the differences between
experimental and reference groups on
the subtests of the achievement criterion.
Major differences between these groups in
each company were found on the short-answer
essay part of the test which resembled
criterion frames. The fact that the differences
are greatest here and diminish in an orderly
fashion as judged similarity of program and
test form and content diminish, suggests the
existence of a serious criterion problem.

If, as is the case in many other studies, the
examination had been restricted to short-answer
essay items resembling criterion
frames, the results would have supported a
much more favorable view of the program’s
efficiency than actually emerged. The present
study supports the findings of Williams
(1963) and stresses the importance of sys- 
tematically varying criteria in format, style, 
and content in an attempt to discover not 
only how much, but what is learned. It ap- 
pears that experimental Ss learned some- 
thing about constructing certain responses in 
writing as well as acquiring knowledge of 
life insurance. 

Studies reported in the literature often 
show that programed learning results in 
higher criterion test performance and lower 
variance on the criterion test. Such a reduc- 
tion in variance was not found in the present 
study. In fact, the Steps trainees generally 
had somewhat higher variances than the refer- 
ence trainees. In two of the companies they 
had significantly higher variances on total test 
score and two subtest scores. It should be 
remembered that reference groups typically 
had more training which might account for 
the lack of difference in variance between 
experimental and reference trainees. Secondly, 
the Steps trainees were not the usual captive 
college students. And finally, the length of the 
program used, combined with the difficulty 
level of the criterion test, could have contrib- 
uted to the failure to obtain reduced variances 
among the Steps groups. In studies where 
reduced variances have been obtained, there 
may have been ceiling effects present in the 
criterion tests. In the present study, the per- 
centage of correct answers averaged about 
70% with no indication of any ceiling effect, 
although there were a few Ss who came close 
to achieving perfect scores. It is interesting 
to note that the few tryout, paid, college Ss 
achieved scores of about 95% on the essay 
exam. Our Ss fell far short of this mark. 

Another item of major interest is the analy- 
sis of experimental group findings. Despite 
careful program construction, low error 
rate, and seven internal, large-step review 
sequences emphasizing criterion frame per- 
formance, trainees using the review book 
which recapitulated the entire program se- 
quence performed significantly better on the 
achievement criterion than did those without 
it. Even though the difference is small, the 
superiority was maintained over both 4- and 
16-week retention tests. The present authors 
feel that these findings do not raise serious 
theoretical questions concerning programing 
technology, even though they might suggest 
the need for more temperate claims as to 
the impact of a program. Rather, they suggest 
what they show: a large-step, comprehensive 
review which emphasizes criterion behavior 
can be programed at a minimum cost and 
result in increased achievement. 

As stated previously, the confirmation 
versus no confirmation comparison is not a 
crucial test of the reinforcement assumption 
underlying programed learning. Confirmation 
as used here refers to highly discriminable 
confirmation which was present or absent 
depending upon assignment to an experi- 
mental condition. All Ss, however, were 
exposed to the “implicit confirmation” which 
exists in a sequence of frames as a criterion 
response is shaped. 

While intriguing, the general superiority in 
achievement of the no confirmation group on 
various subtests is practically small, and a 
large sample is needed to achieve statistical 
significance. The lack of a practical difference 
cannot be explained by a tendency of “no 
confirmation trainees” to reread frames to be 
sure of their answers to criterion frames be- 
cause of the failure to find reliable differences 
in time spent in studying between confirma- 
tion and no confirmation conditions. Although 
the present differences are minor, it is impor- 
tant to remember that trainees and trainers 
alike preferred the confirmation condition. 
Which of these differences would prevail 
under normal rather than experimental condi- 
tions is unknown. In any event, conclusions 
based on these results must be limited to 
programs having similar characteristics; 
large-step, high-error rate programs might 
require confirmation for efficient learning. 

With regard to mode of responding, the 
present findings support those of previous 
studies (Alter & Silverman, 1962; Fiks, 
1964). Apparently, oral or implicit respond- 
ing is as effective as writing. In addition, no 
advantage over oral responding accrues from 
permitting the individual to choose or vary 
his own response mode as he progresses. 
Again, the authors wish to point out that the 
failure to find differences in achievement 
occurred under conditions known to be experimental 
to the Ss. In an unsupervised, day-to-day 
setting, some trainees instructed to say their 


answers might be less conscientious. Whether 
any drop in achievement which might result 
would be offset by the time saved in the use 
of oral response is unknown. 

Finally, pretest differences were not re- 
moved by exposure to the program. 

Perhaps the most interesting feature of this 
study is the failure to confirm commonly held 
beliefs as to the importance of “crucial” 
programing techniques. Confirmation makes 
no difference nor penalizes the trainee, es- 
pecially high-pretest Ss. Review helps, some- 
what. Response mode yields little difference 
in achievement, but saying one’s answers 
saves substantial time. Pretest differences 
were maintained. Apparently, considerably 
more research is necessary to identify the es- 
sential characteristics of programed learning. 
A CRITICAL NOTE ON GEIST’S “WORK SATISFACTION AND 
SCORES ON A PICTURE INTEREST INVENTORY” 


DAVID P. CAMPBELL 


University of Minnesota 


A recent paper in this journal reported, 
among other spectacular data, a correlation 
coefficient of .999 between a measure of 
occupational interest and one of occupational 
satisfaction (Geist, 1963). 

Any correlation of this size deserves careful 
study. This one in particular should be care- 
fully scrutinized, both because it is in an 
area of considerable practical concern and 
because similar correlations reported by other 
investigators have been moderate in size at 
best, usually around .2 or .3 (see, for ex- 
ample, Strong, 1955). 

Unfortunately, the results reported by 
Geist will not stand close scrutiny as the 
paper was laced with statistical errors of a 
most serious nature. Perhaps the most serious, 
or at least the most obvious, error was the 
presentation of a correlation of .748 between 
the Artist interest scale and a Satisfaction 
Scale C, “Changing Jobs” when it had 
already been reported that the standard 
deviation of this Satisfaction Scale C was 
0.0. Unfortunately, this error was “replicated” 
by the reported correlation of .832 between 
the Artist interest scale and a Satisfaction 
Scale D, “Comparison with other people on 
liking jobs” as this latter scale also, according 
to Geist’s report, had a standard deviation 
of 0.0. 
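
The point about a zero standard deviation can be checked directly from the defining formula for the product-moment correlation. The following is a minimal sketch, not a reconstruction of Geist's computation; the function name and the small data sets are hypothetical and chosen only for illustration. When either variable is constant, the denominator of r is zero, so no correlation (let alone .748 or .832) can be computed.

```python
import math


def pearson_r(x, y):
    """Return Pearson's r, or None when either variable has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # A constant variable (standard deviation 0.0) makes r undefined.
        return None
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores, for illustration only.
interest = [12, 15, 9, 20, 17]
satisfaction_constant = [3, 3, 3, 3, 3]   # standard deviation of 0.0
satisfaction_varied = [2, 3, 1, 5, 4]

r_constant = pearson_r(interest, satisfaction_constant)  # undefined
r_varied = pearson_r(interest, satisfaction_varied)      # a defined value
```
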


As both discrepancies appeared on the 
Artist scale, it should be informative to look 
more closely at the characteristics of the 
Artist criterion group. Geist reported that 
there were 64 individuals in the group, that 
their ages ranged from 21 to 26, and their 
median number of years in the occupation 
was 12! Thus it appeared that 50% of this 
group entered their occupation before the 
age of 14. Perhaps not impossible, but most 
curious. 

The only other information given on the 
criterion groups was a footnote that “much 
of the sample was taken from directories of 
‘Who’s Who’ in the respective occupations.” 

Clearly no confidence can be placed in the 
author’s other calculations, including that 
correlation of .999, until the author explains 
these discrepancies. 


REPLY TO CAMPBELL 


HAROLD GEIST 
Berkeley, California 


To answer Campbell’s criticism, it appears 
that there were some problems with the 
Artist scale in my Inventory, which is one 
of the six scales discussed in the article. 
I computed the age at which each vocational 
group began to work for monetary recom- 
pense. The artist group began to work at 
age 14 and thus the relatively low beginning 
age. A correlation of .999 between scores 
of the artists on the Artistic scale and in- 
tensity of feeling, like-dislike is indeed a bit 
high and unusual, but such correlations are 
not unknown. These correlations were com- 
puted by punch cards on a 650 computer and 
checked and rechecked. Campbell has con- 
fused the two correlations on the Artist scale 
(changing jobs and comparison with other 
people on liking job) which are correlations 
between the scores which people in various 
occupations received (in this case the artists) 
on their “respective” scale and various satis- 
faction-dissatisfaction categories (changing 
jobs and comparison with other people on 
liking job). The standard deviations referred 
to were taken from Table 2; these values are 
not relevant to the correlations reported in 
Table 4. 

It should be apparent to anyone experi- 
enced in the area of test construction that 
unusual situations do occur (here, for ex- 
ample, .999 correlation) and in some cases 
they are not readily apparent, but come out 
in working and reworking the data. 


(Early publication received August 10, 1964) 
The American Psychological Association, under a grant from the National Science Foundation, 
has been engaged since 1961 in a Project on Scientific Information Exchange in Psychology. An 
important series of studies has resulted. We now have available a detailed description of the way 
in which scientific information exchanges occur in psychology. The active scientist no longer relies 
essentially on published material to discover who else is working in his area of interest and what the 
status of that other work is, but develops an elaborate informal exchange procedure to keep abreast 
of developments in his field. Innovations in scientific information exchange have not dealt directly 
with this informal network, and so most innovations in scientific information exchange have not helped 
the scientist to learn quickly about work related to his own. 

A part of the difficulty posed to the active investigator is related to publication lag in journals. 
These lags have gradually been reduced, but they will never be so short that articles will appear as 
soon as the editing on them has been completed. Therefore the APA-NSF project has proposed to 
sponsor a series of innovations relating to APA journals. These innovations have been planned as 
limited trials and are wholly supported by the project for the purpose of evaluating their effects 
on scientific information exchange. 

The Journal of Applied Psychology will, during this volume year, cooperate in testing one such 
innovation. Beginning with this issue and continuing through the remainder of the year, the titles 
and authors of accepted papers are listed at the end of the issue. The address of one of the authors 
is also listed so that those persons who wish to communicate with authors about their work prior to the 
time that the article appears may do so. At the end of the year’s trial the results of this innovation 
will be assessed and further consideration will be given to the question of continuing this sort of listing. 


Manuscripts will be listed only after they are in a form which is considered acceptable for pub- 
lication. Since the purpose of the listing is to enable active research workers to become aware of 
other persons working in related fields, we have asked that authors endeavor to give conscientious 
replies to inquiries that they receive. 


Manuscripts Accepted for Publication in the 
Journal of Applied Psychology 


Construct Revalidation of a Forced-Choice Rating Form: Seymour Levy* and D. Miriam Stene: Personnel Re- 
search and Manpower Development, The Pillsbury Company, Minneapolis 2, Minnesota. 

Development and Analysis of a “Cumshaw Tolerance” Scale: Yvonne Treadwell: P. W. K. Dietrichson,* Open 
Literature Branch, Code 7515, Technical Information Department, U. S. Naval Ordnance Test Station, 
China Lake, California. 

Wit, Creativity, and Sarcasm: Ewart E. Smith* and Helen L. White: Serendipity Associates, 14827 Ventura Blvd., 
Sherman Oaks, California. 

A Language Aptitude Test for Blind Students: R. C. Gardner*: Department of Psychology, The University of 
Western Ontario, Middlesex College, London, Ontario, Canada. 

Group Performance as a Function of Task Difficulty and the Group’s Awareness of Member Satisfaction: Marvin E. 
Shaw* and J. Michael Blum: Department of Psychology, University of Florida, Gainesville, Florida 32603. 

Relationships Between the Importance and the Satisfaction of Various Environmental Factors: Frank Fried- 
lander*: Management Research Group (Code 172), U. S. Naval Ordnance Test Station, China Lake, California. 


Effects of Prodding to Increase Mail-Back Returns: Bruce K. Eckland*: Department of Sociology and Anthro- 
pology, University of North Carolina, Chapel Hill, North Carolina. 

An Application of Psychological Scaling Methods to Content Analysis: The Use of Empirically Derived Criterion 
Weights to Improve Intercoder Reliability: Ralph V. Exline* and Barbara H. Long: Center for Research on 
Social Behavior, Elliott Hall, University of Delaware, Newark, Delaware. 

A Comparison of Error in Five Sorting Procedures for Ordinal Ranking: Joseph M. Madden,* Joe T. Hazel, and 
Rodger D. Bourdon: United States Air Force, HQ 6570th Personnel Research Lab. (AMD), Box 1557, Lackland 
Air Force Base, Texas. 

The Concept of Task Versus Person Orientation in Nursing: Allen Raskin,* Joan K. Boruchow, and Risa Golob: 
Psychopharmacology Service Center, National Institute of Mental Health, 6917 Arlington Road, Room 324, 
Bethesda, Maryland 20014. 


* Asterisk indicates author for whom the address is supplied. In addressing correspondence to authors it should 
be kept in mind that the manuscripts are listed in order of acceptance, i.e., the first 10 manuscripts will probably 
be published in the next issue of the journal and the last 8 or so manuscripts will be published a year hence. 
AN INVESTIGATION OF THE CRITERION PROBLEM 
FOR ONE GROUP OF MEDICAL SPECIALISTS 1 


JAMES M. RICHARDS, Jr.,2 CALVIN W. TAYLOR, PHILIP B. PRICE, 
AND TONY L. JACOBSEN 2 


University of Utah 


The sample consisted of 190 Utah physicians fully certified as specialists by 
an American Board. 80 scores relevant to the performance of these physicians 
were intercorrelated and factor analyzed using the principal components 
solution based on eigenvalues and eigenvectors. The 29 factors which had an 
eigenvalue greater than 1.00 were rotated by the varimax procedure and 
interpreted. The most important finding was the great criterion complexity 
for this group of medical specialists. This complexity suggests that one 
cannot adequately measure physician performance on the basis of a single 
score or a few scores. Instead, one must obtain a relatively large number of 
scores. Performance in both premedical and medical education was independent 
of performance as a physician. 


Over the last 10 years, studies evaluating 
the procedures used in the selection of medi- 
cal students, and especially the Medical 
College Admission Test, have, for the most 
part, indicated only marginally satisfactory 
results (Gottheil & Michael, 1957; Ralph & 
Taylor, 1952; Richards & Taylor, 1961; 
Taylor, 1950; Wantman, 1953; Wesman, 
1959). However, these studies of selection 
procedures have typically used as a criterion 
only some measure of performance in medical 
education, whereas the ultimate goal in se- 
lection of medical students is to pick those 
candidates who will be successful as prac- 
ticing physicians. Stalnaker (1951) in par- 
ticular has criticized studies of selection pro- 
cedures because the criterion measures lack 
pertinence to quality of performance as a 
physician. Although the use of existing selec- 
tion procedures has sometimes been justified 


1 This research was supported by a contract be- 
tween the University of Utah and the Cooperative 
Research Branch, United States Office of Education, 
Project No. 1551 (Robert Beezer, Monitor). 

2 Now at American College Testing Program. 

3 The authors wish to express their gratitude to 
Theodore M. Yellen, Gary Shirts, Gary Jorgensen, 
and Sally Meik for their assistance on this project. 
by their presumed relationship to qualities 
desirable in a professional practitioner, very 
few studies have been made of the relation- 
ship between the information used in the 
selection of medical students and criteria of 
postmedical school performance. In _ those 
studies which have been done, the results 
have been discouraging (Peterson, Andrews, 
Spain, & Greenberg, 1956; Richards, Taylor, 
& Price, 1962). There is, therefore, a great 
need for studies which do explore the rela- 
tionship between characteristics of the medi- 
cal student and his later performance as a 
physician, in order that selection and educa- 
tion of medical students may be based on 
appropriate variables. 

It is clear that before such studies can be 
carried out successfully, much research has 
to be done on the criterion problem itself. 
The present investigation, therefore, is one 
of a series of studies aimed at developing 
performance measures for physicians in vari- 
ous types of practice. The basic assumption 
of this series of studies is that physician 
performance is complex and multivariable; 
accordingly, the basic procedure is based on 
that used by Taylor and his associates 
(Taylor, Smith, & Ghiselin, 1963; Taylor, 
Smith, Ghiselin, & Ellison, 1961) in an 
investigation of the criterion problem for a 
group of physical scientists. In the physical- 
scientist study, more than 50 different meas- 
ures of on-the-job performance of the scien- 
tists were factor analyzed to isolate 14 dif- 
ferent aspects of the productivity, creativity, 
and other contributions of these scientists. 
Similarly, in the present study, a large num- 
ber of measures were obtained of the on-the- 
job performance and accomplishments of phy- 
sicians practicing in urban areas in Utah 


who had specialized practices. These measures 
were then intercorrelated and factor analyzed 
to yield independent criterion dimensions. 

It should be noted that the sort of phy- 
sician studied in this research, the certified 
specialist, is increasingly the typical Amer- 
ican doctor. Fully half of the private practi- 
tioners in the United States today are 
specialists, compared to only 16% in 1931 
(Peterson, 1963). Since this trend toward 
specialization in many respects represents an 
adjustment to the very rapid expansion of 
knowledge in medical science and of complex 


TABLE 1 

TITLE AND SOURCE FOR EACH VARIABLE FOR URBAN SPECIALIST SAMPLE 

Title .... Source 

 1. Number of Times Nominated as Outstanding Contributor by Urban Specialist Colleagues .... Colleagues 
 2. Number of Times Nominated as Preferred Consultant by Urban Specialist Colleagues .... Colleagues 
 3. Number of Nominations as Outstanding Contributor or Preferred Consultant by General Practitioners .... General Practitioners 
 4. Number of Times Nominated as Outstanding Contributor by College of Medicine Faculty .... Medical Faculty 
 5. Rating of “Clinical Excellence” by Medical College Department Head .... Department Head 
 6. Number of Different Residency Hospitals .... Compendiums 
 7. Number of Years Spent in Residency .... Compendiums 
 8. Judged Quality of National Board Certification .... Expert Judges 
 9. Present College of Medicine Clinical Faculty Rank .... Official Records 
10. Mobility Rate in Professional Positions Since Receiving M.D. .... Interview and Official Records 
11. Total Number of Listings in Honorary Compendiums .... Compendiums 
12. Gross Income from Medical Profession .... Interview 
13. Number of Current Memberships in Scientific and Professional Societies .... Interview 
14. Average Judged Quality of Societies in Which Membership is Current .... Interview and Expert Judges 
15. Overall Occupational Satisfaction .... Questionnaire 
16. Number of Times During Career Invited to Serve as Editor of Scientific or Professional Journal .... Interview 
17. Number of Times During Career Invited to Serve on Scientific and Professional Advisory Boards .... Interview 
18. Average Judged Quality of Scientific and Professional Awards Received During Career .... Interview and Expert Judges 
19. Self-Reported Number of Contributions Made to Medicine .... Interview 
20. Average Judged Quality of Self-Reported Contributions Made to Medicine .... Interview and Expert Judges 
21. Self-Reported Number of Non-Medical Contributions to Society .... Interview 
22. Total Number of Papers Presented at Scientific and Professional Meetings During Career .... Interview 
23. Average Number of Journal Publications Per Year Since Receiving M.D. .... Interview and Compendiums 
24. Average Level of Contribution to Publications as Indicated by Senior vs. Junior Authorship Status .... Interview and Compendiums 
25. Number of Research Projects with Which Involved During Career .... Interview 
26. Number of Scientific and Professional Journals Reviewed Regularly .... Interview 
27. Number of Subscriptions to Scientific and Professional Journals .... Interview 
28. Number of Articles in Scientific and Professional Journals Read in Detail Each Month .... Interview 
29. Average Number of Society Meetings Attended Annually .... Interview 
30. Number of Postgraduate Courses Taken During Career .... Interview 
31. Number of Refresher Courses Taken During Career .... Interview 
32. Physician’s Evaluation of Usefulness of Drug Detail Men .... Interview 
33. Extent of Physician’s Experimental Use of Drugs Provided by Drug Detail Men .... Interview 
34. Number of Techniques Other Than Journals, Meetings, and Drug Detail Men Used in Keeping Abreast .... Interview 
35. Average Number of Formal Medical Consultations Called into Monthly .... Interview 
36. Average Number of Informal Medical Consultations Called into Monthly .... Interview 
37. Percentage of Patients on Which Consultations are Requested .... Interview 
38. Number of Patients Seen Per Day .... Interview 
39. Average Amount of Time Spent with Patients on First Visit .... Interview 
40. Average Amount of Time Spent in Explaining Diagnoses to Patients .... Interview 
41. Proportion of Office Patients Treated Without Charges .... Interview 
42. Proportion of Office Patients That Fail to Pay Physician for Services Rendered .... Interview 
43. Self-Estimated Average Socioeconomic Level of Patients .... Interview 
44. Degree to Which Physician Adheres to Patient Appointment Schedule .... Interview 
45. Average Number of House Calls Made Per Week .... Interview 
46. Degree to Which Physician Considers Psychological Factors in Diagnoses .... Interview 
47. Average Number of Hours Per Week Devoted to Medical Practice .... Interview 
48. [illegible] of Hospitalized Patients .... Interview 
49. Number of Hospitals in Which Physician Works .... Interview 
50. Average Judged Quality of Hospitals in Which Physician Works .... Interview and Expert Judges 
51. Number of Hospitals in Which Physician Maintains Courtesy Privileges .... Interview 
52. Average Judged Quality of Hospitals in Which Physician Maintains Courtesy Privileges .... Interview 
53. Number of Formal Responsibilities Physician has in Hospitals .... Interview 
54. Average Judged Quality of Hospital Responsibilities .... Interview and Expert Judges 
55. Self-Estimated Value of Office Equipment .... Interview 
56. Number of M.D. Assistants on Physician’s Ancillary Staff .... Interview 
57. Number of Nurses on Physician’s Ancillary Staff .... Interview 
58. Number of Technicians on Physician’s Ancillary Staff .... Interview 
59. Number of Clerical, Administrative, and Janitorial Workers on Physician’s Ancillary Staff .... Interview 
60. Average Number of Speeches on Medical Topics to Laymen Groups Per Year .... Interview 
61. Average Amount of Vacation Taken Annually .... Interview 
62. Extent to Which Physician Plans and Maintains Leisure Time Activities .... Interview 
63. Number of Current Memberships in Social and Avocational Organizations .... Interview 
64. Number of Current Memberships in Civic and Political Organizations .... Interview 
65. Characteristics Physician Considers Important for Success in Medicine: Number of “Common” [illegible] .... Interview 
66. Characteristics Physician Considers Important for Success in Medicine: Number of “Uncommon” [illegible] .... Interview 
67. Characteristics Physician Considers Important for Success in Medicine: “Commonness” of [illegible] .... Interview 
68. [title illegible] .... Interview 
69. Expert Panel Rating of Overall Performance Based on All Available Information .... Expert Judges 
70. Interviewer Rating of Condition of Physician’s Office .... Project Interviewer 
71. Interviewer Rating of Likeability .... Project Interviewer 
72. Interviewer Rating of Physician’s Involvement in This Project .... Project Interviewer 
73. [title illegible] .... Control Variables 
74. Number of Years Between Receiving M.D. and Receiving National Board Certification .... Control Variables 
75. Years of Experience Since Receiving M.D. .... Control Variables 
76. Individual Practice Rather Than Group, Clinic or Hospital Practice .... Control Variables 
77. Hospital Practice Rather Than Individual, Group, or Clinic Practice .... Control Variables 
78. Undergraduate Grade Point Average .... Official Transcripts 
79. Grade Point Average for First Two Years of Medical School .... Official Transcripts 
80. Grade Point Average for Last Two Years of Medical School .... Official Transcripts 





medical technology, there is little reason to 
suppose that the trend will be reversed in 
the foreseeable future. 


METHOD 


Criterion Measures. Eighty scores relevant to phy- 
sician performance, including 3 scores measuring 
performance in education, were obtained from a 
variety of sources and were analyzed. The first 5 
of these scores were based on colleagues’ opinion; 
these included 1 score based on other specialists’ 
nominations of the physician as an outstanding 
contributor to medicine, 1 score based on nomina- 
tions by other specialists as a preferred consultant, 
1 score based on nominations by general practitioners, 
1 score based on nominations by the University of 
Utah College of Medicine full-time faculty, and 
1 score based on a rating of “clinical excellence” 


by the head of the College of Medicine Department 
corresponding to the individual doctor’s field of 
specialization. The next 6 scores were obtained from 
medical directories and other compendiums; these 
included 2 scores concerning residency training, 
1 score concerning specialty certification, 1 score 
concerning current rank, if any, on the University 
of Utah College of Medicine clinical (i.e., part time) 
faculty, 1 score dealing with professional mobility, 
and 1 score based on listings in honorary com- 
pendiums such as Who’s Who In America. The next 
57 scores were obtained during an interview with 
each participating specialist, including 1 score based 
on income, 2 scores having to do with society 
memberships, 1 score from a special questionnaire 
dealing with sources and degrees of occupational 
satisfaction, 49 scores based on answers to direct 
questions asked in the course of the interview, and 
3 scores dealing with the specialist’s “image” of 
success. The final 12 scores, obtained from hetero- 
geneous sources, included 1 score involving ratings 
by expert judges of “overall performance” (as indi- 
cated by the information represented in the fore- 
going 68 scores), 3 ratings by the particular project 
researcher who conducted the interview, 5 “control” 
scores involving such variables as years of experience, 
and 3 scores measuring performance in under- 
graduate and medical education. The exact titles of 
the 80 variables and the sources from which they 
were obtained are listed in Table 1. Details of the 
system by which raw scores were derived are 
presented elsewhere (Price, Taylor, Richards, & 
Jacobsen, 1963). 

Subjects. The population under consideration con- 
sisted of 332 physicians practicing in the State of 
Utah who had passed an American Board Specialty 
Certification examination and who were practicing in 
the urbanized Ogden-Salt Lake City-Provo complex. 
Letters signed by the Dean of the University of 
Utah College of Medicine were sent to these phy- 
sicians requesting that they participate in this proj- 
ect, which participation primarily required that they 
grant an interview lasting approximately an hour. 
A second letter was sent to those doctors who did 
not respond to the first letter; physicians who did 
not respond to the second letter were dropped. 
Ultimately there were 190 physicians who agreed 
to participate. This smaller group was the sample 
actually studied in detail in this research. 

Since some scores were available for both par- 
ticipating and nonparticipating physicians, it is 
possible to make a partial check of bias in the 
sample. The criterion scores which were available 
for the nonparticipating physicians were the two 
scores dealing with residency, the score concerning 
quality of board certification, the two scores con- 
cerning nominations by other specialists, the score 
based on Who’s Who listings, the score concerning 
current rank on the College of Medicine clinical 
faculty, and the three control scores of years of 
experience, age at which the MD was obtained, and 
the number of years between receiving the MD 
degree and obtaining specialty certification. On these 
10 scores, t tests were made comparing the phy- 
sicians who did not participate with the physicians 
who did. Only one difference out of the 10 was 
significant at the 5% level; this difference indicated 
that the nonparticipating physicians held lower rank 
on the College of Medicine clinical faculty. The
meaning of this difference is not entirely clear.
However, since the other nine differences were not
significant, it appears that the sample of 190
participating physicians is not seriously biased.


TABLE 2

DISTRIBUTION OF MISSING SCORES TAKING
EACH VARIABLE AS A CASE

Percentile    Number of missing scores
10            4.75
25            11.78
50            18.05
75            [illegible]
90            50.35
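
The reasoning behind the bias check above can be made concrete with a small calculation. The sketch below asks how often at least one of ten tests would reach the 5% level by chance alone if no real differences existed. It assumes the ten tests are independent, which is an assumption made here for illustration only; the ten scores in the study are very likely correlated, so the figure is a rough guide rather than an exact probability.

```python
# Chance results among k significance tests when every null hypothesis holds.
# Independence of the tests is an illustrative assumption, not a property
# claimed for the ten criterion scores in the study.

def prob_at_least_one(k: int, alpha: float) -> float:
    """Probability that at least one of k independent tests is significant."""
    return 1.0 - (1.0 - alpha) ** k


def expected_false_positives(k: int, alpha: float) -> float:
    """Expected number of significant results among k tests under the null."""
    return k * alpha


p_any = prob_at_least_one(10, 0.05)
n_expected = expected_false_positives(10, 0.05)
print(round(p_any, 3), n_expected)
```

Under these assumptions, a single significant difference in ten comparisons is an unremarkable outcome, which is consistent with the conclusion that the sample is not seriously biased.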

Procedure. Scores on all variables, with the ex- 
ception of the two dichotomous control variables 
dealing with type of practice, were converted to 
Normalized T scores (Guilford, 1956, pp. 494-501) 
with a mean of 50 and a standard deviation of 10. 
The next step in the data analysis was to compute 
the 3,160 intercorrelations among the 80 scores.4 The 
resulting correlation matrix was then factor analyzed, 
using the principal components solution based on 
eigenvalue and eigenvector analysis (Harman, 1960). 
Unity was placed in the diagonal cells of the cor- 
relation matrix, and all factors having an eigenvalue 
greater than 1.00 were extracted. These factors were 
then rotated to a final solution on the computer, 
using the varimax analytic orthogonal-rotational 
solution. The rationale for this method of factoring, 
including the insertion of unity in the diagonal, and 
rotating is presented in detail by Kaiser (1960). 
While it is true that some users of factor analysis 
might prefer some other estimate of the communali- 
ties such as the multiple correlation between each 
variable and all other variables combined, it should 
be noted that when the number of variables is large 
the factor solution is relatively insensitive to different 
communality estimates. 
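
The sequence of steps in this procedure (intercorrelation, principal components with unity in the diagonal, retention of eigenvalues above 1.00, varimax rotation) can be sketched in a few lines. This is a minimal illustration using NumPy on simulated data, not a reconstruction of the program actually used; the function names, the simulated scores, and the use of raw varimax without row normalization are all assumptions introduced here.

```python
import numpy as np


def principal_components(scores):
    """Principal components of the correlation matrix (unities in the
    diagonal), keeping only components with an eigenvalue above 1.00."""
    r = np.corrcoef(scores, rowvar=False)      # variables-by-variables matrix
    eigvals, eigvecs = np.linalg.eigh(r)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return r, eigvals, loadings


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (raw form, no row normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        if s.sum() - criterion < tol:
            break
        criterion = s.sum()
    return loadings @ rotation


# Number of distinct intercorrelations among 80 scores.
n_pairs = 80 * 79 // 2

# Simulated data: 200 cases, 6 variables driven by 2 hypothetical factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
pattern = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
scores = latent @ pattern.T + 0.6 * rng.normal(size=(200, 6))

r, eigvals, loadings = principal_components(scores)
rotated = varimax(loadings)
communalities = (rotated ** 2).sum(axis=1)
print(n_pairs, loadings.shape[1], communalities.round(2))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is the same before and after rotation; only the distribution of variance across factors changes.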

We have learned that some missing scores are 
inevitable in large-scale criterion research of this 
type. Since the factor analysis computer program 
used does not allow for missing data, the mean 
score, or 50, was substituted for all missing scores.
On an overall basis, approximately 13% of the 
scores were missing. However, missing scores were 
not evenly distributed over the 80 variables, and 
accordingly a more detailed description of the dis- 
tribution of missing scores over the 80 variables is 
presented in Table 2. It should be emphasized that 
the percentiles in Table 2 refer to the distribution 
of variables and not to the distribution of physicians. 
In other words, the tenth percentile in Table 2 refers 
to the 8 variables with the smallest number of 
missing scores. 
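
The treatment of missing scores described above can be illustrated as follows. The sketch counts missing scores per variable, reports percentiles of those counts taking each variable as a case, and substitutes the T-score mean of 50. The small array is hypothetical, the function name is introduced here, and NumPy's default interpolated percentile is an assumption; the text does not say how the percentiles in Table 2 were computed.

```python
import numpy as np


def substitute_and_summarize(t_scores, fill=50.0,
                             percentiles=(10, 25, 50, 75, 90)):
    """Replace missing T scores with the mean (50) and describe how the
    missing scores are distributed over variables (columns)."""
    missing = np.isnan(t_scores)
    per_variable = missing.sum(axis=0)         # missing count per variable
    overall_rate = missing.mean()              # share of all scores missing
    filled = np.where(missing, fill, t_scores)
    summary = {p: float(np.percentile(per_variable, p)) for p in percentiles}
    return filled, per_variable, overall_rate, summary


# Hypothetical data: 4 physicians (rows) by 3 variables (columns).
nan = np.nan
demo = np.array([[55.0, nan, 40.0],
                 [nan, nan, 60.0],
                 [45.0, 50.0, 50.0],
                 [50.0, nan, 50.0]])

filled, per_variable, overall_rate, summary = substitute_and_summarize(demo)
print(per_variable, round(overall_rate, 3), summary[50])
```

Note that the percentiles summarize the per-variable counts, so they describe variables rather than physicians, as the text emphasizes.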

The identification number and name of variables
falling above the ninetieth percentile and the number 
of missing scores for those variables are as follows: 
12. Gross Income from Medical Profession (67 
scores), 40. Average Amount of Time Spent in 
Explaining Diagnoses to Patients (78 scores), 54. 
Averaged Judged Quality of Hospital Responsibilities 
(80 scores), 55. Self-Estimated Value of Office 
Equipment (55 scores), 74. Number of Years Be- 
tween Receiving MD and Receiving National Board 
Certification (68 scores), 78. Undergraduate Grade 
Point Average (120 scores), 79. Grade Point Average
for First 2 Years of Medical School (87 scores);
and 80. Grade Point Average for Last 2 Years of 
Medical School (51 scores). 


4 All computations for this project were carried 
out at the Western Data Processing Center, Uni- 
versity of California, Los Angeles, California. 
On an a priori basis, it is difficult to evaluate
the exact effect of substituting the mean for a fairly
substantial number of missing scores, although it
can be stated that the general effect of such a
substitution is to reduce the correlation between
variables, and that this in turn produces a tendency
toward lower factor loadings and, in combination
with unity in the diagonal, toward unique factors.
In order to provide a more specific estimate of the
extent to which correlations for the present study
were affected by the substitution of the mean for
missing scores, the correlations between each of the
three grade point average variables and the other 77
scores were computed, eliminating from the calculations
for each grade point average variable those
physicians for whom no grade average was available.
When the mean is substituted for missing grade
scores, the median correlation between the three
grade point averages and the other 77 variables is
.03, with a range of correlation from —.17 to .18.
When computations are based only on those cases
for whom grades are available, the median correlation
is still .03, but with a range of correlation
from —.21 to .28. The distributions of the correlations
both appeared fairly symmetrical, and were
highly similar to each other with respect to the
pattern of correlations with the other 77 variables.
In the opinion of the authors, these results suggest
that the results of this study were not materially
affected by the substitution of the mean for missing
scores.
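
The attenuating effect of mean substitution can be demonstrated on simulated data. The sketch below compares a correlation computed only on complete cases with the same correlation after the observed mean is substituted for missing values. The sample size of 190 echoes the study, but the data, the missing rate, the seed, and all variable names are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1964)              # arbitrary seed
n = 190
x = rng.normal(50, 10, n)                      # e.g. a grade point average score
y = 0.5 * (x - 50) + rng.normal(50, 10, n)     # some other criterion score
missing = rng.random(n) < 120 / 190            # x missing at random

present = ~missing
r_complete = np.corrcoef(x[present], y[present])[0, 1]

# Substitute the mean of the available x scores for the missing ones.
x_filled = np.where(missing, x[present].mean(), x)
r_filled = np.corrcoef(x_filled, y)[0, 1]

# Substituted cases add nothing to the covariance, so the correlation
# shrinks by the square root of the ratio of y's sum of squares among
# complete cases to y's sum of squares over all cases.
ss_obs = ((y[present] - y[present].mean()) ** 2).sum()
ss_all = ((y - y.mean()) ** 2).sum()
shrink = np.sqrt(ss_obs / ss_all)

print(round(r_complete, 3), round(r_filled, 3), round(shrink, 3))
```

When scores are missing at random, the shrinkage factor is close to the square root of the proportion of cases present, so correlations are pulled toward zero without changing sign. That pattern agrees with the narrower range of correlations reported above when the mean is substituted.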

In most cases, the raw scores on a scale were
simply the numbers which the physicians gave in
answer to a free-response question. Practical considerations
made it impossible to obtain a direct
measure of the reliability (and, for that matter,
validity) of these responses. An indirect indication
of the reliability is provided by the communalities
obtained in the factor analysis. From this indirect
indication, it would appear that the reliabilities are
at least reasonably satisfactory.


RESULTS AND DISCUSSION


A surprisingly large number of factors,
namely 29, had an eigenvalue greater than
1.00 and were included in the analysis. The
rotated factor matrix 5 is presented in Table 3,
together with the communality of each variable.
These 29 factors are described below.
Factor A has high loadings on all variables
involving colleague nominations and the rating


5 Copies of the complete correlation matrix and
the unrotated factor matrix have been deposited
with the American Documentation Institute. Order
Document No. 8221 from ADI Auxiliary Publications
Project, Photoduplication Service, Library of
Congress, Washington, D. C. 20540. Remit in advance
$1.25 for microfilm or $1.25 for photocopies
and make checks payable to: Chief, Photoduplication
Service, Library of Congress.


of clinical excellence by the department head. 
Therefore, the most appropriate title for this 
factor might be professional recognition for 
achievements. High scorers on this variable 
have also attained higher rank on the clinical 
faculty, have been slightly more mobile than 
their lower-scoring colleagues, have read a 
slightly above average number of papers at 
scientific and professional society meetings, 
and have achieved some national recognition 
in that they have been asked to serve as 
journal editors. The overall picture presented 
by this factor is that of a physician who has 
achieved both a high degree of “visibility” 
and a good reputation and high status among 
his medical peers. 

Factor B is characterized by loadings which 
indicate that the high scorer sees fewer pa- 
tients each day, but spends more time with 
each patient, both in examination and in 
explaining his diagnosis. He also makes more 
use of consultations than the average specialist 
and relies less on drug salesmen as a source 
of information about pharmacological de- 
velopments than does the average specialist. 
A common thread running through these vari- 
ables is thoroughness in dealing with the 
patient and willingness to take as much time 
with each patient as is necessary. The best 
title for this factor, therefore, might be 
diagnostic thoroughness. Perhaps because of 
the smaller volume of patients, high scorers 
have a lower-than-average income. However, 
they work in fewer but better quality hos- 
pitals and are regarded as outstanding con- 
tributors. If this factor is reflected, the load- 
ings suggest a busy, financially successful 
physician with a large practice who con- 
tributes primarily through the quantity of 
people he reaches. 

Factor C might best be titled psychosomatic 
orientation, since its highest loading is on the 
variable measuring the degree to which the 
physician considers psychological factors in 
providing care to patients. It is interesting to 
find that high scorers are more satisfied with 
their professional activities than is the aver- 
age specialist in our sample. As might be ex- 
pected, high scorers tend to be in individual, 
group, or clinic practices rather than on a 
full-time hospital staff. Secondary loadings 
suggest that the higher scorer continues to 
take, and apparently likes, courses in medical
subjects, and is a member of an above- 
average number of professional societies. 

Factor D in many ways appears to in- 
volve the traditional “family doctor” pat- 
tern in that the most salient feature is the 
doctor’s willingness to make house calls. This 
interpretation is strengthened by the fact 
that the high scorer, when asked what his 
major contributions have been, tends to an- 
swer with things that are directly related to 
medicine, tends to spend an above-average 
amount of time examining patients on their 
first visit, makes relatively more use of con- 
sultants, and works a slightly above-average 
number of hours at his practice. Contrary to 
the findings of other researchers (Anderson 
& Feldman, 1956; Peterson et al., 1956), in 
addition to being a family doctor, the high 
scorer is also a “doctor’s doctor,” since he is 
nominated to a greater-than-average degree 
both by general practitioners and by other 
specialists, and his contributions are, ac- 
cording to expert medical judgment, of high 
quality. The characteristics common to many 
of these contributions seem to be a willing- 
ness to go out of his way when necessary to 
provide care to a patient; therefore, the 
title proposed for this family doctor factor is 
willingness to provide special attention to 
patients. 

Factor E should be reflected and titled 
medical charity work, since the described 
physician provides an above-average amount 
of free medical work, not only intentionally 
and voluntarily, but also involuntarily through 
the failure of a significant number of his 
patients to pay their bills. It could be, of 
course, that more of his patients do not pay
their bills because this doctor is less con- 
cerned with payment and therefore does not 
press bill collecting. However, he does deal 
with poorer patients. Other loadings portray 
a physician who is relatively lax about his
appointment schedule, who practices in a 
group or clinic setting, who works in rela- 
tively few hospitals, who does not belong 
to many professional societies, and who takes 
more than the average amount of course 
work.

Factor F, which is more clearly interpret- 
able if reflected and which primarily involves 
the length and variety of residency experience,
might best be titled length of residency. Doc- 
tors with long residencies appear to be the 
newer doctors, and surprisingly obtain their 
specialty certification relatively soon after 
receiving their MD degree. It is interesting 
to note that during residency the high-scoring 
physician tended to move from one hospital 
to another, but when he completed his resi- 
dency, he settled into a local practice and has 
since moved less than the average specialist. 

Factor G should be reflected and seems to 
be level of medical specialization and at- 
tainment, since its highest loading indicates 
that the physician is in a high-status specialty. 
Since passing a high-status specialty is a
prerequisite to membership in many high- 
status societies, there is, as expected, a high 
loading indicating membership in societies of 
above-average quality. As is also to be expected,
the high scorer had a relatively long
specialized training. Moreover, the fairly
high loading on typical number of hospitalized
patients is also probably a function of the 
physician’s level of specialization. In other 
words, as specialization increases, the serious- 
ness of the diseases treated increases. Second- 
ary loadings appear for income, number of 
honorary compendium listings, and overall 
rating by expert judges. 

Factor H should be called quantity of hos- 
pital responsibilities, since it is character- 
ized by a relatively large number of hospital 
duties. As is to be expected, the high scorer 
devotes a substantially greater than average 
amount of time to his medical practice and 
takes a shorter than average vacation. In 
spite of working with poorer patients, 
he has an above-average income and ap- 
parently has plowed money back into his 
practice to equip his office and to provide a 
nontechnical staff. He is below average in 
national recognition. 

Factor I should be called quality of hos- 
pital responsibilities, since it is characterized 
by relatively high-quality hospital duties. In 
addition to his hospital duties, the high scorer 
has a sizeable nursing staff to assist him. 
Perhaps because of this assistance, he is 
efficient in utilizing his time so that he keeps 
on schedule. Apparently he has wide contact 
with general practitioners, who respect him 
highly. He utilizes the newest drugs available
and deals to a slightly above-average extent with
the psychological side of patients in that he makes
certain his patients understand his diagnoses.

Factor J, which should be reflected, represents
a physician who has some younger
doctors working under him, who consults,
edits, and belongs to higher status societies,
and who has achieved some national recognition.
He tends to work in relatively few hospitals.
Thus the overall impression is that of
a doctor who has achieved high status through
essentially administrative or supervisory contributions.
Therefore, the best title for this
factor might be medical supervisory responsibilities.

Factor K might be best titled medical consulting
and liaison, since, if the factor is reflected,
the high scorer claims that other
physicians frequently ask his judgment on
medical problems, especially informally but
also on the basis of a formal consultation. He
maintains a well-equipped office with a large
ancillary technical staff. From this, one might
get the impression of a well-established medical
specialist. However, other secondary loadings
show this type of physician is not only
younger than most of his colleagues, but also
has frequently moved from one professional
position to another.

Factor L, which might best be titled attainment
in research, gives a highly consistent
pattern, which can be interpreted as a strong
orientation toward science. The high scorer
has been involved in medical research which
has been of a quality to lead to publication.
He also serves on scientific advisory boards,
delivers papers at scientific meetings, and
keeps up with what other investigators are
doing by reading and reviewing the scientific
literature. Since he has won both honors and
other recognition for his contributions, it appears
that his research is significant. Moreover,
his own self-concept seems to agree with
these findings. Specifically, he considers his
most important contributions to be those
directly related to medicine. In turn, these
contributions are regarded as high in quality
by expert medical judges.

Factor M, which should be reflected before
interpretation, might be called attainment in
publications, since the two highest loading
variables indicate that the high scorer has 
been productive of publications in which he 
has tended to be the sole or senior author. 
As a result of his contributions, he has been 
recognized as an outstanding contributor both 
by his colleagues and by the compilers of 
honorary compendiums, and he has been 
honored for his accomplishments. He tends to 
be in individual medical practice. 

Factor N is complex and somewhat diffi- 
cult to interpret. The highest loading in- 
dicates that when the high scorer is asked 
what his most important contributions have 
been, he tends to give a larger than average 
number of general contributions to society 
rather than strictly professional contribu- 
tions. However, what medical contributions 
he does list are regarded as important by 
expert judges. He tends to be practicing in a 
hospital, which perhaps is less of a drain on 
his energies than individual, group, or clinic 
practice would be, thus allowing him more 
freedom to make contributions as a respon- 
sible citizen. This interpretation is supported 
somewhat by the fact that he is active in 
community organizations. His overall attain- 
ment is considered good by expert judges. 
All in all, perhaps the best title for this 
factor would be self-evaluated contributions 
to society. 

Factor O clearly involves self-evaluated 
overall professional success, since its only 
really substantial loading is on the self-rating 
of success, and since the secondary loadings 
do not fall into a highly consistent pattern. 
The most striking thing about this factor is 
that there is nothing outstanding about high 
scorers except that they rate themselves as 
being quite successful. In terms of other 
tangible evidence, therefore, their self-con- 
cept of success seems somewhat exaggerated. 

Factor P, which seems to involve primarily 
the source to which the physician goes to ob- 
tain information, should be reflected and 
titled keeping abreast of field by journal 
reading. This tendency to go to original 
sources suggests that the high scorer is a 
cautious, critical-thinking specialist whose 
self-discipline is such that he exerts extra 
effort to be thorough in all phases of his 
medical practice. This impression is supported
by the fact that secondary loadings indicate a
physician who takes time to make a thorough 
examination of patients on their first visit, 
and who works in the better hospitals. He also 
belongs to more and better societies than the 
average specialist, has achieved both honors 
and recognition, and is rated as above average 
by expert judges. There is thus some sug- 
gestion that physicians who use journals as 
their means of keeping themselves informed 
display an above-average level of perform- 
ance in a number of different areas. 

Factor Q, another source-of-information 
factor, specifically relates to postgraduate edu- 
cation and indicates either doctors who take a 
relatively large number of formal courses for 
credit and take a relatively small number of 
refresher courses, or vice versa. The physician 
taking a large number of refresher courses 
deals with wealthier patients, but has been 
less mobile and is working in low-status hos- 
pitals. An appropriate title for this factor is 
keeping abreast by means of refresher courses. 

Factor R, still another information source 
factor, should be titled keeping abreast by 
means of detail men, since the high scorer 
on this factor utilized both drug salesmen 
and the newest drugs provided by them as 
resources. Secondary loadings suggest that he 
is a preferred consultant, and he is in either 
individual or hospital practice rather than 
group or clinic practice. 

Factor S, which should be reflected, might 
be titled keeping abreast by means of uncommon
techniques, since the highest loadings
indicate a doctor who uses less commonly
utilized techniques for keeping up with 
changes in his field. A striking example of 
such a technique is tape-recorded summaries 
of new knowledge in a given specialty which 
the physician can play while he is shaving, 
driving his car, etc. The high scorer has a 
less impressive office but more clerical help 
than average, and contrary to what might 
be expected, has attained some honors and 
some recognition among general practitioners. 

Factor T clearly involves participation in 
professional societies, since the outstanding 
characteristic of high scorers is that they go 
to many society meetings. They also derive 
satisfaction from their professional activities 
to a greater than average degree. Secondary
loadings indicate that they place value on
taking enough time to insure that patients 
understand their diagnoses, but still maintain 
their appointment schedule; work with poorer 
patients; have an average reputation among 
their specialist colleagues and below-average 
among general practitioners; and receive some 
requests to speak and to do editorial work. 

Factor U should be called civic participa- 
tion. Its three highest loadings all deal in 
one way or another with contacts with lay- 
men. Thus, one major contribution of high 
scorers is that they provide a link between 
professional medicine and society. Other
smaller loadings form a complex pattern and 
indicate that the high scorer, in a self-report 
of his contributions, can and will point to 
many things that are above average, has been 
honored for his contributions, has courtesy 
privileges in below average hospitals, has a 
relatively small staff, and is slightly dissatis- 
fied professionally. Thus, there is a suggestion 
that the high scorer may be somewhat frus- 
trated in his profession, and perhaps has 
turned to contacts with laymen as an alterna- 
tive source of satisfaction. 

Factor V represents a physician who organ- 
izes his time to provide opportunities for 
leisure; he thus has a shorter than average 
workweek. There is some indication that this 
may be partly due to the fact that he works 
with less sick patients, who are less likely 
to require hospitalization. The best title for 
this factor might be leisure planning. Contrary
to the findings of Peterson et al.
(1956), there is little indication that high 
scorers are superior physicians. 

Factor W, if reflected, presents the picture 
of a physician who has achieved enough 
financial success to be able to afford long 
vacations. In view of this, it is somewhat 
surprising that he has an above-average num- 
ber of patients who do not pay their bills. 
However, he does have a relatively impressive 
office, and has achieved a slightly above- 
average reputation both among his specialist 
colleagues and among general practitioners. 
This factor might best be called need for 
long vacations—financial success. 

Factor X, involving mainly the impact the 
physician had on the person who interviewed 
him for this research, should probably be 
titled interviewer’s rating of likeability. When
the factor is reflected, high scorers seem to 
have better than average social skills and can 
move easily among people and can become 
deeply involved in an activity. 

Both Factor Y and Factor Z should be 
reflected, and both involve the doctor’s ideas 
about what constitutes success within his own 
specialty. Thus, they are perhaps not true 
criterion factors, although the authors had 
expected that a physician’s expressed ideas 
about what goes into success would have some 
relationship to his actual success. Factor Y 
represents a doctor who is quite typical in 
such an analysis and might therefore be 
titled orthodox success image. On the other 
hand, high scorers on Factor Z have unusual 
ideas about success, so this factor might best 
be called unorthodox success image. Other- 
wise, there is little that distinguishes high 
scorers on either factor from other specialists. 

Factor AA, which has its highest loading 
on a control variable, should be titled late 
attainment of MD degree. Smaller loadings 
show that high scorers on this factor work 
in relatively few hospitals and have relatively 
few patients in hospitals at any given time. 
They use drugs provided by detail men, and 
are in some form of group practice. They 
have achieved some national recognition, but 
are not asked to serve on advisory boards. 

Both Factor BB and Factor CC should 
be reflected. These factors appear to be 
measures of grade-getting ability, with Factor 
BB involving achievement in undergraduate 
education and Factor CC involving achieve- 
ment in medical education. In other words, 
to the extent to which one may generalize 
from the results of this study, performance in 
education is essentially independent of per- 
formance as a medical specialist. It does not 
appear that this finding is entirely due to 
the large number of missing scores for these 
variables, since as indicated previously the 
distribution of correlations for these variables 
was only slightly affected by the substitution 
of the mean for missing scores. Nor does it 
appear that it is an artifact of using an 
orthogonal rotation procedure, since the hy- 
perplane count for these variables is quite 
high. It should also be noted that recent 
research (Lindquist, 1963) indicates that the
correlation between grades and other variables
is changed very little by elaborate procedures
for scaling grades to eliminate differences
resulting from varying grading standards
at different schools.

A finding that performance in education is 
not much related to performance as a 
physician could be justified for undergraduate 
education on the basis of the traditional 
liberal arts argument that learning is an im- 
portant activity in and of itself, but such an 
argument would hardly seem applicable to 
professional education such as that provided 
in medical schools. Therefore, the authors feel 
that these results indicate a need for further 
research on the nature of medical education 
and on medical school grading policies. 

Before this investigation was started, the 
authors discussed it with numerous physicians 
in order to get their reactions and suggestions. 
A significant number of these physicians ob- 
jected to the whole idea of such a study, 
basing their objection on their view that 
success as a physician is quite complex. They 
stated quite strongly that one cannot have a 
meaningful measurement of physician per- 
formance on the basis of a single score or a 
few scores. The results of this study clearly 
indicate that such a view is correct, since the 
single most important finding, perhaps, was 
the great complexity of the criterion area. 
The fact that there was not a great deal of 
overlap, generally, between the different measures
of physician performance indicates this
complexity. Further proof is that one factor 
was found for approximately every three 
scores included in the analysis. In other
words, as soon as a large number of measures 
of performance is obtained, the real complex- 
ity of the criterion problem emerges forcefully. 

In this connection, it should be pointed out 
that the present study is probably a con- 
servative estimate of the real complexity of 
physician performance. In any first study 
such as this one, ways to insure the maximum 
cooperation of the group studied must be very 
strongly considered. Accordingly, in the pres- 
ent study, measures of some sensitive areas 
were intentionally omitted in the data col- 
lection. For example, no attempt was made to 
obtain patient reaction to individual physi- 
cians, and no effort was made to study 
directly the quality of medical care provided
by individual physicians through such tech- 
niques as the Medical Audit (Mortrud, 1953). 
While the Medical Audit technique was de- 
veloped primarily as a way of measuring the 
effectiveness of medical care provided by hos- 
pitals, instruments were developed which 
could be, and have been, used to evaluate 
some aspects of the effectiveness of individual 
physician performance. Since such measures 
are quite important, and since there is little 
reason to suppose that they overlap to any 
great extent with the measures used in the 
present study, an attempt should be made to 
include them in any followup studies of the 
criterion problem for medical specialists. 

To summarize, there are many different 
kinds of contributions to which a medical 
specialist can devote his energies and efforts. 
While a physician can focus his efforts 
toward a larger or smaller number of these 
kinds of contributions, it is unlikely that an 
individual can devote very much energy to all 
of the available kinds of contributions. As a 
result, the total amount of energy available, 
the selection of outlets for that energy, and 
the effectiveness with which one’s efforts are 
expended in those outlets are important in 
assessing the overall accomplishments and 
contributions of an individual. All of these 
considerations should be used in evaluating 
medical specialists. 
“AN OBJECTIVE CRITERION FOR RESEARCH MANAGERS”:
A CRITIQUE 


MELVIN R. MARKS 


University of Rochester 


Lamouria and Harrell (1963) describe an 
“operational analysis,” employing 


The OR emphasis . . . [in] a formal model used to
combine various quantified factors relevant to the 
process being examined [to build an objective cri- 
terion for research managers]. [They assume] (a) 
Management evaluation using OR yields a more ob- 
jective and more accurate measure of performance 
than traditional clinical rating procedures. (b) The 
success of a research and development manager can 
be assessed entirely by measuring the performance of 
the group acting under his direction [p. 353]. 


The writer of this critique agrees with the 
first assumption, but finds that the model de- 
veloped by Lamouria and Harrell is neither 
necessary nor sufficient for their intended pur- 
poses. Further, the writer believes that the ap- 
parent scientific rigor of their approach is de- 
ceiving to the quantitative unsophisticate. The 
criticism will consist of four points. 

1. One class of criticisms rests on the irrelevance
of the authors’ elaborate weighting procedures
to their findings. The “criterion” employed
was a ratio of actual to a “predetermined
ideal standard” performance. The ratio
was then summed in a double-weighting pro- 
cedure—once for objectives and once for ac- 
tivities—to achieve an overall index. The de- 
termination of the standard will be discussed 
further below. In this first point the weight- 
ing procedure will be examined. (a) If the 
performance ratios are simply summed (unit 
weights) for each department (and objectives 
are ignored) the values are 5.95, 1.67, 3.55, 
and 4.39. This order is identical with that 
found by the authors in their Tables 5 and 6. 
(b) Perhaps the procedure in (a) is not fair,
since not all of the activities listed for each 
department were used by the authors; thus 
the sum could be inflated for a department 
which had more activities. For this reason, 
only those activities having more than zero 
relevance were used for each objective in each 
department. This time the sums of the unit 
weighted ratios were divided by the number 
of activities summed to get a mean value.
The resulting ordering by department (with 
objectives merged) again agrees perfectly with 
the authors’ Tables 5 and 6. When considered 
separately by objective, the agreement is per- 
fect for Profit and Diversification, and there 
are minor inversions for Growth and Welfare. 
One of these inversions will be accounted for 
below. (c) The authors’ Table 5 shows di- 
rectly that, using their double-weighting pro- 
cedure, the rank order of departments within 
each objective is identical and thus break- 
down by objectives need not have been used. 
There is a good reason for this finding. One 
can argue that the objectives of Diversifica- 
tion and Growth are highly correlated—or at 
least performance directed toward the one will 
also proceed toward the other. Further, both 
of these objectives are strongly related to 
Profit, perhaps negatively in the short run 
and positively in the long run. This interde- 
pendence of objectives refutes the justifiability 
of weighting the objectives to sum to unity, 
as though they were orthogonal. If we com- 
bine the Diversification and Growth scores for 
an overall mean of these two objectives, the 
inversion reported previously disappears. 
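
The unit-weight check in (a) can be restated as a short computation. The sketch below assumes that the four reported sums are listed in the order Departments A, B, C, D; the text gives only the values, so the labels are an assumption made for illustration.

```python
# Unit-weighted sums of performance ratios reported in point 1(a).
# Assumption: the values are listed in the order A, B, C, D.
unit_sums = {"A": 5.95, "B": 1.67, "C": 3.55, "D": 4.39}


def rank_departments(scores):
    """Return department labels ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_departments(unit_sums)
print(ranking)
```

Under that labeling, the lowest unit-weighted sum falls to Department B, which agrees with the remark in point 4 that Department B remains low.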

In summary of Paragraph 1, while the as- 
sumption of interaction between objectives 
and activities implied in the double-weighting 
procedure is a priori plausible, the authors 
have demonstrated empirically that it does 
not exist. Not only is the interaction unneces- 
sary, but so are differential weights for objec- 
tives, and for activities when compared across 
objectives. This finding for activities is curious 
since only four of the eight named activities 
were common to all four departments. 

2. Paragraph 1 has shown that the overall 
ranking of departments depends directly on 
the unweighted mean of the performance 
ratios. Examination of the procedure for com- 
puting these ratios shows that the numerator 
represents objective performance, while the 
denominator, in contrast, “was based on the
subjective judgment of the department man- 
ager and represents a theoretical goal deemed 
both desirable and feasible [p. 356, italics 
added].” Note that the OR results were contrasted
with the unit director’s clinical rankings.
Thus the discrepancy could be due to
differences between unit director and depart- 
ment managers in estimates of an ideal stand- 
ard for each of the activities. Two hypotheses 
are implied by the procedure used. First, that 
the rankings of the departments were sensitive 
to the aspiration level of their department 
managers. When aspiration is high, the per- 
formance index will tend to be low, and vice 
versa. Could the poor showing of Department 
B be attributed to its manager’s very high expectation
as to what his department will be able to do?
This leads to the second hypothesis. If the 
first hypothesis be true, could the unit direc- 
tor have rated Department B second on the 
list (even though it performed poorly) be- 
cause he recognized the need for such a high 
level of aspiration, and perhaps shared it? 
Note that Department B had not been in 
existence long enough to learn the rate of re- 
turn on its proposals, so the hypothesis is 
plausible. 
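
The first hypothesis follows directly from the form of the index. As a minimal sketch, with entirely hypothetical numbers, two departments with identical actual output receive different ratios when their managers set different ideal standards.

```python
def performance_ratio(actual, ideal):
    """Ratio of actual performance to the manager's stated ideal standard."""
    if ideal <= 0:
        raise ValueError("ideal standard must be positive")
    return actual / ideal


# Hypothetical values: the same actual output judged against two standards.
actual_output = 12.0
modest_standard = 15.0     # manager with a modest aspiration level
ambitious_standard = 40.0  # manager with a very high aspiration level

modest_ratio = performance_ratio(actual_output, modest_standard)
ambitious_ratio = performance_ratio(actual_output, ambitious_standard)
print(modest_ratio, ambitious_ratio)
```

The higher the denominator chosen by the manager, the lower the index, whatever the objective performance in the numerator.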

3. In their closing remarks, Lamouria and 
Harrell consider the influence on an objective 
index of factors beyond the manager’s control. 
They dismiss these after (a) conjecturing that 
a manager may be responsible for his group’s 
actions whether or not they are within his control;
(b) claiming that the unlucky manager
can use the model in his own defense by show- 
ing how his superior’s acts and stated objec- 
tives are inconsistent. The writer believes that 
external influences cannot be ignored when 
personnel decisions are to be taken on the ba- 
sis of the computations. Three factors militate 
for their inclusion. (a) It is simply not con- 
ceivable that subordinate group means with 
respect to aptitude, training, experience, and 
personality will be equal in a natural setting. 
The manager with the better subordinates 
profits by the index. (b) The resources 
(money, equipment, etc.) available to each
manager will seldom be equal, and the one 
with the best access will earn the best index, 
other things being equal. Of course, gaining 
the best access may show managerial ability, 
but this factor is not being measured directly. 
(c) Not all R & D problems are of equal diffi- 
culty. Progress and hence performance is more 
likely for easy problems. Lamouria and Harrell
have not considered this factor; however,
we note that differential difficulty expresses 
itself in the denominator of the performance 
index. The denominator, as shown previously, 
is partially self-serving for the department 
manager and also interacts with the person- 
ality of the manager’s supervisor to influence 
his clinical rating. 

4. Three minor points are mentioned only 
because the article laid such stress on quanti- 
tative techniques. (a) The last sentence of 
the Methods section gives the computation for 
Department A as an example. Unfortunately, 
four of the five values involved are not con- 
sistent with the authors’ Table 1 from which 
they were supposed to have been derived. 
This error does not influence the results. (b)
In Table 2, the decimal equivalent of 1/15 is 
given as 0.667 instead of 0.0667. Again this 
does not influence the results; Department B 
is still low if a correction is made. (c) In 
Table 2 a weight of .1 is assigned to “Pro- 
posals,” for three of the objectives, although 
no performance ratio is available. The zero 
product systematically reduces the index for 
Department B. Again this does not influence 
the result since so small a weight is involved. 
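
The decimal slip noted in (b) is easily verified. The following lines are only an arithmetic check; the variable names are illustrative.

```python
from fractions import Fraction

weight = Fraction(1, 15)
correct_decimal = round(float(weight), 4)  # decimal equivalent of 1/15
printed_decimal = 0.667                    # value as printed in Table 2

# Factor by which the printed value overstates the intended weight.
inflation = printed_decimal / float(weight)
print(correct_decimal, inflation)
```

The printed value is roughly ten times the intended weight.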

In general summary, it is regretted that 
rather drastic claims have been made for an 
evaluation technique which are not justified 
by the supporting data. There is a need for 
comparison of objective and clinical tech- 
niques, but that need has not been well-served 
in the reported research. 


DEVELOPMENT OF A PAINT SCHEME FOR INCREASING
AIRCRAFT DETECTABILITY AND VISIBILITY * 


ARTHUR I. SIEGEL AND PHILIP FEDERMAN


Applied Psychological Services, Wayne, Pennsylvania 


5 studies were performed in order to derive a paint coloration scheme which
will allow maximum aircraft visibility and detectability. It was found that
maximum visibility may be expected from a large, squarelike, unbroken fluorescent
red-orange area and a secondary area possessing color and brightness contrast
with the fluorescent red-orange.


The frequency of midair collisions has 
focused attention on the problem of maxi- 
mizing aircraft detectability and visibility. At 
the present, appropriate collision avoidance 
by the pilot encompasses a series of behav- 
iors initiated on the basis of information 
received through the visual sense modality. 
Even with the development of electronic 
intruder warning devices, visual surveys of 
the sky by the pilot will still serve as a check 
on the information obtained from such de- 
vices as well as a check on the appropriate- 
ness of any evasive action taken. Thus, paint 
coloration schemes which will increase aircraft 
visibility and detectability are potentially of 
importance to the solution of the collision- 
avoidance problem. 

However, the development of optimum 
exterior paint schemes for aircraft should not 
be thought of as a panacea for solving the 
midair collision problem. The solution to this 
problem requires a systematic consideration 
of many factors, including: (a) the develop- 
ment of a manageable air traffic control sys- 
tem with skilled and trained air controlmen, 
(b) alert pilots who continuously scan for
intruding aircraft and who quickly take the 
appropriate evasive action, (c) a “milieu” 
which emphasizes safety, (d) consistent administration
of air traffic violations, (e) more
highly developed techniques for communication,
(f) clearly defined rules and standards
for collision-avoidance maneuvers, and (g)
comprehensive training of all pilots in these
maneuvers. While high visibility paints have
been previously investigated (e.g., Cook,
Beazley, & Robinson, 1962; Halsey, Curtis, &
Farnsworth, 1955; Robinson, Cook, & Zeleny,
1961), few, if any, studies have focused
specifically on fluorescent paints to determine
if they add to visibility and detectability and
how they compare with ordinary paints in
these respects. Additionally, most previous
studies have emphasized color alone. Five
different studies were performed and are
reported here. The first three studies were
performed in the Applied Psychological Services’
laboratory and involved investigation of
such properties as color, pattern, area, and
brightness contrast. The insights obtained
from these laboratory investigations were
checked in out-of-doors visual range studies,
the fourth and fifth studies.


1 This research was performed under Contract
N156-38581 for the Air Crew Equipment Laboratory,
Naval Air Material Center. We are indebted to John
Lazo, Bertram Lowi, and Thomas Gallagher of
that laboratory for advice and assistance throughout.
We are further indebted to Kenneth Crain who
assisted in the design and performance of certain
parts of the work reported here and to Richard
Lanterman who assisted in the development of the
paint scheme discussed.

Fluorescent paints or pigments, like ordi- 
nary pigments, reflect portions of the spec- 
trum of incident light and absorb other 
portions. However, with fluorescent paint, 


most of the absorbed portion of light is not dis- 
sipated as heat . . . but, instead, is transformed into 
emitted light of the same hue as that being reflected 
by the pigments. Reflected color is thus reinforced 
with emitted color, producing hues which appear 
extraordinarily bright to the eye of an observer 
[Switzer Brothers, Inc.]. 


Thus, if it is assumed, on the basis of the 
apparent brightness of fluorescent paints, 
that they reflect more energy, it can be 
expected that fluorescent paints will yield 
greater visibility than corresponding colors 
produced with ordinary pigments. 


METHOD FOR LABORATORY STUDIES


Apparatus 


The first study compared certain selected fluo- 
rescent and ordinary paints, in terms of the limits 
of their visual fields. Perimetric measurements were 
made on the Brombach perimeter (American Optical 
Company). Stimulus illumination was provided from 
a 60-watt daylight bulb housed on the perimeter. 
Since the lamp rotated on the arc of the perimeter, 
constant illumination on the stimuli (12.5 foot- 
candles) was maintained for all positions of the arc. 

The second experiment was performed to obtain 
basic information on the tachistoscopic thresholds 
for fluorescent and comparative ordinary paints. 
The apparatus employed was the three-channel, 
direct-viewing electronic tachistoscope manufac- 
tured by the Scientific Prototype Manufacturing 
Corporation. 

Several areas were investigated in the third study. 
First, the effects produced by changes in the extent 
of border contrast of the test object with its back- 
ground were studied. Second, the thresholds of sepa- 
rate chromatic stimuli of given total areas were 
compared with those of single, integrated stimuli 
of the same total chromatic areas. The third area 
investigated concerned the thresholds for dichromatic 
stimuli as compared with each of their mono- 
chromatic elements taken separately. Finally, the 
effects of varying stimulus shape (rectangular or 
square) and stimulus size (area) were investigated. 
The same apparatus used in the second study was 
employed in the third study. 


Stimuli 


Three fluorescent colors were selected for study: 
red-orange, yellow-orange, and blue. The chips from 
which these stimuli were cut were prepared and sup- 
plied by a manufacturer of fluorescent paint. Ordi- 
nary paint stimuli were used in order to establish 
a reference base against which the fluorescent stimuli 
could be compared. 

In selecting the comparison or ordinary colors, the 
assumption was made that those stimuli would be 
most adequate which most closely approximated
the saturation and hue of the fluorescents. These 
comparison stimuli should not be looked upon as 
controls, in the usual sense. Although there is some 
evidence (Halsey et al., 1955) that saturation is a 
more effective factor than brightness in contributing 
to detectability, it would be presumptuous, and not
in line with the purposes of the present research,
to control for brightness and allow saturation or
hue alone to vary. The important factor here is
that the colors chosen as comparison stimuli
approximated the fluorescent standards (except for the
effects of fluorescence) enough that they are as
closely equivalent as possible.


TABLE 1

MUNSELL NOTATIONS FOR THE COLORS TESTED

Color                        Hue           Value      Chroma
Fluorescent red-orange       6.0-6.8R      5.8-6.5    20.0-25.0
Fluorescent yellow-orange    7.5-8.0R      7.0-8.0    20.0-25.0
Fluorescent blue             [illegible]   5.0        10.0+
Ordinary red-orange          7.5R          5.0        12.0+
Ordinary yellow-orange       10YR          6.0        15.0
Ordinary blue                2.5PB         4.0        8.0


A large number of commercially available paint 
chips were obtained and several persons, skilled in 
the investigations of visual phenomena, agreed on 
the ordinary colors which best matched the fluo- 
rescents. This was done first under the illumination 
of a 60-watt Mazda daylight bulb situated 3 feet 
above the stimuli and again under the illumination 
provided by four 375-watt reflector flood lamps 
situated 5 feet above the stimuli. During the match- 
ing, both the fluorescent and the comparison stimuli 
were pasted on gray cardboard backgrounds possess- 
ing a reflectance of approximately 40%. 

Table 1 presents the Munsell notations for the 
stimulus colors selected. The notations for ordinary 
red-orange and fluorescent blue represent visual esti- 
mates made through the use of the Munsell Book 
of Colors. The notations for the remaining colors 
were obtained by the spinning-disk method and 
from the manufacturers of the paints. 


Subjects 


In the first study, perimetric measurements were 
made on nine male subjects (Ss), ranging in age 
from 18 to 27 years. All had normal visual acuity
and normal color vision, the latter determined
through the use of a standard series of
pseudoisochromatic charts. Six male college students and one staff
member from Applied Psychological Services were 
used as Ss in the second and third studies. Again, 
all had normal color vision and normal visual acuity. 


Procedure 


In the first study, the stimuli were all the same size,
i.e., 6.5 millimeters in diameter, and they subtended
1 degree of visual angle when situated at the
normal perimetric measurement distance. For each S,
measurements were made using all three fluorescents 
during one session and all three ordinary colors 
during another. With alternate Ss, the order of pres- 
entation was counterbalanced. Measurements with a 
white test stimulus were made in both series. This 
stimulus was used as a control to determine if 
sequence effects were present. Within each series, the 
order of presentation of colors was randomized so 
as to control for errors of anticipation. 

For each S, each of the seven test objects (in- 
cluding the white) was brought in from the periph- 
ery at each of the eight basic meridians (0°, 45°, 90°, 
135°, 180°, 225°, 270°, 315°) a total of five times. 
Thus, for each S an average score based on five 
measurements was obtained at each meridian for 
each stimulus. For the red-orange and blue fluo- 
rescents and for their ordinary counterparts, two 
points on each meridian were measured: (a) the 
point at which the stimulus was first seen (usually 
as a white or gray object), called the outside limits
measurement; and (b) the point at which the true 
color of the test object could be identified, called 
the inside limits measurement. For the yellow-orange 
fluorescent and its ordinary counterpart and for 
white, only the outside limits were obtained. Deter- 
mination of the inside limits was not made for the 
yellow-orange stimuli because of the difficulty in 
distinguishing them from the red-orange stimuli. This 
confusion arises because colored objects often take 
on a yellowish tint as they are moved in from the 
periphery. Thus, a red-orange stimulus placed farther 
out in the periphery can easily be confused with 
a yellow-orange stimulus at a point several inches 
closer to the central fixation point. 

Each S was adapted to the test illumination level 
prior to testing, and several trials with each color 
were given to acquaint Ss with the stimuli and the 
procedure. Rest pauses were given at prescribed 
intervals during the testing. Measurements in all 
cases were for the right eye; the left eye was covered 
with an eye patch. 

In the second study, the stimuli were all circular 
patches, 6.5 millimeters in diameter. Each was placed 
on a white 5 × 7 inch card for insertion into the
tachistoscope. At the eyepiece of the tachistoscope, 
the stimuli subtended .32 degree of visual arc. The 
same fluorescent and ordinary colors used in the first 
study were used again. 
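
The same 6.5-millimeter patch subtends 1 degree in the first study and .32 degree in the second because the viewing distances differ. The relation is the usual one between object size, distance, and visual angle. The sketch below works backward from the reported angles to the distances they imply; these distances are derived here for illustration and are not stated in the text.

```python
import math


def visual_angle_deg(size_mm, distance_mm):
    """Visual angle (degrees) subtended by an object of a given size."""
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))


def distance_for_angle_mm(size_mm, angle_deg):
    """Viewing distance (mm) at which an object subtends a given angle."""
    return size_mm / (2 * math.tan(math.radians(angle_deg) / 2))


DIAMETER_MM = 6.5  # stimulus diameter reported for both studies

# Distances implied by the reported angles (derived, not reported).
perimeter_distance = distance_for_angle_mm(DIAMETER_MM, 1.0)
tachistoscope_distance = distance_for_angle_mm(DIAMETER_MM, 0.32)
print(round(perimeter_distance), round(tachistoscope_distance))
```

On this reckoning the tachistoscope viewing distance would be roughly three times the perimetric distance.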

One of the three channels of the tachistoscope 
provided the test field; another provided the pre- 
and postexposure field. The third channel was dis- 
connected and remained unused throughout. Thus, 
the visual sequence to the Ss was: preexposure field 
with fixation cross, stimulus exposure, postexposure 
field with fixation cross.

Measurements were made under two field lumi- 
nance levels. In the high luminance condition, the 
luminance of the test field was 16.0 footlamberts. 
Under the low luminance condition, the test field 
was 2.3 footlamberts. For both conditions the room 
brightness was 6.1 footlamberts. 

The test stimuli appeared 1 inch to the right, left, 
above, or below the fixation cross in the pre- 
exposure field. Two threshold measurements were 
obtained for each stimulus under each brightness 
condition. The first threshold represented the point 
at which the test stimulus could be identified as 
present in the visual field (the object threshold). 
The Ss reported when they first perceived the
circular stimulus (whether they could identify the 
color or not) and its location (left, right, up, or 
down from the fixation cross) for this threshold. 
The second threshold was that corresponding to the 
identification of the color of the stimulus (the color 
threshold). Three successive correct identifications 
were required for the specification of the thresholds. 

The Ss were given practice trials with each stimulus 
before beginning the experiment proper. The signal 
“ready” preceded each exposure of a stimulus by 
4 second. Ascending steps consisting of three stimulus 
exposures were employed. Each step was 0.1 milli- 
second longer than the preceding. Three successive 
correct reports of the stimulus position at the same 
exposure interval were required before the threshold
was judged attained. After the first threshold was 
obtained, exposure intervals were again successively 
increased until the color threshold was obtained. 
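
The ascending procedure just described can be sketched as a simple loop. The starting exposure, the cap on the number of steps, and the idealized observer are assumptions introduced for illustration; only the 0.1-millisecond step and the requirement of three correct reports at one exposure interval come from the text.

```python
def ascending_threshold(respond, start_ms=0.1, step_ms=0.1,
                        per_step=3, max_steps=1000):
    """Raise exposure in fixed steps until every presentation at one
    exposure interval is reported correctly; return that exposure."""
    for n in range(max_steps):
        # Compute from the step count to avoid accumulating float error.
        exposure = round(start_ms + n * step_ms, 6)
        reports = [respond(exposure) for _ in range(per_step)]
        if all(reports):
            return exposure
    return None  # no threshold reached within max_steps


def hypothetical_observer(exposure_ms):
    """Idealized S who reports position correctly at 1.5 ms or longer."""
    return exposure_ms >= 1.5


object_threshold = ascending_threshold(hypothetical_observer)
print(object_threshold)
```

A real S would respond probabilistically near threshold, which is why each reported threshold was based on repeated series rather than a single ascent.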

The order of presentation of the stimulus position 
on the card was determined randomly and the order 
of colors within each series was randomly varied, 
appearing only once in a given series. The complete 
experimental design involved nine series; thus, each 
threshold was based on nine trials. 
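
One way to lay out such a schedule is sketched below. The seed, the color labels, and the function name are illustrative assumptions; the sketch guarantees only the properties stated above, namely nine series, each color once per series in random order, and a randomly chosen stimulus position.

```python
import random

COLORS = [
    "fluorescent red-orange", "fluorescent yellow-orange", "fluorescent blue",
    "ordinary red-orange", "ordinary yellow-orange", "ordinary blue",
]
POSITIONS = ["left", "right", "up", "down"]


def build_schedule(n_series=9, seed=0):
    """Return n_series lists of (color, position) pairs; each color
    appears exactly once per series, in a freshly shuffled order."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_series):
        order = COLORS[:]
        rng.shuffle(order)
        schedule.append([(color, rng.choice(POSITIONS)) for color in order])
    return schedule


schedule = build_schedule()
trials_per_color = sum(1 for series in schedule for color, _ in series
                       if color == COLORS[0])
print(len(schedule), trials_per_color)
```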

For purposes of comparison, a white stimulus 
was presented under the low luminance condition 
and a black stimulus was presented under the high 
luminance condition. The comparison stimuli were 
the same size as the experimental stimuli. Only the 
first seen threshold was obtained for these stimuli. 

In the third study, a group of 28 stimuli was used. 
These stimuli, drawn to scale, are shown in Figure 1. 
Stimuli A through G and J through N were pre- 
pared in duplicate; one set in ordinary yellow- 
orange, the other in fluorescent yellow-orange. 
Stimuli A, F, G, J, M, and N (fluorescent and 
ordinary) were included to test the effects of variation
of shape (perimeter with area held constant) and
also the effects of increasing area. 

Stimuli A, B, C, D, E, J, K, and L (fluorescent 
and ordinary) were involved in determining the ef- 
fects of separating chromatic areas when the total 
chromatic area is held constant. In all cases, the 
total chromatic area was held constant but the 
separation distances between chromatic areas were 
varied. 

Stimuli H, I, O, and P were included to test the 
effects of combining two chromatic elements. Stimu- 
lus H matched Stimulus B with the exception that 
B had a white stripe down the center and H had 
a blue stripe. Similarly, Stimulus O matched Stimulus
K but K had a white stripe and O had a blue
stripe. Stimuli I and P were blue and of the same
area and shape as the stripes of Stimuli H and O. 
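
The total of 28 stimuli can be reconciled with the lettering as follows, on the reading that the twelve duplicated shapes each count twice and the four combined or blue stimuli count once.

```python
# Shapes prepared in both ordinary and fluorescent yellow-orange.
duplicated = list("ABCDEFG") + list("JKLMN")
# Combined-stimulus conditions prepared once each.
single = list("HIOP")

total_stimuli = 2 * len(duplicated) + len(single)
print(total_stimuli)
```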

Following practice trials, five test trials with each 
stimulus were administered. The order of presenta- 
tion was varied randomly across Ss and trials. The 
signal “ready” preceded each exposure by 3 second. 
Exposure times were increased in 0.1 millisecond step 
intervals with three presentations at each exposure 
interval. Only the object threshold was obtained. 
Three successive correct reports out of three attempts
at any given time setting were required for a 
threshold determination. The S was required to 
report the correct position of each stimulus with 
reference to the fixation cross and not the shape 
or color of the stimulus. 


RESULTS FROM LABORATORY STUDIES


Study 1


In the first study, two sets of measurements 
were obtained. The first refers to the outside 
limits (that point on each meridian at which 
the stimulus was first seen), the second to 
the inside limits (that point on each meridian
at which the true color was perceived).
Inside-limits measurements were made only
for the fluorescent red-orange and fluorescent
blue and their ordinary counterparts.


FIG. 1. Stimuli employed in third study.

The inside-limits data indicated that the 
fluorescent blue had the largest visual field. 
This was followed by ordinary blue, fluo- 
rescent red-orange, and ordinary red-orange, 
respectively. These data are in accord with 
what is generally accepted as representing 
the visual fields for red and blue (Boring, 
1942). They are also in accord with expecta- 
tions, insofar as the fluorescent colors yielded 
larger visual fields than the ordinary colors. 
The Wilcoxon matched-pairs signed-ranks test 
was applied to the data and revealed significant
differences between all fields except those
for fluorescent red-orange and ordinary red-orange
and for fluorescent red-orange and
ordinary blue.
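
For readers unfamiliar with the matched-pairs signed-ranks test, a minimal sketch follows. The field extents are invented for nine hypothetical Ss and do not reproduce the data of the study; the exact probability is obtained by enumerating all sign assignments, which is feasible for samples of this size.

```python
from itertools import product


def signed_ranks(x, y):
    """Return (W_plus, W_minus, ranks) for paired scores, dropping
    zero differences and giving tied magnitudes their average rank."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, ranks


def exact_two_sided_p(ranks, w_observed):
    """Share of sign assignments whose smaller rank sum is at most
    the observed smaller rank sum."""
    total = sum(ranks)
    hits = 0
    assignments = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(w_plus, total - w_plus) <= w_observed:
            hits += 1
        assignments += 1
    return hits / assignments


# Hypothetical field extents (degrees) for nine Ss.
fluorescent = [52, 55, 49, 60, 58, 51, 57, 54, 56]
ordinary = [50, 51, 48, 54, 53, 44, 49, 45, 46]

w_plus, w_minus, ranks = signed_ranks(fluorescent, ordinary)
p_value = exact_two_sided_p(ranks, min(w_plus, w_minus))
print(w_plus, w_minus, p_value)
```

With all nine differences in the same direction, the smaller rank sum is zero and the two-sided probability is 2/512, which falls below the .01 level.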

The outside-limits data revealed larger 
fields for the fluorescent colors than for ordi- 
nary colors. Application of the signed-ranks 
test yielded statistically significant differences 
at the .01 level of confidence between the 
fluorescent and the nonfluorescent visual 
fields. In no case did an ordinary color yield 
a larger field than its fluorescent counterpart. 
The two blue stimuli did not differ from
each other significantly, although the other
two pairs (i.e., red-orange and yellow-orange) did. Also, no significant
differences were found within either of
the two groups of colors.

Analysis of the data obtained from the 
white stimulus, which was placed in the
experiment to test the presence of a sequence 
effect, revealed that there were no statistically 
significant differences between the white field 
generated in the first half of each run and 
that generated in the second half. The signed- 
ranks test was applied to these data. 


Study 2 


The second study was performed to answer
the question of whether fluorescent colors 
possess lower thresholds than ordinary colors 
and also which colors within each type possess 
the lowest thresholds. Mean threshold values 
were obtained for the object and color thresh- 
olds for each stimulus, under the two lumi- 
nance conditions (high and low). The rank 
order of these thresholds was the same under 
both luminance conditions. The object thresh- 
old rank order, in terms of the order in which 
the stimuli were perceived, was: ordinary 
red-orange, ordinary blue, ordinary yellow- 
orange, fluorescent blue, fluorescent red- 
orange, and fluorescent yellow-orange. The 
color threshold rank order was: fluorescent 
red-orange, fluorescent yellow-orange, ordi- 
nary yellow-orange, and ordinary red-orange. 

An analysis of variance applied to the data 
indicated statistically significant differences 
at or below the .01 level of confidence for 
colors and luminance levels for the object 
threshold data as well as the color threshold 
data. The interaction was significant at the 
.01 level for the object thresholds but not
for the color thresholds. 

Tukey’s procedure (gap test) for com- 
paring means was applied to the data (Ed- 
wards, 1957). The gap test for object thresh- 
olds for high and low luminance conditions 
indicated the same three groups: ordinary 
blue, ordinary red-orange, ordinary yellow- 
orange, and fluorescent blue in one group; 
fluorescent red-orange by itself; and fluo- 
rescent yellow-orange by itself. Application 
of Tukey’s test for “stragglers” indicated 
that each color was to be considered as sig- 
nificantly different from every other color. 

Only object thresholds were obtained on the 
comparison stimuli of white and black, since 
color thresholds cannot be obtained for these 
stimuli. The white stimulus, presented only 
under the low luminance condition, was inferior
to the other colors, whereas the black
stimulus, presented under the high luminance 
condition, was superior to the other colors 
tested. 


Study 3 


Shape. The first variable analyzed in the 
third study was the effect of shape, with the 
area of the stimuli held constant, on the
detection of the stimuli. The square stimuli 
possessed lower thresholds than the rectangu- 
lar stimuli. The results of two-way (Shape 
x Paint Type) analyses of variance indicated 
significant differences for the main effect of 
paint type below the .01 level. Although the 
square stimuli were not always significantly 
different from the rectangular stimuli, the 
thresholds were always lower. 

Area. Comparisons of fluorescent and ordi- 
nary stimulus types were made to determine 
the effects of changes in stimulus area. Two- 
way analyses of variance were also applied 
to these data. For both the rectangular and 
the squarelike stimuli, the main effects of 
both types of paint and area were statistically 
significant below the .01 level. Interaction 
effects were not significant in either case. 

These data are plotted separately for the 
squarelike and rectangular stimuli in Figure 2. 
The mean tachistoscopic threshold values are 
derived across Ss and paint type; area is 
expressed in minutes of visual angle. 

Within the brightness range employed, it 
appears that increasing stimulus size up to
20-25 minutes of visual angle resulted in
significant decreases in threshold (i.e., faster perception);
beyond this point leveling occurred. When 
rectangular and squarelike stimuli were 
equated for area, the squarelike stimuli pos- 
sessed smaller perimeter to area ratios. In 
the current work, these ratios were: Stimulus 
A, 0.67; Stimulus J, 1.11; Stimulus F, 1.00; 
and Stimulus M, 2.11. 
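
The geometric point can be illustrated with a short computation. The dimensions below are hypothetical, chosen only so that the two shapes have equal area and give ratios that round to those reported for Stimuli A and J; the actual dimensions of the stimuli are not given in the text.

```python
def perimeter_to_area(width, height):
    """Perimeter-to-area ratio of a rectangle."""
    return 2 * (width + height) / (width * height)


# Hypothetical dimensions (mm) with equal area of 36 square millimeters.
square_ratio = perimeter_to_area(6.0, 6.0)
rectangle_ratio = perimeter_to_area(18.0, 2.0)
print(round(square_ratio, 2), round(rectangle_ratio, 2))
```

For a fixed area the square has the smallest perimeter of any rectangle, so the squarelike stimulus necessarily has the lower ratio.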

Of the area equated stimuli, those possess- 
ing lower perimeter to area ratios had lower 
thresholds. These findings are compatible with 
the Kohler-Wallach (1944) theoretical po- 
sition (assuming diffusion at the edges of a 
satiated area in the striate cortex) and with 
the findings of Bitterman, Krauskopf, and
Hochberg (1954). 

Stimulus Separation. If larger areas yield 
lower tachistoscopic thresholds, the possibility 
exists that two chromatic areas separated by 
an intermediate stripe of the same color as 
the background (white) will be perceived as 
a simple unified stimulus. With this occur- 
rence, lower thresholds might be obtained 
than that expected from the sum of the two 
chromatic areas. The mean tachistoscopic
thresholds (for fluorescent and ordinary pig- 
ments) for the five squarelike stimuli used 
revealed the following rank order: Stimulus A 
(zero separation), Stimulus B (1.5 millime- 
ters separation), Stimulus E (8.0 millimeters 
separation), Stimulus C (3.0 millimeters sepa- 
ration), and Stimulus D (6.0 millimeters 
separation). 

A two-way analysis of variance conducted 
on the data revealed that both main effects 
(paint type and separation distance) were 
statistically significant below the .01 level of 
confidence. No significant interaction effects 
were displayed. Application of Tukey’s gap 
test separated Stimulus A from the other 
stimuli. 

The mean tachistoscopic thresholds (for 
fluorescent and ordinary) for the three types 
of rectangular stimuli used resulted in the 
following rank order: Stimulus J (zero sepa- 
ration), Stimulus K (1.0 millimeter separa- 
tion), and Stimulus L (4.0 millimeters sepa- 
ration). The two-way analysis of variance 
indicated statistically significant differences 
among the main effects, below the .01 level,
while the interaction effect was again non- 
significant. Tukey’s gap test separated Stimu- 
lus L from the other two stimulus types. 

The data for the rectangular stimuli are 
consistent with the data for the squarelike 
stimuli in indicating that the tachistoscopic 
threshold tended to be lowered when the 
stimulus was not divided into parts and the 
parts separated. 

The problem of why the separated stimuli 
did not yield faster perceptual speeds remains 
open. It is possible that ‘closure’ did not 
occur. Closure might have been predicted 
since in at least one of the previous studies 
(Bobbitt, 1942) the closure threshold was 
found to range between 67% and 72% of 
perimeter, a value within the limits of the 
separations here employed. On the other 
hand, it is known that closure is a function 
of the organizational dynamics of the figure. 
Two rectangles separated by the white stripe 
may be dynamically organized as “good 
figures” which possess symmetry, balance, 
simplicity, and uniformity. Thus, closure 
effects may have been prohibited. 

Assuming lack of closure, only one or the 
other chromatic area of the separated stimuli 
may have been seen since eye fixation on the 
cross in the preexposure field (located at the 
middle of the dividing stripe) may have im- 
posed some eye movement in order to find 
the separated stimuli. If true, the effective 
stimulus area was thus smaller than that of 
the closed stimulus. 

Further support is given to the previous 
contention that perimeter alone is not an 
influential variable, since the perimeter of 
the separated stimuli was greater in all cases 
than that of the solid stimulus. 

Combined Stimuli. The effects of combining 
blue with fluorescent yellow-orange on tachisto- 
scopic thresholds were here examined by com- 
paring B, G, H, and I of the squarelike 
stimuli and K, N, O, and P of the rec- 
tangular stimuli (Figure 1). The obtained 
mean object threshold rank order for the 
squarelike stimuli was fluorescent yellow- 
orange with a blue medial stripe, solid fluo- 
rescent yellow-orange, and fluorescent yellow- 
orange with a white medial stripe. The 
differences between the thresholds were 
compared with each other employing the 
Wilcoxon signed-ranks test on the paired
means. No significant differences were ob- 
tained between the solid fluorescent yellow- 
orange and the fluorescent yellow-orange with 
a blue medial stripe or between the solid blue 
stripe and the fluorescent yellow-orange with 
a blue medial stripe. But the difference 
between the fluorescent yellow-orange with 
a blue medial stripe and the fluorescent 
yellow-orange with a white medial stripe was 
significant (.01 level) with the fluorescent 
yellow-orange with a blue medial stripe yield- 
ing the lower object threshold. These results 
suggest that an influential factor in the 
threshold for the fluorescent yellow-orange 
with a blue medial stripe stimulus was due 
to the blue component, which may have 
increased the color contrast. 

In a manner identical to that above, the 
effects of combining blue with fluorescent 
yellow-orange were evaluated for the rec- 
tangular stimuli. The obtained mean object 
threshold rank order was: fluorescent yellow- 
orange with a blue medial stripe, solid fluo- 
rescent yellow-orange, and fluorescent yellow- 
orange with a white medial stripe. The dif- 
ference between the fluorescent yellow-orange 
with a blue medial stripe and the solid rec- 
tangular blue stimuli was statistically signifi- 
cant on the basis of a Wilcoxon signed-ranks 
test on the paired means. The fluorescent 
yellow-orange with a blue medial stripe 
stimulus was significantly more effective than 
the solid fluorescent yellow-orange stimulus 
(.02 level) and also significantly more ef- 
fective than the fluorescent yellow-orange 
with a white medial stripe stimulus (.01 
level). 

The hierarchical order of thresholds for the 
combined rectangular stimuli was identical 
with that obtained for the combined square- 
like stimuli. The consistency in the trends 
for these stimuli lends support to the conten- 
tion that combined stimuli elicit lower thresh- 
olds than the solid stimuli and that solids 
elicit lower thresholds than those separated 
by a white medial stripe. 
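
The paired comparisons reported above rest on the Wilcoxon signed-ranks test. A minimal Python sketch of the statistic is given here for illustration only; the function name is hypothetical and the threshold values are invented, not taken from the study.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n) for paired samples x and y.

    Zero differences are dropped; tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of equal absolute differences.
        while (j + 1 < len(ordered)
               and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]])):
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


# Invented mean thresholds (arbitrary units) for two stimuli.
striped = [10, 12, 9, 11, 14, 8]
solid = [13, 15, 10, 16, 18, 7]
w_plus, w_minus, n = wilcoxon_signed_rank(striped, solid)
```

The smaller of the two rank sums would then be referred to a table of critical values for the given number of nonzero pairs.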


SUMMARY AND CONCLUSIONS FOR 
LABORATORY STUDIES 


From the visual perimetric point of view, 
the most useful aircraft exterior coloration in
terms of midair collision avoidance is that
coloration which affords color zones of the 
greatest magnitude. The results of the first 
study suggested that fluorescent paints pos- 
sess greater fields than their ordinary paint 
counterparts. Thus, by extrapolation, support 
is given to the contention favoring the use 
of fluorescent paint for the purposes of 
aircraft detectability and visibility. 

In the second study two thresholds were 
obtained, under each of two field luminance 
conditions, for fluorescent red-orange, fluo- 
rescent yellow-orange, fluorescent blue, and 
their ordinary counterparts. The object 
threshold was defined as the lowest tachisto- 
scopic exposure interval necessary for identifi- 
cation of the presence of a stimulus; the color 
threshold was defined as the lowest tachis- 
toscopic exposure interval necessary for the 
identification of the color of the stimulus. The 
results (with the exception of fluorescent 
blue) for both luminance levels indicated that 
the fluorescent paints had lower color thresh- 
olds than their ordinary paint counterparts 
and that the ordinary paints had lower ob- 
ject thresholds than their fluorescent paint 
counterparts. 

The following conclusions, based on the 
results of the third study, may be drawn: 

1. Stimuli of a given area were more ef- 
fective than those stimuli divided into halves 
and the halves separated by some distance. 
The greater the separation (disorganization) 
the less effective were the stimuli. 

2. Stimuli of a given area tended to be 
more effective the less rectangular and the 
more squarelike they were in shape. 

3. Simple increases in the border contrast 
of stimuli with their background (i.e., in- 
creasing stimulus perimeter) were not a ma- 
jor factor in increasing stimulus effectiveness. 

4. Increasing the area of stimuli, regardless 
of the shape, increased the effectiveness up 
to a point, after which further increases in 
area did not yield increases in effectiveness. 
The cutoff point, with respect to area, was 
between 20 to 25 minutes of visual angle. 

5. The effectiveness of a combined fluo-
rescent yellow-orange with ordinary blue was
greater than the effectiveness of either com-
ponent taken separately.




METHOD FOR FIELD STUDIES


Field research, focusing on problems of detecta- 
bility, is difficult to accomplish because of problems 
involved in controlling such variables as background 
brightness and color, the contrast of the stimulus 
with its background, atmospheric attenuation result- 
ing from haze, fog, or dust, and slant angle of 
the sun. 

The detectability of fluorescent and ordinary 
paints against a sea background was investigated 
by the Medical Research Laboratory of the United 
States Naval Submarine Base (1955). The fluo- 
rescent paints involved showed greater detection dis- 
tances than did the ordinary paints. The result of 
this study also suggested that high brightness and 
high saturation contribute to detectability, that 
detectability correlates highly with total contrast, 
and that white possesses considerable detectability 
under certain conditions. 

Blackwell (1960) investigated the visibility of vari- 
ous chromatic and achromatic stimuli. His findings 
were in reasonable agreement with those of the Med- 
ical Research Laboratory. He found little difference 
between fluorescent and ordinary paints for overcast 
days and a sky background. However, with a 
clear sun, fluorescent paints possessed a marked 
superiority. Combining overcast and clear sun con- 
ditions, Blackwell’s data suggested the following 
rank order of detectability: fluorescent yellow- 
orange, white, fluorescent red-orange, international 
orange, gray, and black. 


Apparatus 


The experimental work was performed at Mustin 
Field of the Naval Air Material Center, Philadelphia. 
The stimulus presentation apparatus, for the first 
field study, was stationed on the roof of a three- 
story building. From the line of sight of S on the 
ground, an unimpeded sky background was afforded 
at all times. The Ss always viewed the stimuli 
from north to south. The stimuli were exposed, two 
at a time, one on either side of a tall support pole, 
by means of wires attached to two crossrods. 

The second experiment was also performed out-of- 
doors under natural illumination. A hexahedral 
structure containing six stimuli was attached to a 
metal support pole which was mounted on a tower 
23 feet high. When the stimuli were mounted on 
the structure, they filled the 1 o’clock, 3 o’clock, 5 
o’clock, 7 o’clock, 9 o’clock, and 11 o’clock posi- 
tions. The stimulus tower was placed at the far west 
end of a straight level road. The Ss always viewed 
the stimuli from east to west. 


Stimuli 


The choice of stimuli for these experiments was 
based on the findings of the laboratory studies as 
well as those of the Medical Research Laboratory 
(1955) and Blackwell (1960). The six samples 
employed were the same in both studies, with one 
exception. In the first study the samples were 
fluorescent yellow-orange, fluorescent red-orange,
fluorescent red-orange with a white medial stripe,
fluorescent red-orange with a blue medial stripe, 
ordinary orange, and white. In the second study 
fluorescent red-orange with a blue medial stripe was 
replaced by white with a black medial stripe. The 
laboratory studies also suggested that squarelike 
stimuli possess a detectability advantage; con- 
sequently all stimuli were square shaped. 

In the first study, all monochromatic stimuli were 
1§ inches square; the mixed stimuli, composed of 
two fluorescent red-orange rectangles (11/16 X 18 
inches) were separated by a rectangle of the same 
size of either white or blue. 

In the second study, each solid stimulus was 5 
inches square. The mixed stimuli consisted of two 
outer rectangles separated by a medial rectangle 
(stripe). All rectangles were 5 inches long and 1.67 
inches wide. 


Subjects 


Two male college students and 1 male staff 
member of Applied Psychological Services were 
used as Ss in the first study. In the second study, all 
Ss were enlisted men in the United States Navy. 
They were screened on the basis of visual acuity 
and color vision integrity. None suffered from any 
form of color anomaly and all Ss had 20/20 vision 
or wore corrective lenses rendering their corrected
vision at 20/20. A total of 13 Ss was used in the
second study. 


Procedures 


Object and color thresholds were obtained for 
each stimulus. These are defined in the following 
manner: (a) object threshold—the maximum dis- 
tance at which the stimulus could be detected with 
certainty, (b) color threshold—the maximum dis- 
tance at which the colors could be identified. In all 
cases, the distance recorded was ground distance 
measured to the nearest foot. 

In the first study, each stimulus was presented to 
each S a total of 10 times. The order of presenta-
tion and the position of the stimuli (right or left
of the support pole) were counterbalanced through- 
out. Since two stimuli were presented simultaneously, 
a total of 30 trials was required for each S. 

Each S was situated on the rear of an open bed 
pickup truck which was driven along the runway 
from a distance at which the stimuli were subliminal 
toward the stimuli until the object and color thresh-
olds were obtained. Throughout all trials, the S
wore clear plastic goggles to protect his eyes from 
the wind. 

The Ss were tested in the morning between 9:30 
and 11:45. The path of the sun during this interval 
was parallel to the path of S. Thus, the stimuli 
were positioned so that at no time was a shadow 
cast on them. Data were collected during the first 
2 weeks of January 1961. All Ss were tested under 
clear sky conditions. Sky background brightness 
measurements, as measured by the Luckiesh-Taylor 
brightness meter, varied from 800 footlamberts to
1,500 footlamberts. Visibility was from 4 to 15 miles
and wind velocity was from 7 to 9 miles. 

In the second study, S was seated alongside the 
driver in an open jeep vehicle which was driven at 
very slow speeds. The trials started at a distance 
which rendered the stimuli subliminal. The S was 
instructed to report each stimulus and its clock posi- 
tion as soon as he detected it (object threshold) and 
then the colors as soon as he could identify them 
(color threshold).

The stimuli were presented, en masse, to each S a 
total of 12 times under each of three meteorological 
conditions. The order of presentation and the posi- 
tion of the stimuli were counterbalanced so that each 
stimulus appeared in each of the clock positions once 
in each set of six trials and never appeared between 
the same two stimuli more than once in each set 
of six trials. 
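
The two counterbalancing constraints described above can be checked mechanically. The sketch below is only an illustration: the function names are hypothetical, the stimulus labels are placeholders, and it assumes that the six clock positions form a ring so that the first and last positions are adjacent.

```python
def each_position_once(trials):
    """True if every stimulus occupies every position exactly once."""
    n = len(trials[0])
    if len(trials) != n:
        return False
    for pos in range(n):
        if len({trial[pos] for trial in trials}) != n:
            return False
    return True


def neighbor_pairs_unique(trials):
    """True if no stimulus sits between the same two neighbors twice.

    Positions are treated as a ring (an assumption), so index -1
    wraps around to the last position.
    """
    seen = set()
    for trial in trials:
        n = len(trial)
        for pos, stim in enumerate(trial):
            pair = frozenset((trial[pos - 1], trial[(pos + 1) % n]))
            if (stim, pair) in seen:
                return False
            seen.add((stim, pair))
    return True


# A simple rotation places each stimulus in each position once,
# but keeps every stimulus between the same two neighbors.
stimuli = ["A", "B", "C", "D", "E", "F"]
rotation = [stimuli[t:] + stimuli[:t] for t in range(6)]
```

A plain rotation therefore satisfies the first constraint but not the second, which is why a more carefully arranged set of six orders would be needed.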

The object and color thresholds were recorded by 
an experimenter who was seated in the back of the 
vehicle. Distances were measured by the Rolatape 
Measuring Wheel, Model 400. After the S identified
the colors of all six stimuli, he was returned to a
subthreshold distance and was then ready for the
next trial. Each trial took approximately 8 to 10
minutes.

The stimuli were viewed under three different con-
ditions: sunny A.M., sunny P.M., and cloudy. The
data were collected during the summer months of
1961. In the sunny A.M. condition the sky background
was clear, blue, and without clouds or haze. In this 
condition, the path of the sun’s travel was parallel 
to and in the same direction as S was traveling. 
At the outset, the sun was about 35 degrees high and 
at the conclusion, it was about 75 degrees high. Thus, 
it was always in front of the stimuli and shining on 
them. 

Brightness of the sky background and the four 
monochromatic stimuli was measured on a Spectra 
spot brightness meter at the beginning of the first 
trial, the end of the sixth trial, and the beginning of 
the last trial. The mean brightness measurement for 
the sky background was 1,967 footlamberts. Visi- 
bility was from 4 to 8 miles with wind velocities 
from 5 to 13 miles. 

In the sunny P.M. condition, the position of the
sun ranged from almost directly over the stimulus 
tower (90°) to behind the tower (125°). The mean 
brightness measurement of the sky background was 
4,342 footlamberts. The sky background was clear 
and blue and had a slight haze. Visibility was from 
2 to 12 miles with wind velocities of 6 to 14 miles.

Fig. 3. Object thresholds (feet) for various stimuli (Study 4).

The cloudy condition had a sky background
heavily laden with smoke and haze and gray cloud 
coverage. The sun was completely hidden in this 
condition. Visibility was from # to 10 miles with 
wind velocities from 5 to 7 miles. The data col- 
lected under this condition were taken between 
9:00 A.M. and 11:30 A.M. The mean brightness of
the sky background was 1,600 footlamberts. 


RESULTS FROM FIELD STUDIES 
Object Thresholds 


Figure 3 presents S’s mean object thresholds
in the first field study. It is apparent that
good between S agreement on the hierarchical
order of the object thresholds of the various
stimuli was obtained. This agreement is also
reflected in the high intersubject product-
moment correlation coefficients (.83 to .97).
It is also apparent from this plot that ordinary
orange was relatively ineffective whereas
fluorescent yellow-orange, fluorescent red-
orange with a white stripe, and white were
relatively superior.

For the second field study the mean object
threshold rank-order, across Ss for each con-
dition was:

Sunny A.M.
1. Fluorescent yellow-orange
2. Fluorescent red-orange
3. White
4. Fluorescent red-orange with a white medial stripe
5. White with a black medial stripe
6. Ordinary orange

Sunny P.M.
1. Fluorescent yellow-orange
2. Fluorescent red-orange
3. Ordinary orange
4. Fluorescent red-orange with a white medial stripe
5. White with a black medial stripe
6. White

Cloudy
1. Fluorescent yellow-orange
2. Fluorescent red-orange
3. White
4. Fluorescent red-orange with a white medial stripe
5. Ordinary orange
6. White with a black medial stripe

The results of an analysis of the variance
of the detection distance data suggested that
statistically significant differences existed
among the six stimuli and the three meteor-
ological conditions. The stimulus by condition
interaction was also significant. Tukey’s gap
test divided the six stimulus means into
three significant (.05 level) groups. These
groups were: (a) fluorescent yellow-orange
and fluorescent red-orange, (b) white and
fluorescent red-orange with a white medial
stripe, and (c) white with a black medial
stripe and ordinary orange. The means of the
three conditions were divided into two sig-
nificantly (.05 level) different groups by the
gap test: (a) sunny A.M. and (b) cloudy and
sunny P.M.

The analyses performed on the data from
the second study suggest the relative superior
detectability of fluorescent yellow-orange and
fluorescent red-orange and the relative in-
effectiveness of white with a black medial
stripe and ordinary orange. Detectability in
the sunny A.M. condition was superior to
detectability in the sunny P.M. and cloudy
conditions; detectability in the latter two
conditions did not differ with statistical
significance.

Coefficients of concordance were obtained 
between the object thresholds of Ss tested 
under the three meteorological conditions. All 
coefficients were significant at the .01 level 
of confidence. The resulting coefficients in- 
dicate a high degree of relationship among 
Ss, within meteorological conditions. The 
coefficients are: sunny A.M.—.86, sunny P.M. 
—.87, cloudy—.91. 
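
For readers unfamiliar with the coefficient of concordance, a minimal Python sketch of Kendall's W for untied rankings follows. It assumes that this is the coefficient intended, which the text does not state explicitly; the function name is hypothetical and the rankings are invented for illustration.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for untied rankings.

    rankings: one list per judge (here, per S), each a permutation
    of the ranks 1..n assigned to the same n stimuli.
    """
    m = len(rankings)        # number of judges
    n = len(rankings[0])     # number of ranked stimuli
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Invented rankings of six stimuli by three Ss.
agree = [[1, 2, 3, 4, 5, 6]] * 3
mixed = [[1, 2, 3, 4, 5, 6], [2, 1, 3, 4, 6, 5], [1, 3, 2, 4, 5, 6]]
w_agree = kendalls_w(agree)
w_mixed = kendalls_w(mixed)
```

Complete agreement among Ss gives a coefficient of 1.00, and a few adjacent rank swaps lower it only slightly, which is the pattern the reported values suggest.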


Color Thresholds 


Figure 4 presents each S’s mean color 
threshold as obtained in the first field study. 
Again, considerable hierarchical between S 
agreement is apparent (no color threshold 
could be obtained for white). The product-
moment correlations between the color thresh-
olds of the stimuli for the three subjects
were also very high (.92 to .99). The product-
moment coefficients of correlation between the
object and color thresholds of each S were:
.98, .98, and .99.

Fig. 4. Color thresholds (feet) for various stimuli (Study 4).

The mean color threshold rank-order, 
across Ss for each condition, and the mean 
across conditions in the second study was the 
same in each case. They were as follows: 
fluorescent yellow-orange, fluorescent red- 
orange, fluorescent red-orange with a white 
medial stripe, and ordinary orange. 

An analysis of variance of the individual 
subject color threshold data revealed statis- 
tically significant differences among the four 
stimuli (the white and the white with black 
medial stripe stimuli were not considered in 
this portion of the study) and the meteorolog- 
ical conditions, as with the object threshold 
data. The stimulus by condition interaction
was also significant. Tukey’s gap test divided
the four stimulus means into three significant
(.05 level) groups. They are: (a) fluorescent
yellow-orange and fluorescent red-orange, (b)
fluorescent red-orange with a white medial
stripe, (c) ordinary orange. The means of the
three conditions were divided into three
significantly (.05 level) different groups by
the gap test: (a) sunny A.M., (b) cloudy,
(c) sunny P.M.

Coefficients of concordance were obtained 
between the color thresholds of Ss under 
each of the three conditions. Again, the 
coefficients indicated a high degree of relation- 
ship among Ss. The coefficients were: sunny 
A.M.—.90, sunny P.M.—.92, and cloudy—1.00. 

In both field studies the fluorescent colors 
repeatedly displayed their relative superiority. 
Fluorescent yellow-orange and fluorescent red-
orange, although not differing significantly
from each other, were consistently and sig- 
nificantly more easily discernible than the 
other stimuli involved. 

Concordance between the results of the two 
field studies is apparent except for the relative 
merit of the white and the fluorescent red- 
orange with white medial stripe stimuli. The 
reason for this discrepancy is not apparent. 
However, it is believed that the clear grayish 
winter sky background of the first study 
served to enhance the contrast of the white 
stimulus (and the white component of the red 
and white stimulus) in comparison with the 
brighter (less contrast) summer sky back- 
ground condition of the second field study. 
It should be noted that in the sunny P.M. 
condition, the minimum contrast condition, 
the white assumed the lowest possible de- 
tectability ranking. 


OPTIMUM PAINT SCHEME QUESTION


First, it seems that few, if any, of the 
findings have indicated fluorescent paint to be 
inferior, from the conspicuity point of view, 
to other colors tested and to the achromatic 
stimuli. Under certain conditions, achromatic 
stimuli (white and black) can be expected to 
yield greater distance detectability than the 
fluorescent paints. However, under a number
of conditions the detectability of fluorescent
paint was almost equivalent to, and in some
conditions superior to, that of
the achromatic stimuli. Generally, the results
suggest that little, if any, loss in detectability 
can be expected from the use of fluorescent 
pigments and considerable gain in conspicuity 
can be anticipated. 

The results also suggested that the utility 
of fluorescent paint may increase with its 
area of coverage. Moreover, it was indicated 
that the fluorescent area should be unbroken 
and preferably squarelike. Thus, large, un- 
broken fluorescent areas can be expected to 
yield maximum effectiveness. 

Moreover, the tachistoscopic study in- 
dicated that the addition of blue to a fluores- 
cent stimulus lowered the recognition thresh- 
old of the stimulus and blue possesses the 
largest visual field. White or navy blue seem 
to be preferable for use in combination with 
the fluorescent red-orange. Increased detect- 
ability can also be expected to emerge from 
the combination due to the increased internal 
and external contrast afforded by the second 
color. Moreover, a navy blue with a low 
reflectance would provide high internal bright- 
ness contrast with the fluorescent pigment. 
Additionally, since blue and fluorescent red- 
orange are at opposite ends of the reflectance 
continuum maximum external contrast would 
then be afforded against any background.
Moreover, these colors are at different ends
of the color spectrum and should provide
good color contrast.

Fig. 5. Possible high visibility paint scheme.

One schema, based on the above recom- 
mendations, is shown in Figure 5. Employ- 
ment of this or any paint scheme would of 
course be weighed in the practical situation 
against factors such as the inability to employ 
certain paints on heat areas and aircraft con-
trol areas.


“MASCULINE STRIVING” AS A CLUE TO SKILLED-
TRADE INTERESTS


JOHN N. McCALL 
State University of New York, Buffalo 


A new verification scale for the Minnesota Vocational Interest Inventory 
(MVII) was hypothesized to serve as a Masculinity-Femininity scale when 
Ss answer honestly. Verification scores were compared for 117 Ss who 
took the MVII under normal instructions and under instructions to affect 
a masculine- or a feminine-oriented boy of 16. Most of the Ss were arts and 
science college freshmen, one-third were females, and 11 were still in high school.
With the normal set, males scored higher than skilled tradesmen, below the 
mean for chance derived scores, and far below the females. Both males and 
females with a masculine set scored low, the same as tradesmen. Males with 
a feminine set scored the same as females under a normal set, but females with 
a feminine set averaged the same as chance. 


Concern with the problem of “response 
set” has led to improvements in self-report 
inventories and their use. One approach is to 
identify undesirable sets, such as “faking,” 
with special verification or validity scales. 
Special clinical scales are also constructed 
which might identify useful, although initially 
irrelevant, response tendencies. A good ex- 
ample is the Masculinity-Femininity (MF) 
scale on the SVIB (Strong, 1943). Rather 
than measure similarity of interests to specific 
occupational groups, as do most of the SVIB 
scales, this MF scale determines the tendency 
to answer either like most males or most fe- 
males in a normative sample. Such scores have 
value in personality assessment as well as in 
differentiating broad educational and occupa- 
tional groups (Darley & Hagenah, 1955). 

The present study shows that one recently 
published verification scale also functions as 
a MF scale. Depending upon conditions at 
the time of testing, the scale may identify 
one of several undesirable response sets or 
it may measure a useful personality-interest 
trait. We refer to Campbell’s 58-item verifi- 
cation scale (Campbell & Trockman, 1963) 
for the MVII. The latter was devised to 
measure similarity of interest to adults in 20 
different skilled-trade and related occupations. 
Campbell based his scale on items which 
were rarely endorsed by tradesmen in the 
initial standardization group. Thus, high 
scores indicate a simple deviation from the 
responses expected for skilled tradesmen. 
Clinical interpretations of this scale are
hampered because high scores can occur for
quite different reasons, including: failure to 
comprehend directions, carelessness, anxiety, 
and so on. Campbell found deviate scores for 
answer sheets marked by a sample of hos- 
pitalized psychotics, by normal subjects (Ss) 
who did not read items in the booklet, and 
when marked with the aid of random number 
tables. The psychotics’ mean score of 11 
stood just half way between the adult trades- 
men’s mean of 4 and the randomly determined 
mean of 18. 

In a recent study with vocational high 
school boys (Barnette & McCall, 1963), 
several students received high scores on 
Campbell’s (1963) verification scale. Yet, it 
was believed they had answered seriously 
and with good comprehension of their task. 
And none of them appeared psychotic. The
fact that some of these boys were enrolled in 
a Foods course suggested they might differ 
from the majority of skilled-trades workers, 
who deal more with technical and outdoor 
interests. A look at the MVII items produced 
another clue.1 These items, in triad form,
require Ss to choose one activity as most 
liked and another as least liked. It became 
apparent that many of them included mascu- 
line and feminine activities in opposition to 
each other. For example, item number 152 
offers these choices: (a) play baseball, (b) 
see an educational movie, and (c) visit some-
one in the hospital. Clearly, choices a and
c represent a masculine-feminine contrast;
and many of Campbell’s 58 verification scale
items have this characteristic. It was also
found that a few items from Campbell’s scale
are contained in Clark’s occupational scoring
keys for Baker, Printer, Stock Clerks, Retail
Sales Clerk, Food Service Manager, and
Hospital Attendant. These scales represent
the more feminine or indoor occupational
groups among the total set of 20. None of
Campbell’s items appear on the other scoring
keys.

1 Credit must be given to W. L. Barnette, Jr. for
noting this feature.

To test the hypothesis that Campbell’s 
scale can serve as a MF scale, scores were 
compared for Ss who were chosen to represent 
different degrees of “masculinity” or “femi-
ninity” of interest. These were male and female
students in high school and college. Since 
none of them was oriented toward skilled- 
trade vocations, it was expected that even 
the males would show a higher average verifi- 
cation score than Campbell’s tradesmen’s 
average of 4. Of course, females were ex- 
pected to score higher than the average of 
18 obtained with randomly marked answer 
sheets. Assuming that a masculine or femi- 
nine orientation could be consciously adopted, 
both male and female Ss were asked to affect 
these response sets. It was predicted that 
changes in verification scores would cor- 
respond with instructions, regardless of the 
sex of Ss. 


METHOD 
Subjects 


The subject pool totaled 78 males and 39 females, 
age 14 to 28. None was a volunteer and all were 
enrolled in summer school courses where classroom 
time was given for the study. Only six boys and 
five girls were still in high school (taking a reading 
skill course); the majority were college freshmen 
taking introductory psychology. Most of the college
students were arts and science majors and a few 
were business or engineering majors. 


Procedure 


The MVII was first given to all Ss in their own 
classrooms and with normal instructions to report 
their true interests. After 1 week they all took the 
MVII again, but under special instructions. Some
were asked to affect a masculine set. Others were 
asked to affect a feminine set, although from the 
viewpoint of a boy who sees things as a girl might. 
All Ss in any one classroom were given the same 
instructions.

These special instructions avoided any reference to
“masculine,” “feminine,” “sissy,” and so on. To 
illustrate: 


Today you will take the interest inventory 
again. But this time, instead of giving your own 
interests, we want you to answer as you think the 
following type of person would: 


Examiner then gives either the masculine or femi- 
nine set. 


Masculine set 


He is a 16-year-old boy who is eager to finish 
school and get out on a job, earning his own 
money. He much prefers active sports and working 
with machines or tools to doing school work, 
reading, or listening to music. 

He differs with his teachers about the need for 
homework and he often argues with his parents 
about how he should dress, spend money, and 
use his time. In short, he is impatient to be 
grown up and out on his own. 


Feminine set 


He is a 16-year-old boy who is well liked by 
his parents and teachers. He likes most of his 
school subjects and he prefers school work or 
reading to outdoor things, including rough sports. 

He believes it is important to get along with 
other people and he avoids breaking the rules 
laid down at home and at school. Sometimes he 
thinks about going to college and learning to help 
other people in some way. 


The two answer sheets for each S were next
scored with Campbell’s (1963) verification scale.
Average scores were compared for the males and
females under the normal and experimental con-
ditions, for high school and college Ss of the same
sex, and for the same Ss under different test instruc-
tions. All the hypotheses tested concern group dif-
ferences in mean verification scores:

1. Under normal test instructions: (a) College 
oriented males will score higher than the tradesmen’s 
average of 4. (b) High school and college Ss of the 
same sex will show similar scores. (c) Males will 
score lower and females higher than the average 
chance score of 18. 

2. Under instructions to affect the interests of a 
16-year-old boy with a masculine response set, males 
and females will score near the tradesmen’s average 
of 4.

3. Under instructions to affect the interests of a 
16-year-old boy with a feminine response set, both 
males and females will score higher than chance 
scores but lower than females do under normal 
instructions. 


RESULTS 


Table 1 shows the average verification
score for each sex group under different ex-
perimental conditions. Included are data on
the sample sizes, standard deviations, and
t test results. The overall results are striking
in spite of the small samples involved. Not
shown are the separate results for high school
and college Ss of the same sex; their scores
were so similar that the results were combined
to form somewhat larger male and female
samples.

TABLE 1
MEANS, STANDARD DEVIATIONS, AND t TEST RESULTS FOR MALES AND FEMALES ON
CAMPBELL’S VERIFICATION SCALE, USING DIFFERENT INSTRUCTIONS

Number  Sample        Instructional set  n   x̄      s     Contrasts          t test result
1       Total female  Normal             39  25.38  6.67  1 vs. 2            1S 3 ae
2       Total male    Normal             78  Te,    6.48  1 vs. 3            4.39**
3       Random        —                  24  18.67  4.19  2 vs. 3            S103h
4 (a)   Female        Masculine          25  3.40   3.59  4 (a) vs. 4 (b)    2.49*
  (b)   (same)        Normal             25  25.80  6.81  4 (a) vs. 5 (a)    1.91
5 (a)   Male          Masculine          42  2.74   2.30  4 (a) vs. 3        ZO nam
  (b)   (same)        Normal             42  9.86   5.69  5 (a) vs. 5 (b)    0.19
                                                          5 (a) vs. 3        41a 0am
6 (a)   Female        Feminine           12  18.75  7.20  6 (a) vs. 6 (b)    0.56
  (b)   (same)        Normal             12  25.58  6.50  6 (a) vs. 7 (a)    6.46***
7 (a)   Male          Feminine           23  25.87  6.16  6 (a) vs. 3        0.89
  (b)   (same)        Normal             23  13.39  7.58  7 (a) vs. 7 (b)    308 ae
                                                          7 (a) vs. 3        9.42***

* p < .05.
** p < .01.
*** p < .001.

Hypotheses 1 and 2 are well supported by 
the data. The college-oriented, total male 
sample shows a mean of 11 under normal 
instructions and this is distinctly higher than 
the tradesmen’s mean of 4 (Campbell & 
Trockman, 1963). The total female sample, 
under normal instructions, shows a mean of 
25, almost as far above the chance mean of 
19 as the males are below it. The new random 
sample mean of 18.67 cross validates Camp- 
bell’s result, using the same randomizing pro- 
cedure. 
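
As a rough check on the t values in Table 1, the contrast between the total female sample and the random sample (Contrast 1 vs. 3) can be recomputed from the reported means, standard deviations, and sample sizes. The sketch below assumes an independent-groups t test with a pooled variance estimate; the paper does not state which formula was used, so the function `pooled_t` is an illustrative assumption rather than the author's procedure.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-groups t statistic using a pooled variance estimate (assumed formula)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err


# Total female sample under normal instructions vs. the random sample (Table 1).
t_female_vs_random = pooled_t(25.38, 6.67, 39, 18.67, 4.19, 24)
print(round(t_female_vs_random, 2))
```

Under these assumptions the statistic falls close to the reported value of 4.39 for that contrast.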

Results for males and females under the 
masculine set instructions are essentially the 
same. The mean of 3 differs little from the 
tradesmen score of 4. Note how the standard 
deviation shrinks to about 3 for both sexes. 
Under feminine set instructions the results 
are somewhat unexpected. Males average 
about 25, the same as females do under 
normal instructions; and the females average 
only 18, the same as produced with random
marks. In short, Hypothesis 3 is not supported.


DISCUSSION 


The results strongly support the contention 
that Campbell’s verification scale for the 
MVII serves as a MF scale. This interpreta- 
tion presupposes that one can rule out the 
possibility of deliberate faking or some other 
failure to report one’s actual interests. Such 
a finding is not surprising since the scale 
was based on the responses of a strongly 
masculine population, skilled tradesmen. We 
might expect different individuals to resemble 
more feminine persons in occupational in- 
terests. 

Involved here is some construct validation
for the clinical concept of “masculinity-femininity.”
Beyond simply showing that
males and females score differently under
usual instructions, the study found that requests to affect a
masculine or feminine response set were effective.
The specific wording of these instructions 
stemmed from an implicit theory concerning 
which interests would distinguish between a 
masculine and feminine orientation. 

The mere “chance” result for females 
under the feminine set instructions might not 
have occurred had they been imitating typical 
females rather than feminine young men. 


MASCULINE STRIVING AND INTERESTS 


Just why female Ss had more difficulty with 
the feminine set instructions than male Ss is 
not clear. Perhaps the feminine stereotype is 
less clear-cut, but the problem may have 
some relation to the ambiguity of double 
negatives. It may be more difficult to imitate 
deviates from the opposite sex than deviates 
from one’s own sex.


PREDICTION OF MEDICAL SCHOOL ATTENDANCE
FROM COLLEGE FRESHMAN SVIBS


ALBERT B. HOOD 


Student Counseling Bureau, University of Minnesota 


This study compared the Strong Vocational Interest Blank (SVIB) profiles of
2 groups of college freshmen, 1 group which eventually applied to medical
school and another which did not, but all of whom scored an A on the
Physician scale. Significant differences were found between the 2 groups on 8
occupational scales, the largest of which were on the Artist and Architect
scales, which were highly negatively related to subsequent application to
medical school. The results give further evidence of a lack of a common
interest factor in Group I of the SVIB.


Among students who obtain high scores
on a particular occupational scale on the
Strong Vocational Interest Blank (SVIB),
some will eventually enter that occupation,
but many will not. This investigation studied
the Strong profiles of a sample of male
college freshmen who obtained a high score
on the Physician scale to determine differences
between those who subsequently did and did
not apply to a medical school.


METHOD


At the University of Minnesota, freshmen who
enter the College of Science, Literature, and Arts
complete the SVIB during the regular orientation
testing program. The Strong profiles were examined
for all freshmen of 1953, 1954, and 1955, and those
with an A (standard score above 45) on the
Physician’s scale were selected for this study. Of
the 3,500 Strong profiles available, 366 or 10.6%
had an A on the Physician’s scale.

Students who wish to enter American medical
schools are required to take the Medical College
Admission Test. Lists of students who take this
test, together with their test results, are circulated
to the medical schools twice each year. The lists
for the years 1955 through 1960 were searched for
the names of the University of Minnesota students
who had scored an A on the Physician’s scale of
the Strong as entering freshmen. It was thus possible
to determine which of these students were
planning to enter the field of medicine, at least to
the extent that they took the test for admission to
some medical school. Of the 366 students who
obtained an A on the Physician’s scale, 54 could be
definitely identified as having taken the Medical College
Admission Test. (Several others apparently applied
to a medical school, but they had very common
names, such as John Anderson or Robert Johnson,
and absolute identification was impossible in a
Minnesota population.)

For this study, the remaining 47 scales of Strong
profiles of a random sample of 50 of those who
subsequently applied to medical school were compared
with a similar sample of 50 drawn from
those who did not.


RESULTS 


Male freshmen obtaining an A on the 
Strong Physician scale who did not sub- 
sequently apply to medical school obtained 
significantly higher (.05 level) scores on seven 
of the other occupational scales—Artist, 
Architect, Mathematician, Physicist, Math- 
Physical Science Teacher, Minister, and Ad- 
vertising Man. This group also had signifi- 
cantly more As on other occupational scales 
(6.1) than did those who applied to medical 
school (4.7). Significantly higher scores were 
found on the Osteopath scale among the 
sample which did apply. No significant dif- 
ference between the two groups was found 
on the Physician scale (all of which were 
above standard score 45). The scales showing 
the largest differences between the two groups 
were the Artist and Architect scales. Students 
with low scores on these two scales were likely 
to apply to medical school, those with high 
scores were not. Biserial correlations of —.64 
for the Artist scale and —.57 for the Architect 
scale were found between scores on these 
scales and subsequent application to medical 
school. 
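
The biserial coefficients reported above relate a dichotomy (applied or did not apply to medical school) to a continuous scale score. A minimal sketch of the conventional biserial formula follows. The function name `biserial` and the toy scores are hypothetical, the use of the population standard deviation is an assumption, and the study's raw data are not reproduced here.

```python
from statistics import NormalDist, mean, pstdev


def biserial(scores, applied):
    """Biserial correlation between a continuous score and a 0/1 grouping.

    Uses r_b = ((M1 - M0) / s) * (p * q / y), where y is the ordinate of the
    standard normal curve at the point dividing the area into p and q.
    """
    ones = [s for s, a in zip(scores, applied) if a]
    zeros = [s for s, a in zip(scores, applied) if not a]
    p = len(ones) / len(scores)
    q = 1 - p
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(ones) - mean(zeros)) / pstdev(scores) * (p * q / y)


# Hypothetical Artist-scale scores: applicants (1) score lower than nonapplicants (0).
toy_scores = [30, 40, 50, 35, 45, 55]
toy_applied = [1, 1, 1, 0, 0, 0]
r_b = biserial(toy_scores, toy_applied)
print(round(r_b, 2))
```

A negative coefficient, as in the toy data, corresponds to the direction reported for the Artist and Architect scales: lower scores go with application to medical school.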


DISCUSSION 


It was not surprising to discover that stud- 
ents who did not eventually apply to medical 
school tended to obtain higher scores on other 
scales since they had interests similar to 
persons in other occupations in addition to 
interests similar to those of physicians. It
was not expected that the two scales showing
the largest differences, Artist and Architect, 
would be located in Group I along with the 
Physician scale. Among the occupational 
scales, Strong (1943) reports that only the 
Dentist scale has a higher correlation with 
the Physician scale than do the Artist and 
Architect scales which show correlations with 
the Physician scale of +.79 and +.78, re- 
spectively. These data support the results 
reported by Smith (1962) indicating the 
lack of a common interest factor in Group I 
and suggest the need to revise the makeup of 
this group. 

High school senior American Council on 
Education Test scores were available on 85 of 
the 100 students in this sample and high 
school percentile ranks were available on 75 
of them. The biserial correlation between application
to medical school and scholastic
aptitude test score was .22 while it was .27 for
high school percentile rank. For these male 
college freshmen who obtained an A on the 
Physician scale then, there was a significantly 
higher relationship between subsequent ap- 
plication to medical school and scores on the 
Artist and Architect scales than there was 
for scholastic aptitude or level of high school 
achievement.


EXPERIMENTAL EVALUATION OF BINARY PURE-TONE
AUDITORY DISPLAYS


S. A. MUDD


Purdue University 


The frequency, intensity, duration, and interaural difference (direction)
dimensions of pure tone were evaluated singly and in combination at 3 comparable
levels of discriminability in order to determine their relative effectiveness as
binary cuing stimuli for an instrument monitoring task. The use of such
signals decreased search time and reduced Ss’ tendencies to be differentially
attentive to the various sectors of the information display. No further reduction
in search time occurred with 3- and 4-dimensional displays than with
2-dimensional displays. Frequency proved to be the most effective dimension for
purposes of cuing. Intensity was least effective. Direction and Duration were
of moderate effectiveness.


The purpose of the following study was to 
extend the generality of the exploratory work 
reported by Mudd and McCormick (1960). 
That study indicated that auditory cues could 
be used effectively in conjunction with a 
visual search task (instrument monitoring). 
Search time required to locate a deviant dial 
on a simulated instrument panel was found 
to be reduced and habitual visual search 
patterns “broken-up” when operators were 
provided with binary auditory cues to guide 
their search. In general, the study showed
that auditory displays were a practically
useful accessory in visual monitoring tasks.
Since the early work was of an exploratory 
nature, the results of the experiment lacked 
generality on several points: (a) Signal di- 
mensions were not tested in all available 
cuing combinations. (b) The spatial reference
coding of the cuing dimensions was done on
the basis of indirect evidence from the research
literature. (c) The magnitude of
stimulus difference between the two coded
values within each dimension was conservative;
the points selected from the stimulus
range were at more than sufficient distances
apart to insure easy discrimination. (d)
Stimuli used in cuing were in no way equated
in order that comparisons of cuing efficiency
could be made among the qualitatively different
dimensions. (e) On the basis of the
experimental procedure employed it was not
clear whether or not persons assigned to the
three-dimensional condition (three tonal dimensions
in the display) had learning experience
comparable with subjects (Ss) in the
other less complex conditions of cuing (one
or two tonal dimensions).

requirements for the PhD degree, June 1961, Purdue

The following experiment was conducted 
specifically to strengthen the above points in 
the course of evaluating the utility of the 
auditory dimensions of frequency, intensity, 
duration, and direction (interaural intensity 
difference) when coded and used as cues to 
assist in directing visual attention to specified 
areas of an information display. 


METHOD 


The study involved the experimental use of 15 
different sets of auditory signals as cues in directing 
visual attention to the specified visual displays. 
These 15 sets consisted of various combinations of 
the four sound dimensions, each of which had been 
scaled for equal discriminability within their “work- 
ing range” (Mudd, 1963). Each of the four dimen- 
sions had also been tested for meaningful spatial 
associations (i.e., extent to which each of the four 
sound dimensions was associated with the vertical 
and the horizontal dimensions of space). When a 
given auditory dimension was used (by itself or in 
combination with other dimensions) as a cue in the 
visual search task, the various areas of the total 
visual display were coded so as to take advantage
of these stereotyped associations. Thus, frequency
and intensity were assigned to mean the vertical
dimension, and duration and direction were assigned
to mean the horizontal dimension of the visual
instrument display. The various combinations of the
four tone qualities were evaluated in terms of
operator performance in the visual monitoring task.


Independent Variables 


Two experimental variables and one control vari- 
able were incorporated into the experimental situa- 
tion. 

Type of Display. The primary independent vari- 
able investigated was the type of auditory display, 
in particular, a combination of one or more dimen- 
sions of sound (frequency, intensity, duration, direc- 
tion) encoded to serve as spatial referents, or “cues,” 
as to the location of a particular area (containing 
a deviant dial) on a simulated instrument panel. 
For the design of all experimental cuing conditions 
the following format was used: 

1. Each cuing dimension in any one display was
represented by just two stimulus values (i.e., the
audio displays were binary).

2. Interstimulus interval (in units of equal dis- 
criminability) was constant across cuing dimensions. 

3. Cue meaning was invariant across conditions, 
i.e., the sound dimensions of frequency and intensity 
always cued to the vertical, and duration and 
direction always were cues to the horizontal dimen- 
sion of space. 

4. Cuing stimuli were presented repetitiously
with a 1:1 on/off ratio.

5. Cuing stimuli kept signaling until Ss made the 
correct response. 

Since some display combinations involved the 
assignment of two sound dimensions to just one 
spatial dimension, it was necessary to establish a 
cue assignment order. The order of assignment is 
outlined below. 

1. Frequency: always assigned to mean upper 
(high tone) and lower (low tone) “halves” of the 
panel. 

2. Intensity: (a) when used alone, always meant
upper and lower halves (loud and soft tones,
respectively); (b) when used in conjunction with
frequency, intensity always meant the upper or
lower part of the upper or lower “half” of the
panel that was cued by the frequency dimension.

3. Duration: always assigned to mean left (short
tone) or right (long tone) halves of the panel.

4. Direction: (a) when used alone, always meant
left and right halves (tone louder in left or right
ear, respectively); (b) when used in conjunction
with duration, direction always meant the left or
right part of the left or right half that was cued
by the duration dimension.

Operators in the simulated instrument monitoring
task were asked to use the binary sound cues to
make one or more (depending upon the experimental
display condition) dichotomies of the instrument
panel “space” and thus reduce the area of
search for a deviant dial on the panel. For example,
in a one-dimension display (frequency) the high-frequency
tone cued operators to search the upper half of the
panel for the deviant dial, and the low-frequency
tone cued attention to the lower half of the panel.
Utilization of such cues reduced the spatial search
area by one-half. If provided with a two-dimensional
display, for example direction and intensity,
the operator could use the spatial meaning of the
auditory signals to narrow the area of search to
one-fourth of the total panel area. In that condition of
cuing a low intensity tone that was louder in the
right ear would direct the attention of S to the
lower right quadrant of the panel. Three-dimensional
displays would permit the reduction of the
visual search area to one-eighth, and all four dimensions
combined into one display would enable Ss
to focus visual attention on a search area consisting
of one-sixteenth of the total panel area. Figure 1
shows diagrammatically the progressive reduction of
visual search area as a function of display dimensionality.
The cuing code is given in terms of popular
description. The upper half of Table 1 shows the
various combinations of dimensions making up the
15 experimental displays tested.
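
The halving of the search area with each added cuing dimension can be sketched as follows. The function name `search_fraction` is hypothetical; the only figures taken from the text are the binary (two-valued) cues and the 32 dials on the panel.

```python
def search_fraction(n_dimensions):
    """Fraction of the panel left to search after n binary cues."""
    return 1 / 2 ** n_dimensions


TOTAL_DIALS = 32  # the panel is described as holding 32 dials

for k in range(1, 5):
    fraction = search_fraction(k)
    # Number of cuing dimensions, remaining fraction, remaining dial count.
    print(k, fraction, int(TOTAL_DIALS * fraction))
```

With four dimensions the cued area shrinks to one-sixteenth of the panel, which on a 32-dial panel leaves two candidate dials.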

Interstimulus Distance. The second experimental
variable, the interstimulus distance (in units of
equal discriminability) between cuing values within
sound dimensions, involved three levels of discriminability
tested under each condition of display.
The magnitude of the interstimulus distances for
each level of discriminability was as follows: D1,
3.70 ED; D2, 2.22 ED; D3, .74 ED. The lower half
of Table 1 summarizes the stimulus values used for
the four cuing dimensions at each of the levels of
discriminability. The difference interval between
levels of discriminability was constant at 1.48 ED.
The selection of the above values to represent
discriminability levels was determined by three
considerations. In each case the limiting factor was
the nature of the direction dimension or the
inflexibility of the direction controlling apparatus.

1. The range of discriminability tested should be
fairly representative of the working ranges for each
dimension. The direction range yielded in scaling for
equal discriminability was just 3.35 ED units. That
range was extrapolated upward to 3.70 ED for
purposes of this study.

2. The values selected should provide for marginal
discriminability so that a base line could be established
against which to compare levels of increased
interstimulus range. The direction generating equipment
permitted an ED interval of .74 at its lower
limit.

3. The equal-interval steps within the working
ranges were determined by the limitations of the
apparatus used to vary the direction dimension.
All other dimensions were continuously variable.
The values selected accommodated the discrete-step
outputs of the direction apparatus.


FIG. 1. Diagrammatic representation of reduction
of search areas as a function of number of cuing
dimensions.


TABLE 1

CONDITIONS OF AUDITORY DISPLAY AND STIMULUS VALUES FOR BASIC CUING
DIMENSIONS AT LEVELS OF DISCRIMINATION

Condition  Dimension    Condition  Dimension
C1         C            C9         F X D
C2         F            C10        [illegible]
C3         D            C11        [illegible]
C4         I            C12        [illegible]
C5         Dur          C13        [illegible]
C6         F X Dur      C14        [illegible]
C7         Dur X D      C15        F X Dur X I
C8         Dur X I      C16        F X Dur X I X D

                                     Stimulus value
Cuing dimension                      D1           D2     D3
Frequency (cps)
  high                               8,615        3,948  2,143
  low                                344          631    1,169
Intensity (decibels, SPL)
  loud                               82           75     68
  soft                               46           54     61
Duration (milliseconds)
  long                               [illegible]  1,262  1,030
  short                              559          686    841
Direction (decibels, re left ear)
  right                              -10          -6     -2
  left                               +10          +6     +2

Note. C = Control, F = Frequency, D = Direction, I = Intensity, Dur = Duration.

Panel Sectors. The third independent variable, a
control variable, was developed by classifying the
spatial area of the simulated instrument panel used
in the experimental task. The total panel area was
subdivided into eight sectors each of which contained
a 2 X 2 matrix of dials (four dials per sector). The sectors thus
classified were laid out in two rows of four sectors
each. Reading from left to right on the instrument
panel the top row of panel sectors ran from Sector
1 (S1) to S4. The bottom row of sectors began
with S5 (just below S1) and ran across the bottom
half of the panel to S8 on the lower right part of
the panel.
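
The sector numbering described above can be expressed as a small lookup. This is a sketch under two assumptions: the dials form a 4 X 8 matrix (32 dials, as stated under Apparatus), and each sector is a 2 X 2 block of dials, which follows from eight sectors of four dials each. The function name `sector_of` and the zero-based row and column indices are hypothetical.

```python
ROWS, COLS = 4, 8                 # dial matrix of the simulated panel
SECTOR_ROWS, SECTOR_COLS = 2, 2   # assumed sector size (four dials per sector)


def sector_of(row, col):
    """Return the sector number (1 to 8) for a zero-based dial position."""
    sectors_per_row = COLS // SECTOR_COLS
    return (row // SECTOR_ROWS) * sectors_per_row + col // SECTOR_COLS + 1


# Count how many dials fall in each sector.
dials_per_sector = {}
for r in range(ROWS):
    for c in range(COLS):
        s = sector_of(r, c)
        dials_per_sector[s] = dials_per_sector.get(s, 0) + 1

print(dials_per_sector)
```

The top-left dial falls in S1, the top-right in S4, and the dial just below the S1 block in S5, matching the layout given in the text.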


Apparatus 


The experimental apparatus was set up in an
acoustically treated room (9 feet X 24 feet). A simulated
instrument monitoring position was established
at a sound desk. The instrument display consisted of
color-slide projections (Minolta Optiper MX; Kodachrome
Professional, Ka 135-36 at [illegible] sec.) of the
original instrument panel simulator described by
Mudd and McCormick (1960). Briefly, the display
consisted of a 4 X 8 matrix of 32 dials. The zero
position on the dial faces was at one of the eight
octant points on the dial (a modification made to
increase task difficulty). Eighteen-degree pointer
displacement from the null position was considered
to be the “deviant” dial position. In each trial S
searched the panel for the deviant dial and opened a
corresponding toggle switch directly below the
deviant dial. The open switch stopped a timer. The
time required to locate the dial and to make the
appropriate response was recorded as a criterion
measure of “search efficiency” against which the 15
experimental auditory displays were evaluated.


Subjects 


A total of 64 Ss participated in the study. The
sample was drawn from a mixed undergraduate and
graduate population of Purdue University (34 males,
30 females). The sample was not restricted along
dimensions of hearing acuity except for those conditions
in which direction was used as a cuing dimension.
Before assignment to those conditions, Ss were
given a modest test for binaural balance of “loudness”
sensitivity. The Ss who passed the test were
considered to be acceptable for use in display conditions
which involved the direction dimension.
Those “failing” the test were assigned to other conditions
of display. Four Ss were reassigned on that
basis.


Procedure 


The Ss were fitted with headphones and a “signal
association period” begun. Signal association training
involved the presentation of a standardized
signal training program during which Ss were to
associate specified areas of the instrument panel with
appropriate cuing signals. The particular panel areas
and audio cuing counterparts presented depended
upon the display involved in the experimental condition
to which S had been assigned. For given
conditions each signal condition was presented to Ss
while only that area with which the signal was to
be associated was displayed. Each audio-visual combination
in turn was presented accompanied by a
verbal commentary by the experimenter (E) describing
the meaning of the signal as it related to the dial
panel. At the end of the signal association training
period Ss were given a test to determine whether
associative learning had reached a preestablished
criterion (75% correct identifications of the possible
cuing combinations for the particular cuing condition).³
The signal discriminability level used for both
training and testing to criterion was the median
value D2 (an ED interstimulus difference of 2.22).
Upon completion of signal training the first experimental
series was run. An experimental series consisted
of one of three standard random programs
made up of 32 trials. Three series were run, one
at each of the three discrimination levels for each S.
In each of the programs each of the 32 dials on
the panel was presented as deviant once in a
randomized order of presentation.


³ Individual subjects were permitted as much time
as was necessary to reach the preestablished learning
criterion. However, all training was conducted
within one experimental period. It is probable that
learning would have continued if training were extended
over a period of days or weeks.


TABLE 2

ANALYSIS OF VARIANCE OF RESPONSE TIMES BY CONDITION, DISCRIMINATION LEVEL,
AND SECTOR (ALL CONDITIONS)

Source of variation               df          SS         MS        F
Between people                    63     15,557.49
  Conditions (C)                  15     10,582.20     705.48     6.38**
  People within conditions (P)    45      4,975.29     110.56
Within people                  1,472     35,107.90
  Sectors (S)                      7        482.26      68.89     3.39**
  Levels of discrimination (D)     2      4,864.13   2,432.06    65.08**
  S X D                           14        190.48      13.60      .99
  S X C                          105      3,260.03      31.05     1.53**
  D X C                           30      2,985.63      99.52     2.66**
  S X D X C                      210      3,068.42      14.61     1.06
  S X P                          336      6,822.51      20.30     1.49**
  D X P                           96      3,587.14      37.37     2.74**
  S X D X P                      672      9,847.30      14.65     1.07
Within cell                    4,608     62,938.53      13.66
Total                          6,143    113,603.92
* p < .05.
** p < .01.


TABLE 3
ANALYSIS OF VARIANCE OF RESPONSE TIMES BY CONDITION, DISCRIMINATION LEVEL,
AND SECTOR (EXPERIMENTAL CONDITIONS)

Source of variation               df          SS         MS        F
Between people                    59     10,134.38
  Conditions (C)                  14      5,873.79     419.56     4.43**
  People within conditions (P)    45      4,260.59      94.68
Within people                  1,380     30,951.68
  Sectors (S)                      7        243.48      34.78     1.92
  Levels of discrimination (D)     2      4,816.50   2,408.25    68.36**
  S X D                           14        220.66      15.76     1.18
  S X C                           98      2,128.25      21.72     1.20
  D X C                           28      2,910.14     103.93     2.95**
  S X D X C                      196      2,740.05      13.98     1.05
  S X P                          315      5,696.27      18.08     1.37**
  D X P                           90      3,170.57      35.23     2.67**
  S X D X P                      630      9,025.76      14.33     1.09
Within cell                    4,320     56,939.19      13.18
Total                          5,759     98,025.25
* p < .05.
** p < .01.

Experimental Design 


The total of 6,144 observations was analyzed twice
in terms of a Lindquist Type VII analysis of variance
design (1953, p. 296). A “general analysis”
was run over all cuing conditions, and an “experimental
analysis” was run which excluded the control
condition. One observation was made by each S
on each of four dials within a particular sector at 
each of the three levels of discriminability at a given 
condition of auditory display. Repeated measure- 
ments were made on each of four Ss within a given 
cuing condition. Each entry in the design consisted 
of a designated S’s response time for a specified dial, 
sector, level of discrimination, and cuing condition. 
Variation due to dials and their interaction with 
persons was not considered to be of interest. Those 
terms formed the within-cell residual. The experi- 
mental situation provided for the meaningful inter- 
pretation of all other systematic sources of variation. 
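
The number of observations follows directly from the layout just described. The sketch below multiplies out the design; the constant and function names are hypothetical, while the counts (16 conditions, 4 Ss per condition, 3 levels, 8 sectors, 4 dials per sector) come from the text.

```python
N_CONDITIONS = 16        # 15 experimental displays plus the control condition
SS_PER_CONDITION = 4     # Ss measured within each cuing condition
LEVELS = 3               # discriminability levels D1, D2, D3
SECTORS = 8
DIALS_PER_SECTOR = 4


def n_observations(n_conditions):
    """Total response times recorded for the given number of conditions."""
    return n_conditions * SS_PER_CONDITION * LEVELS * SECTORS * DIALS_PER_SECTOR


general = n_observations(N_CONDITIONS)            # all conditions
experimental = n_observations(N_CONDITIONS - 1)   # control condition excluded
print(general, experimental)
```

The totals are one more than the total degrees of freedom shown in Tables 2 and 3 (6,143 and 5,759), as expected.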


RESULTS 


The analysis of variance over all of the 
design was done both with and without the 
inclusion of the control condition C1 (see 
Tables 2 and 3, respectively, for summaries 
of those analyses). On the basis of the
decision rule (p = .05) the sources of variation
of primary interest were disposed of
as follows: (a) The null hypothesis was
accepted for Conditions X Discrimination
Levels X Sectors. Variance from this source
was attributed to chance. (b) The null hypothesis
was accepted for the interaction
variance of Sectors X Levels of Discrimination.
(c) Each of the independent variables
yielded an F ratio significant beyond the .01 level.
Decisions concerning the rejection of the null
hypotheses were withheld because of significant
double interactions among those variables.
(d) The null hypothesis was rejected
for the following interaction terms: Sectors
X Conditions (p < .01), Levels of Discrimination
X Conditions (p < .01). The
nature of the significant interactions was
determined by means of appropriate simple
effects analyses.

The Newman-Keuls procedure (Federer,
1955, p. 21) was applied to “sense” significantly
different means among those conditions
showing significant variance in the
simple effects analyses. Verbal elaboration on
the statistical yield of these comparisons is
not attempted at this point except to clarify
the various tables and figures by means of
which the comparisons are presented. In all
cases means presented in the tables which are
not joined by a common underline differ significantly
at the .05 level of confidence. Conversely,
means joined by a common underline
are statistically homogeneous at that same
level of confidence.

The analyses of the simple effects of cuing
conditions on mean response times at levels
of discrimination indicated that condition
means were heterogeneous at all three levels
of discriminability (D1, D2, D3). The
multiple linear comparisons between pairs of
condition means at each of the levels of
discrimination are presented in Table 4. At
each level of discrimination the condition
means tended to cluster into six homogeneous
groups. This fact is indicated in the
table by six underlines at each level of
discrimination.


TABLE 4

NEWMAN-KEULS TESTS FOR DIFFERENCES BETWEEN PAIRS OF CONDITION MEANS
AT LEVELS OF DISCRIMINABILITY

Discrimination level D1 (3.70 ED)
M(a)       3.30  3.38  3.46  3.81  3.93  4.12  4.14  4.30  4.39  4.41  4.73  5.70  5.90  6.07  6.94  8.84
Condition  C14   C16   C9    C15   C10   C13   C11   C7    C8    C6    C12   C2    C5    C4    C3    C1

Discrimination level D2 (2.22 ED)
M          4.04  4.44  4.55  4.76  5.13  5.16  5.34  5.35  5.46  6.19  6.56  6.76  6.78  6.88  7.86  8.98
Condition  C14   C15   C16   C10   C9    C11   C7    C6    [?]   C8    [?]   C4    C13   C12   C3    C1

Discrimination level D3 (.74 ED)
M          4.67  5.09  5.16  5.37  5.60  5.88  5.94  6.63  6.83  7.58  7.92  8.13  8.87  9.21  9.30  10.10
Condition  C10   C11   C16   C2    [?]   C15   C13   C8    C14   [?]   C3    C7    C9    C4    C12   C1

(a) Mean values in seconds.


The same interpretation applies to Table 5
which summarizes the linear comparisons for
discrimination level at levels of cuing conditions.
Means joined by a common underline
are statistically homogeneous at the designated
levels of display type (those which had
indicated significant simple effects).

Similarly, those means joined by a common
underline in Table 6 were found to be statistically
homogeneous. Shown in that table
are those linear comparisons of sector means
for those display types which yielded significant
simple effects of the sector variable.


DISCUSSION


In the following discussion each of the
three independent variables is considered in
turn.


Panel Sectors 


The panel sectors variable was found to 
have a significant influence on mean response 
time for just 5 of the 16 displays evaluated. 
In the control condition C1, the upper row 
of sectors (S1, S2, S3, S4) on the instrument 
panel required a significantly shorter response 
time than the bottom row of sectors (S5, S6, 
S7, S8). The average time required by opera- 
tors to locate and respond to a deviant dial 
on the bottom row of sectors was 11.12 sec. 
as compared to 7.42 sec. if the deviant dial 
was in the upper row of sectors. The dif- 


TABLE 5 


NEUMAN-KUELS TESTS FOR DIFFERENCE BETWEEN 
Patrs OF DISCRIMINATION LEVEL MEANS AT LEVELS 
or Curmnc CONDITION 











Condition D1 D2 D3 
C7 (ur xD) 4.00 °5.34. 8.13 
Co thx DD) 3.46 5.13 8.87 
C2 Gor 6.) 3S ID) 4.73 6.88 9.30 





118 


SAMUEL A. Mupp 


TABLE 6 


NrEUMAN-KUELS TESTS FOR DIFFERENCES BETWEEN PAIRS OF SECTOR 


Condition Cl (Control condition) 


Means Av LEVELS OF CUING CONDITION 

















M 6.94 7.26 7.36 8.14 10.40. 10.90 ila 118} 12.05 

Sector S3 S4 S1 S6 S8 Ss S7 
Condition C3 (D) 

M 6.35 6.71 6.90 7.28 7.36 8.11 8.55 9.32 

Sector S3 S8 $4 S2 S6 Si S5 
Condition C4 (1) 

M 6.05 6.61 6.76 7.05 7.40 7.68 Nol? 9.50 

Sector S3 S7 S5 S4 S1 S8 $2 
Condition C7 (Dur X D) 

M 4.26 Sol 5.59 5.68 6.02 6.32 6.95 7.24 

Sector $2 S4 Ss S1 S7 S3 S6 
Condition C8 (Dur X I) 

M 4.60 Sel 5.24 5.50 5.54 5.59 6.48 7.72 

Sector S3 S6 S4 S8 S2 Si. = 358 








ference in response time (3.70 sec.) was 
large enough to be of practical significance 
as well as statistical significance. In the 
remaining conditions which showed differ- 
ential mean response times across sectors, 
just one sector was “out of line.” That is, 
all sectors but one in each condition were 
contained in a single homogeneous pool made 
up of the remaining sectors. In condition C3 
(direction) the difference in mean response 
time between S5 (9.32 sec.) and the pooled 
response time (7.32 sec.) of the seven homo- 
geneous sector means was 2.0 sec. In condi- 
tion C4 (intensity), S2 (9.50 sec.) was non- 
homogeneous with all other sectors (7.04 sec. 
average), a difference in mean response of 
2.46 sec. Two of the six bidimensional dis- 
plays tested showed sector heterogeneity of 
mean response time. In cuing condition C7 
(Dur X D), a sector S2 yielded a significantly 
shorter response time (4.26 sec.) than sectors 
S3 (6.95 sec.) and S6 (7.24 sec.). The differ- 
ence in response time between S82 and all other 
sectors (which were homogeneous among 
themselves—average response time equal to 
6.16 sec.) was 1.90 sec. In cuing condition 
C8 (Dur XI) sector S5 gave rise to a re- 
sponse time of 7.72 sec. as compared with an 


average of 5.45 sec. for all other sectors, 
which were homogeneous. The difference in 
mean response time amounted to 2.27 sec. All 
other conditions of auditory display showed 
homogeneity of means across all sectors. 
Homogeneity across sectors may be inter- 
preted as indicating that dials on all sectors 
of the instrument panel were responded to with 
comparable mean response times. Conversely, 
heterogeneity of sector mean response time 
may be interpreted as indicating that some 
sectors received “preferential attention” by 
operators in that dials in some sectors re- 
quired a shorter time for their location and 
response than dials in other sectors. It is not 
possible to attribute differential sector re- 
sponse time directly to various habitual pat- 
terns of scanning the panel by Ss. A number 
of differential patterns on the part of indi- 
vidual Ss could contribute to what might 
appear to be a specific pattern of search. 
However, the data do seem to suggest that 
such general, habitual patterns of search 
might exist. The nature of such patterns, if 
extant, could be determined by camera ob- 
servation of Ss’ eye movements during the 
instrument monitoring task. The data do 
allow the generalization to be made that the
sector effect is broken up by all three- and
four-dimensional displays and the majority 
of the one- and two-dimensional auditory 
displays. 
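
The pooled-mean comparison reported above for condition C3 can be rechecked by direct arithmetic. The following Python sketch is illustrative only: the eight sector means are those printed for C3 in the Newman-Keuls table, while the helper name `slowest_vs_pool` is hypothetical and is not part of the original analysis.

```python
from statistics import mean

# Sector means (sec.) printed for cuing condition C3 (direction).
# The largest value, 9.32 sec., is the one the text attributes to sector S5.
c3_means = [6.35, 6.71, 6.90, 7.28, 7.36, 8.11, 8.55, 9.32]


def slowest_vs_pool(means):
    """Return (pooled mean of all but the slowest sector, slowest mean, difference)."""
    ordered = sorted(means)
    pooled = mean(ordered[:-1])   # the seven homogeneous sector means
    slowest = ordered[-1]         # the sector that is "out of line"
    return pooled, slowest, slowest - pooled


pooled, slowest, diff = slowest_vs_pool(c3_means)
print(f"pooled = {pooled:.2f} sec., slowest = {slowest:.2f} sec., "
      f"difference = {diff:.2f} sec.")
```

Under these assumptions the pooled value rounds to 7.32 sec. and the difference to 2.0 sec., in agreement with the figures given in the text for C3.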


Interstimulus Distance 


The discrimination variable was found to 
have a main effect above and beyond a 
marked interaction effect with various cuing 
combinations (type of display). With three 
minor exceptions, mean response time in- 
creased consistently with an increase in the 
difficulty of discrimination required (de- 
creasing interval between cuing stimuli). In 
general, it may be stated that the increase of 
discrimination difficulty leads to an increase 
in mean response time across all conditions 
of cuing. 


Type of Display 


The last independent variable to be con- 
sidered is the conditions of auditory cuing, 
which involved various display combinations 
of the four auditory dimensions evaluated. 
In general, there was only a slight decrease 
in response time beyond the two-dimensional 
display combinations, and in fact in some 
cases the addition of a third dimension of 
cuing acted to increase mean response times. 
For example, condition C12 (Dur x I x D), 
a three-dimensional display, showed a longer 
mean response time than any of the two-di- 
mensional displays at all levels of discrimina- 
tion difficulty. It was found that the various 
display conditions interacted markedly with 
levels of discrimination difficulty, such that 
the effectiveness of a given cuing condition 
depended not only upon the number of di- 
mensions in the cuing combination, but also 
upon the amount of interstimulus interval be- 
tween cuing stimuli in the various conditions. 

A comparative analysis of the four cuing 
dimensions can be made by first considering 
their relative vulnerability to deterioration 
with decreasing interstimulus interval, then 
considering the combinative characteristics 
of each dimension when it is combined 
with other dimensions without regard to 
interaction with levels of discriminability, 
and finally considering their combinative 
characteristics in interaction with levels of
discrimination.

Singly, the frequency dimension not only
showed the shortest mean response time of 
all four unidimensional displays, but also 
suffered no appreciable deterioration due to 
decreasing interstimulus interval (increase in 
difficulty of discrimination). The average 
response time over all levels of discrimination 
was 5.51 sec. Duration was the second most 
effective cuing dimension when used in a uni- 
dimensional display. The dimension showed a 
systematic increase in response time as a 
function of increasing difficulty of discrimination.
Response times for that dimension were:
D1 (3.70 ED), 5.90 sec.; D2 (2.22 ED),
6.56 sec.; D3 (.74 ED), 7.58 sec. Response
times for the duration cue at all levels 
were lower than those of the direction and 
the intensity dimensions at comparable levels 
of discriminability. Intensity used alone at 
easy and moderate levels of discrimination 
difficulty was more effective than direction 
used alone at those same levels. Mean re- 
sponse times at D1 (3.70 ED) were 6.07 
sec. and 6.94 sec. for intensity and direction, 
respectively; at discrimination level D2 the 
times were 6.56 sec. and 7.86 sec., respec- 
tively. At the most difficult level of discrimi- 
nation D3 (.74 ED) direction was more 
effective than intensity (mean response
times for the two conditions at that level were
7.92 sec. and 9.21 sec., respectively).

In order to compare the combinatory 
characteristics of the various dimensions it is 
assumed that discrimination level D1, which 
represents an interstimulus interval of 3.70 
ED units, represents all discrimination levels 
greater than 3.70 ED. That is, the assump- 
tion is that further increase in interstimulus 
interval would not decrease the difficulty of 
the required discriminations between cuing 
stimuli, and therefore would not effect fur- 
ther decrease in response times. The fact that 
an interstimulus interval of 3.70 ED covers 
more than half of the working range for each 
of the dimensions makes the further increase 
in discriminability an unlikely result of in- 
creased difference between the cuing stimuli. 
If the assumption is permissible, then the 
differential response times for the various 
dimensions in combination with one, two, or 
three of the other dimensions at level D1 
(3.70 ED) may be interpreted as being due to
the combinative characteristics of the particular
dimensions. The data indicated that
all multidimensional cuing combinations 
yielded homogeneous mean response times. 
The range of means within the homogeneous 
pool ran from 4.73 sec. for condition C12 
(Dur X I X D) down to 3.30 sec. for condition
C14 (D X F X I). Both the direction
and the intensity dimension are represented 
in the two conditions on the extremes of the 
range of mean response times. Further, di- 
rection and intensity, when used together 
in a bidimensional display, showed a rela- 
tively low mean response time of 3.46 sec. 
In general, the data provided no basis for 
the inference that one sound dimension 
“combined” more readily than another. This 
generalization is based upon the assumption, 
discussed above, that maximum possible 
discriminability prevailed at discrimination 
level D1 (3.70 ED). If the assumption is 
not valid, then the possibility remains that 
at greater magnitudes of interstimulus inter- 
val some dimensions might combine more 
effectively than others. 

When the various cuing combinations are 
considered at more difficult levels of discrimi- 
nability, the data indicate that some dimen- 
sions combine more or less effectively than 
others (as measured by differential mean 
response times). For the two-dimensional 
displays, each dimension in combination with 
frequency did not show significant differences 
in response time as a function of increasing 
difficulty of discrimination. However, when 
direction, duration, and intensity were com- 
bined in their possible two-dimensional pair- 
ings, each of the three combinations showed 
a significant difference in response time with 
increasing difficulty of discrimination. The 
mean differences in seconds for the various 
conditions were as follows: 


                      D1            D2            D3
C7 (Dur X D)          4.30          5.34          8.13
C8 (Dur X I)          4.39          6.19          6.63
[illegible] (D X I)   [illegible]   [illegible]   [illegible]


All three-dimensional displays showed sig- 
nificant variability of mean response time 
as a function of increasing difficulty level. 
The mean response times (in seconds) for
those combinations of the three levels of
discrimination were as follows:


                      D1     D2     D3
C12 (Dur X I X D)     4.73   6.88   9.30
C13 (Dur X F X D)     4.12   6.78   5.94
C14 (D X F X I)       3.30   4.04   6.83
C15 (F X Dur X I)     3.81   4.44   5.88


In the above conditions, those combinations 
in which frequency was used did not deterio- 
rate with increasing difficulty of discrimi- 
nability as severely as did C12 which was 
made up of all dimensions except frequency. 
Condition C16 which contained all four sound 
dimensions did not show a significant increase 
in response time as a function of increasing 
level of discrimination difficulty. Mean re- 
sponse times (in seconds) for that combina- 
tion were: D1, 3.38; D2, 4.55; D3, 5.16. 

In general, the results of the study verified 
the earlier exploratory findings of Mudd and 
McCormick (1960). Their conclusions were 
supported by the data: that search time in 
an instrument monitoring situation is reduced 
through the use of auditory cues, that dif- 
ferential response time across panel sectors 
can be eliminated by means of auditory cues. 
In addition, their finding that search time 
did not decrease significantly with the use 
of more than two dimensions of cuing seems 
to be borne out. In the present study Ss 
were given ample time (see Footnote 3) to 
learn the cuing signals and to associate the 
cues with corresponding parts of the panel. 
Although there is evidence that marginal in- 
creases in search efficiency did occur with 
the addition of a third and a fourth cuing 
dimension, the differences were not statisti- 
cally significant. Since each cuing dimension 
was dichotomous and each dimension served 
to reduce spatial search by one half, each di- 
mension contained one “bit” of information. 
Since three multidimensional cuing
conditions (C14, D X F X I; C15, F X
Dur X I; C16, F X Dur X I X D) yielded
response times (C14, 4.93 sec.; C15, 4.92 sec.;
C16, 4.56 sec.) somewhat shorter than the
average response time for all two-dimensional
displays (5.54 sec.), it appears that Ss can 
utilize somewhat more than two dimensions of 
such audio cues. On the basis of the trend to- 
ward further decrease in response time with 
the addition of a third and a fourth dimension
of cuing, the hypothesis is tentatively adopted
that channel capacity for this type of informa- 
tion falls somewhere between 2 and 2.5 bits 
per presentation. It should be noted that the 
use of just two bits of auditory information 
effects a considerable increase in instrument 
monitoring efficiency as indicated by a de- 
crease in mean response time per dial from 
approximately 10 sec. when auditory cues 
are not used to approximately 5 sec. when
various combinations of two or more audi- 
tory dimensions are used. The utility of such 
auditory displays is further enhanced by the 
fact that operators need not look at the panel 
when all dials are “normal”; the visual sense
is freed for use in other tasks. The auditory 
modality can be used to alert the operator 
to machine malfunction and initiate general 
localization of the trouble. Only after those 
two steps have been accomplished is it nec- 
essary for the more precise visual system to 
enter the situation and make final localizations
of the specific instrument. The use of
auditory displays in the instrument monitoring
task involved in man-machine systems
fully exploits the audio signal characteristics 
of attention-demandingness and general lo- 
calization properties, while at the same time 
does not make demands on the auditory 
sense which are better filled by other sensory 
means. The conclusion is drawn that audi- 
tory display of information can be used 
effectively in instrument monitoring tasks.


RATING FORM


SEYMOUR LEVY AND D. MIRIAM STENE


A forced-choice rating form was revalidated by using a type of construct
validation based on the hypothesis that a manager’s effectiveness is reflected in 
the performance level of his subordinates. 11 plant managers were ranked on 
overall effectiveness by 3 independent judges, and the relationship between 
these rankings and the average performance report scores of 142 first-line 
supervisors in the respective plants was determined by analysis of variance 
and correlational techniques. Results showed a significant overall relationship
between plant-manager rankings and production-supervisor scores on the
forced-choice form (p = .005) and significant correlations on 2 of the 6 subscales,
with the highest relationship apparent in the Human Relations area
(p = .025). The findings support the hypothesis of a relationship between
management effectiveness and subordinate performance, and provide evidence 
to indicate continued validity of the rating instrument. 


The role of the psychologist in industry is 
not only to develop and install measurement 
devices for selection, placement, evaluation, 
and the like, but also to institute continuous 
controls to insure that the validity of these 
instruments is maintained at acceptable levels 
over time. The present report is concerned 
with this problem of continuous controls, 
offering an alternative strategy in the revalida- 
tion of a forced-choice performance report. 

The performance report under considera- 
tion was initially developed for the evalua- 
tion of production supervisors (Huttner & 
Katzell, 1957). It provides an overall score, 
as well as separate scores on the following 
subscales: Technical Competence, Work 
Habits, Delegation and Leadership, Human 
Relations, Cost Consciousness, and Potential. 
In the original validation study the total score 
and the scores on the subscales were correlated 
with rankings and ratings by two levels of 
management, with significant results emerging 
to establish the validity of the instrument. 

A supplementary page of the report form 
presents five-point rating scales on which the 
rater is asked to make “subjective” judg- 
ments of the individual’s overall performance 
and his performance on each of the six sub- 
scale factors. The data thus obtained permit 
comparisons between subjective ratings and 
the more objective performance report scores. 

There has been considerable support of the 
forced-choice technique as a replacement for
traditional rating methods, with emphasis primarily
on its greater objectivity and greater
emphasis on job behaviors, resulting in 
diminished effects of halo and of errors of 
central tendency, or conversely, of too much 
piling up at the top end of the scale. How- 
ever, Travers (1951) raises a basic criticism 
of the method in the following statement: 
“they all involve the unsatisfactory process of 
predicting ratings with ratings. Studies which 
involve criteria other than judgments must 
be made and must form the ultimate basis 
for evaluating the effectiveness of various 
assessment techniques.” 

The study here reported deals with the use 
of “criteria other than judgments” in a re- 
validation of the performance report. The 
original study utilized the traditional method 
of validating the forced-choice form against 
management ratings. In the revalidation 
study, conducted 2 years later, an alternative 
approach was used instead, placing emphasis 
on construct rather than concurrent validity 
(Cronbach, 1960). 

It was first hypothesized that the caliber of 
the plant manager is reflected in the per- 
formance level of his subordinates, and that 
therefore the subordinates of the more effec- 
tive manager will in general be superior in job 
performance to those of the less effective 
manager. Then the validity of the rating 
form was tested by determining the relationship
between the effectiveness level of the
manager and the obtained ratings of his
subordinates. 

Since plant-manager competence was in- 
dependently established, and since the plant 
managers themselves did not do the ratings, 
any evidence of relationship between the 
forced-choice scores and management effec- 
tiveness can be attributed to the differential 
effects of the work climates provided by the 
plant managers rather than to differences in 
rating styles. 

The concurrent validity method used in the 
original validation determined the extent to 
which the performance report scores dis- 
tinguished between the more and less effective 
supervisors within a plant, as judged by an 
independent criterion of job performance 
effectiveness. In the design used in the cur- 
rent study, a test for differences among plants 
was used as the approach to validation, with 
these differences considered to be in part a 
function of the competence of the plant 
manager. 

Though the central theme of this paper is 
to review the validity over time of a forced- 
choice performance report, a related problem 
is that of the consequences of the use of the 
forced-choice technique on subjective evalua- 
tions of performance. Conflicting views have 
been expressed in the literature (Baier, 1951; 
Richardson, 1951; Travers, 1951) regarding 
the effects of using a forced-choice rating 
form as a contributor to increasing the vari- 
ability of the subjective rating, and thereby 
presumably increasing the discriminability 
and thus possibly the validity of the measur- 
ing instrument. Accordingly, an examina- 
tion was made of changes in the subjective 
ratings over time, following the use of the 
forced-choice performance report for a 2-year 
period. 


METHOD


Eleven plant managers were ranked on overall 
management effectiveness by three independent 
judges, all of whom were familiar with the plant 
operations: the Director of Industrial Engineering, 
the Training Director, and the Production Personnel 
Manager. Intercorrelations between the rankings 
were +.65, +.90, and +.95 (significant at .01 level 
of confidence). The rankings by the three judges 
were averaged to provide a composite criterion for 
each plant manager. The performance report ratings
were completed by plant superintendents and general
foremen, who report to the plant managers.
Ratees were 142 first-line supervisors in the 11 
plants, the numbers within plants ranging from 
4 to 48. 

An analysis of variance was conducted to deter- 
mine the overall relationship between the criterion 
of management effectiveness and the performance 
report ratings of the first-line supervisors in the 
respective plants. Using a linear analysis technique, 
the 11 managers were ranked from good to poor 
(+5 to -5) and the averages of the production
supervisors’ scores on each of the six subscales were 
listed for each plant, resulting in an 11 X 6 table. 
For the “between” calculations, each of the plant 
averages was assumed, conservatively, to be based on 
an N of four (the number of supervisors in the
smallest plant). 

In addition to the overall analysis of variance, 
simple rank-difference correlations were computed to 
determine the degree of relationship between plant- 
manager rankings and the average scores of the 
first-line supervisors in the respective plants, on 
the subscales as well as on the total performance 
report. 
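
The rank-difference (Spearman) coefficient named above can be sketched as follows for the case of no tied ranks. This is a minimal illustration, not the computation actually run in the study: the two rank lists are hypothetical, and `rank_difference_rho` is a name introduced here for convenience.

```python
def rank_difference_rho(rank_x, rank_y):
    """Spearman rank-difference correlation, valid when there are no tied ranks.

    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference
    between the two ranks assigned to the same unit.
    """
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks for 11 plants (NOT data from the study):
# manager effectiveness rank vs. rank of the plant's average supervisor score.
manager_rank = list(range(1, 12))
score_rank = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 11]

rho = rank_difference_rho(manager_rank, score_rank)
print(f"rho = {rho:.3f}")
```

Plant averages would first have to be converted to ranks before this formula is applied; with tied values a tie-corrected formula would be needed instead.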

Means and sigmas were compared to determine 
possible differences in the performance report totals 
and subscale scores for the two periods. 

To test for the significance of changes in sub- 
jective evaluations over time, the chi-square tech- 
nique was used. 


RESULTS 


There were no differences between the
means and variabilities of performance report 
scores for the total population in the original 
study and those for the total population in 
the current analysis. Had such differences 
been found, the overall change in scores would 
have been a factor to be considered in inter- 
preting the results of the revalidation study. 

Analysis of variance data are presented in 
Table 1 for all subscales of the forced-choice 
report form. Linear ranking of plants with 
respect to plant-manager ratings showed a 
significant relationship with scores of super- 
visors on the performance report, thus sup- 
porting the first hypothesis and providing a 
measure of the validity of the rating instrument
(F ratio = 9.477, df = 1, p = .005). The
high F ratio for scores on the six subscales is 
to be expected (Line 3), since the number of 
scorable items varied for the different scales. 
Nonsignificant interaction (Line 4) indicates 
that scores on the subscales tended to vary in 
the same manner, which may again be attrib- 
utable in part to the lack of equivalence of 
maximum scores of the subscales.


TABLE 1


ANALYSIS OF VARIANCE ON RANKINGS OF PLANT MANAGERS
AND PERFORMANCE REPORT SCORES OF
PRODUCTION SUPERVISORS

Source                       df          MS         F

1. Linear plants (L)          1      718.23      9.48*
2. Remainder plants (RP)      9      452.02      5.96*
3. Ratings (R)                5    2,236.29     29.51*
4. L X R                      5       42.44
5. RP X R                    45       59.44
6. Within plants            786       75.79

* p < .01.


The significant F ratio for “remainder” indicates a considerable
variance not attributable to linear
relationships. 
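
The F ratios in Table 1 can be reproduced from the printed mean squares. The sketch below assumes that the within-plants mean square (Line 6) served as the error term; the table does not say so explicitly, but the printed ratios are consistent with that choice. The variable names are illustrative only.

```python
# Mean squares as printed in Table 1.
mean_squares = {
    "Linear plants (L)": 718.23,
    "Remainder plants (RP)": 452.02,
    "Ratings (R)": 2236.29,
}

# Assumption: Line 6 (within plants) is the error term for every F ratio.
ms_within = 75.79

f_ratios = {source: ms / ms_within for source, ms in mean_squares.items()}

for source, f in f_ratios.items():
    print(f"{source:<24} F = {f:.2f}")
```

Rounded to two places, these ratios match the 9.48, 5.96, and 29.51 shown in the table.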

Table 2 presents rank-difference corre- 
lations between the average scores on the 
subscales and the criterion of management 
effectiveness. Note that the Human Rela- 
tions factor shows the strongest relation- 
ship with plant-manager rankings (+.62, p 
< .05), while Delegation and Leadership ap- 
pears to contribute the least (—.07). With 
the exception of Delegation and Leadership, 
all of the relationships are positive. 

For comparative purposes, the correlations 
obtained in the original study are included in 
the table also. The correlations between the 
original criterion and the total score and 
subscales of the report form are generally 
higher than those obtained in the construct 
approach. 


TABLE 2


CORRELATION OF PERFORMANCE REPORT SCORES WITH
CRITERION IN ORIGINAL AND REVALIDATION STUDIES

Performance report scales       Original studyᵃ    Revalidation studyᵇ

1. Technical competence         [illegible]        +.36
2. Work habits                  [illegible]**      +.50*
3. Delegation and leadership    [illegible]        -.07
4. Human relations              [illegible]        +.62*
5. Cost consciousness           .66**              +.15
6. Potential                    [illegible]*       +.47

Total score                     [illegible]        +.51*

ᵃ N = 61.
ᵇ N = 11.
* p < .05.
** p < .01.


Table 3 presents distributions on subjective
ratings for the original validation period
and for the 2-year follow-up. The latter
shows a significantly greater spread of ratings
(χ² = 13.702, df = 2, p < .01), with slightly
more low ratings and a greater increase in 
higher ratings. 
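
The chi-square comparison of the two rating distributions can be retraced from the counts printed in Table 3. The following sketch is a plain Pearson chi-square on a 2 X 3 table; the function name is illustrative, and it is only an assumption that the original test was computed in exactly this way.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Counts as printed in Table 3: low, middle, high ratings.
ratings = [
    [9, 65, 22],    # original study (N = 96)
    [25, 77, 72],   # 2 years later (N = 174)
]

stat, df = chi_square(ratings)
print(f"chi-square = {stat:.2f}, df = {df}")
```

Recomputed from the counts as printed, the statistic comes to about 13.75 with 2 degrees of freedom, close to but not identical with the reported value; the small gap may reflect rounding or transcription of the counts.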


DISCUSSION 


This study demonstrates the appropriate- 
ness of using alternative strategies in the 
assessment of the validity of measurement 
instruments in industrial situations. The con- 
struct criterion employed in this analysis 
dealt with some assumptions about mana- 
gerial effectiveness and its consequences upon 
supervisory behavior. Certainly other con- 
struct hypotheses can be developed about 
other facets of performance, and these in 
turn may be used as a basis for validating
either tests or performance-review measures.
For example, changes in managerial effectiveness
over time should be reflected in
changes in the reported performance level
of subordinates if the measuring instrument
is valid.


TABLE 3


COMPARISON OF SUBJECTIVE RATINGS OF OVERALL
PERFORMANCE ON ORIGINAL STUDY AND ON STUDY
CONDUCTED TWO YEARS LATER

                    Low                     Middle       High
Study               (improvement needed)    (average)    (superior or outstanding)

Original studyᵃ     (9) 9%                  (65) 68%     (22) 23%
2 years laterᵇ      (25) 14%                (77) 44%     (72) 42%

Note. χ² = 13.70, df = 2, p < .01.
ᵃ N = 96.
ᵇ N = 174.

It is to be expected that there would be 
differences between the particular validity co- 
efficients developed in this follow-up study 
and those obtained in the original validation. 
In the original study all of the performance 
report subscales had significant correlations 
with the criterion, and the concurrent valid- 
ity of the total score was .73. In the present 
study, only two of the six subscales, and the 
total score, had statistically significant rela- 
tionships with the criterion, though with only 
one exception (Delegation and Leadership) 
all were in the expected direction. The varia- 
tions in these validity coefficients should be 
considered as functions of changes in time,
as well as resulting from an alternative
conception of validity. 

The researchers are unable to develop a 
plausible rationale to account for the specific 
changes in construct validity coefficients—for 
example, the low relationships for the Cost 
Consciousness and the Delegation and Leader- 
ship categories. A suggestion for future 
research would include testing for concurrent 
and construct validity at the same time, in 
order to better understand the particular 
significance of test-criterion relationships 
under these conditions. 

In this study differences were tested among 
plants rather than among individuals within 
plants. One might expect a greater variation 
of scores within a plant than between plants, 
since the between-plant figure is an averaging 
of the variation within a plant. This decrease 
in variability might serve to make higher 
correlations less likely. 

Some of the opinions expressed by Richard- 
son (1951) are supported by the finding that 
there is a greater variance in subjective 
ratings following the 2-year experience with 
the forced-choice performance report. How- 
ever, there are a number of factors that might 
account for these results other than the 
installation and utilization of the forced- 
choice technique. Among these are: (a) 
changes in personnel—either among raters 
or ratees, (b) changes in actual performance
of ratees, (c) special training programs, and 
(d) the effects of feedback of forced-choice 
scores to the managers. 

In this study, however, the rater and 
ratee populations were essentially unchanged 
during the interval between ratings. Although 
changes in actual performance of ratees may 
have occurred, the similarity of forced-choice 
score distributions between the original and 
follow-up studies suggests that there were no
major changes in performance within this
population during this time. No formal training
program in evaluation was instituted
during this period, but the importance as- 
signed to the rating process may have in- 
creased as a result of the installation of the 
new technique. The other possibility that 
still remains as a source of increasing sub- 
jective variation is the method by which 
performance report scores were reported to 
the plants. The scores were presented to the 
managers in rank order from high to low. 
It is possible that this display of variation 
may have stimulated greater variation in the 
use of subjective ratings. An alternate pos- 
sibility, in line with Richardson’s thesis, is 
that increased discriminability results from 
the more careful analysis of performance in- 
herent in the forced-choice technique. Un- 
fortunately, we do not have the data to 
distinguish which of these alternatives is most 
applicable in this setting. Also, because of 
lack of data, the critical question could 
not be answered—namely, whether the 
change in the variability of subjective ratings 
contributes to the validity of the instrument.


DEVELOPMENT AND ANALYSIS OF A “CUMSHAW
TOLERANCE” SCALE¹


YVONNE TREADWELL


A 9-item scale to measure employee attitudes toward “cumshaw” (misuse of
company time or materials) was developed through the Guttman scaling
process. Scores on the experimental version of the cumshaw tolerance scale
were then correlated with selected psychological and social-group variables.
Differences in cumshaw tolerance were found to be associated with the group
variables of age and educational level but occupational groups did not differ
significantly in relative cumshaw tolerance. Individual differences in selected
psychological variables could not account for individual differences in cumshaw
tolerance.


According to the Oxford English Dictionary 
(1933) the word cumshaw is derived from 
the Chinese kan hsieh, meaning grateful 
thanks, a phrase of thanks used by beggars. 
In the United States Navy, and in its allied 
civilian organization, cumshaw has come to 
mean something that is procured without 
official payment. The term may be used to 
describe activities within the sphere of of- 
ficial operations or it may refer to activities 
leading to personal gain. The present study 
uses the personal gain connotation of the 
word. 

This study was conducted to examine the 
attitudes toward cumshaw activity among 
employees of a government research and 
development laboratory. No judgments were 
intended as to the moralistic or legalistic 
aspects of the problem, nor was any at- 
tempt made to measure the actual extent 
of cumshaw participation within the organi- 
zation studied. The objectives of the study 
were: 


1. To develop a measure of employee- 
cumshaw tolerance. Incidents of cumshaw 
behavior to be rated were limited to personal 
gain items, as indicated above, in order to 
focus on the more ambiguous aspects of the 
practice. A spectrum of behavior was presented
with incidents ranging both in severity
and in accessibility to different employee
groups so that the final scales would be
meaningful in terms of the hypotheses to
be tested.


1 This study was begun under the sponsorship of
George F. J. Lehner while the author was an undergraduate
student at the University of California
at Los Angeles. Data analyses were later completed
under the guidance and criticism of Robert W.
Stephenson, head of the Creativity Research Group
at the United States Naval Ordnance Test Station,
China Lake, California.


2. To measure the relationship between 
selected social group variables and employee- 
cumshaw tolerance. Variables included were 
age, educational level, and occupational 
group. Cumshaw practices have historically 
been a part of the business scene, with the 
amount of activity varying from organization 
to organization (and from group to group 
within the organizations) according to the 
sanctions that are present (Dalton, 1959; 
Flynn, 1931). Thus, it was considered that 
the occupation of the respondent would be 
an important variable affecting cumshaw 
tolerance. Similar backgrounds of group 
members, shared group norms, and the status 
of the occupational group within the larger 
organization would, it was felt, influence both 
the amount and types of cumshaw activity 
acceptable to the individual employee in 
the group. 

3. To measure the relationship between 
cumshaw tolerance and selected personality 
variables. Cumshaw activity frequently in- 
volves infractions of organizational rules (or, 
at least, ambiguous interpretations of regula- 
tions). It appeared likely that the rigid, 
authoritarian individual who is uncomfortable 
with ambiguous stimuli (Adorno, Frenkel-Brunswik,
Levinson, & Sanford, 1950; O’Connor,
1952) would disapprove of cumshaw
practices. Therefore, authoritarianism (as
measured by the California F Scale) and
intolerance of ambiguity were selected as 
probable negative personality correlates of 
cumshaw tolerance. 


DEVELOPMENT OF CUMSHAW TOLERANCE
SCALE 


A set of 26 cumshaw incidents was pre- 
pared, based partly on interviews with a 
number of management and technical person- 
nel in a research and development laboratory, 
and partly on the personal knowledge of the 
experimenter (E) (10 years working experience
in administrative and personnel-management
positions at the organization studied). 
The 26 cumshaw incidents chosen were 
meant to be a representative sample of the 
types of cumshaw activity that could be 
engaged in by the various employee groups 
in the laboratory. The incidents ranged from 
extremely petty instances to more serious 
offenses. 

Subjects (Ss) were asked to rate the 
behavior depicted in the cumshaw incidents 
on a seven-point scale ranging from completely
unacceptable (Number 1 on the scale) 
to completely acceptable (Number 7 on the 
scale). The sample group of 101 laboratory
employees who rated the incidents included
four occupational categories: engineer-scientists
(N = 25), clerks (N = 25), tradesmen (N = 25),
and administrators (N = 26).

Since the cumshaw incidents were expressly 
intended to measure attitudes toward a 
spectrum of cumshaw behavior, there was 
obviously a tacit assumption of a single 
dimension underlying the items—an attitude 
toward cumshaw behavior ranging from 
tolerance to intolerance. A Guttman scalo- 
gram analysis appeared to be an appro- 
priate check on whether or not there was 
any empirical basis for the assumption of 
unidimensionality. 

Scalogram analysis provides a cumulative 
scale that is designed to ensure that test 
items lie approximately along a single dimen- 
sion. On a cumulative scale, the items are ar- 
ranged so that an S who responds favorably to 
any given item will also respond favorably to 
all items of lower rank order. To the extent 
that S responses fit the theoretical model of 
unidimensionality, one may interpret scale 
scores of the Ss based on the statements as 
falling along the same unidimensional con- 
tinuum (Edwards, 1957). 

The computer program used for the scalo- 
gram analysis (Shutz, 1961) divided the 
sample into two groups, a test group 
(N = 50) and a check group (N = 51). 
Scores on the cumshaw incidents were di- 
chotomized according to the intensity of 
feeling on the given items, the “neutral” 
point being selected as the cutoff point be- 
tween acceptance and rejection of the item. 
The items were scaled on the basis of the 
test group’s ratings, and the scale obtained 
was checked against the check group’s ratings. 
Reproducibility coefficients were given for the 
test, check, and total groups. (The reproduci- 
bility coefficient for a cumulative scale is a 
measure of the degree of accuracy with which 
it is possible to predict an S’s responses to 
any given scale item when only his total scale 
score is known.) 
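
The definition of the reproducibility coefficient given above can be made concrete with a small sketch. It assumes responses already dichotomized (1 = accept, 0 = reject) and uses one common error-counting convention: each S's responses are compared with the ideal cumulative pattern implied by his total score, and the coefficient is 1 minus the proportion of deviating responses. The function name and the toy data are hypothetical, and the program cited in the text may have counted errors differently.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for rows of 0/1 responses (one row per S)."""
    n_subjects = len(responses)
    n_items = len(responses[0])
    # Rank items from most to least frequently accepted.
    accept_counts = [sum(row[j] for row in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -accept_counts[j])
    errors = 0
    for row in responses:
        score = sum(row)
        # Ideal cumulative pattern: accept exactly the `score` most-accepted items.
        for rank, j in enumerate(order):
            ideal = 1 if rank < score else 0
            if row[j] != ideal:
                errors += 1
    return 1 - errors / (n_subjects * n_items)


# Toy data (hypothetical): four Ss, three items.
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
one_deviant = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
print(reproducibility(perfect))
print(round(reproducibility(one_deviant), 3))
```

In the toy data, the third S of the second set accepts only the item ranked last, so two of his three responses depart from the ideal pattern and the coefficient falls below 1.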

A nine-item scale was obtained that had 
the following reproducibility coefficients: test 
group, .882; check group, .836; total sample 
group, .859. These coefficients, although some- 
what lower than optimum, were accepted as 
adequate in order to retain as wide a variety 
of items as possible in the cumshaw tolerance 
scale. All of the items in the nine-item scale 
met the Guttman criterion for “goodness of 
fit.” That is, the percentage of error (S re- 
sponses that fell outside the category in which 
they theoretically belonged) for any single 
item did not exceed one-half of the percentage 
accepting the item, or the percentage reject- 
ing the item, whichever was smaller. The 
cumshaw incidents included in the nine-item 
cumshaw tolerance scale were as follows:² 


1. An employee’s social club planned a 
special event intended to raise money for 
future activities of the group. The member, 
who worked as a technical illustrator, spent 
several hours of his own time preparing 
advertising posters. He worked in his office, 
using supplies that were on hand. 

2 The cutting point is indicated for each item. 
Italic numerals were scored as rejecting the item 
and numerals in regular type were scored as accepting
the item. The distribution of subject responses is 
indicated below the rating scale. 

Completely unacceptable   1   2   3   4   5   6   7   Completely acceptable
[Response distribution illegible in source]

2. An office was being restocked, and defective
and obsolete supplies were being discarded.
An employee removed a still-operative
staple remover from a wastebasket and 
took it home. 

Completely unacceptable   1   2   3   4   5   6   7   Completely acceptable
[Response distribution illegible in source]

3. An employee was teaching an employer- 
sponsored extension class at night, for which 
he received compensation from an academic 
institution. He asked his secretary to prepare 
his class outline and certain short reading 
assignments. She did so, using office supplies 
during her normal working hours. 


Completely unacceptable   1   2   3   4   5   6   7   Completely acceptable
[Response distribution illegible in source]

4. An employee enrolled in an extension 
course in an area of study related to the work 
he was doing on the job. He submitted a 
request to the laboratory library to purchase 
the textbook for him on an indefinite loan 
basis. 


Completely unacceptable   1   2   3   4   5   6   7   Completely acceptable
[Response distribution illegible in source]

5. An employee was preparing a thesis as 
the final task toward a graduate degree from 
a university. The rough draft was typed on 
company time by his secretary, transcribed 
from the office dictating machine. Final typing 
and reproduction were done by a typist 
specially hired by the employee to do the 
job. The subject matter of the thesis was of 
interest to the employer since it treated prob- 
lems related to project work then being done. 


Completely unacceptable   1   2   3   4   5   6   7   Completely acceptable
[Response distribution illegible in source]

6. An employee who worked in an elec- 
tronics shop found that his personal hi-fi set 
needed an inexpensive tube. Since the tubes 
were readily available in his shop and since 
the particular tube was not for sale in the 
nearby community, he used one from the shop 
for his set. 

Completely unacceptable   1   2   3   4   5   6   7   Completely acceptable
[Response distribution illegible in source]

7. The official address of the employer had 
recently been changed, necessitating the print- 
ing of new letterhead stationery. A consider- 
able quantity of the old style was to be 
discarded. An employee took home a ream of 
the discontinued style for his personal use 
as scrap paper. 

Completely unacceptable   1   2   3   4   5   6   7   Completely acceptable
[Response distribution illegible in source]


8. An employee was in the habit of carry- 
ing pencils in his pocket, with the result 
that he frequently carried them home. He 
seldom bothered to collect them and return 
them to the office. 


Completely unacceptable   1   2   3   4   5   6   7   Completely acceptable
[Response distribution illegible in source]

9. An employee went to considerable effort 
attempting to purchase Plexiglass (or suitable
plastic sheets) for replacement windows 
for his small foreign car. None was available 
locally, so a friend secured two pieces of 
plastic for him from one of the laboratory 
shops. 


Completely unacceptable   1   2   3   4   5   6   7   Completely acceptable
[Response distribution illegible in source]

RELATIONSHIP OF THE CUMSHAW TOLERANCE 
SCALE TO SELECTED VARIABLES 


At the time the Ss completed their ratings 
for the acceptability of the cumshaw incidents,
several other measures were also obtained.
These included: (a) Form 40/45 of 
the California F Scale (Adorno et al., 1950) 
with eight items of Walk’s scale measuring 
“intolerance of ambiguity” (O’Connor, 1952) 
interspersed among the F Scale items; (b) 
personal identification items (age group,
educational level, occupational category, and 
sex); (c) a scale for rating self-participation 
in cumshaw activity; and (d) a scale for 
rating the amount of cumshaw activity felt 
to be present in the laboratory (“other
participation”). 

The scores obtained from the foregoing 
measures were correlated with each other and 
with the cumshaw tolerance scale. Table 1 
presents the resulting intercorrelation matrix. 

TABLE 1 
CORRELATIONS AMONG SELECTED VARIABLES AND CUMSHAW TOLERANCE SCALE 

                          Intolerance                            Self-par-    Other par-   Cumshaw
                          of ambiguity Age          Education    ticipation   ticipation   tolerance
F Scale                   [illegible]  .324**       [illegible]  -.004        [illegible]  -.203*
Intolerance of ambiguity               .283         -.490***     [illegible]  .049         -.139
Age                                                 [illegible]  -.025        .157         -.336***
Education                                                        .172         -.103        .293**
Self-participation                                                            .003         .126
Other participation                                                                        -.141

Note. N for all variables = 101, except for “Other Participation,” where N is illegible in the source. 
* p < .05 

In order to understand the relative weights 
of the variables, a multiple correlation coefficient
was computed, using the F Scale, 
intolerance of ambiguity, age, and educational 
level as predictors of cumshaw tolerance. The 
relationship was significant at the .01 level 
(R = .386; R = .337, when corrected for 
shrinkage). The relative beta weights were 

F Scale ........................ -.0370 
Intolerance of ambiguity ....... .0648 
Educational level .............. .2058 
Age ............................ -.2686 


The low beta weights for the personality vari- 
ables (F Scale and intolerance of ambiguity) 
and the intercorrelation of these variables 
with age and education (see Table 1) indicate 
that the relationship of the personality vari- 
ables to the cumshaw tolerance criterion can 
be readily accounted for in terms of the age 
and educational level of the S. 
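
The shrinkage correction can be checked with a short sketch. It takes the uncorrected multiple R as .386 and assumes the familiar Wherry-type adjustment, adjusted R² = 1 - (1 - R²)(N - 1)/(N - k - 1), with N = 101 Ss and k = 4 predictors. The source does not name the formula that was used, so the choice of adjustment (and the function name) is an assumption, although it does reproduce the corrected value of .337 reported in the text.

```python
import math


def shrunken_r(r, n, k):
    """Multiple R corrected for shrinkage (Wherry-type adjustment, assumed here)."""
    adj_r2 = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    # Guard against a negative adjusted R-squared before taking the root.
    return math.sqrt(max(adj_r2, 0.0))


r_adj = shrunken_r(0.386, n=101, k=4)
print(round(r_adj, 3))        # corrected multiple R
print(round(r_adj ** 2, 3))   # proportion of variance accounted for
```

The squared corrected R is the proportion of variance in cumshaw tolerance accounted for by the four predictors taken together.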

The importance of the sex of the respond- 
ent in cumshaw tolerance, a relationship that 
did not lend itself to meaningful correlational 
analysis, was also examined. A test of sig- 
nificance was made for the difference be- 
tween mean cumshaw tolerance scores for 
male Ss (M = 4.29, N = 68) and female Ss 
(M = 4.45, N = 33). The difference was not 
significant (t = .73). 


DIFFERENCES IN CUMSHAW TOLERANCES 
AMONG OCCUPATIONAL GROUPS 


The cumshaw-tolerance data were also 
examined in order to determine whether there 
were significant differences between the atti- 
tudes of the occupational subgroups. An 
analysis of variance was performed on the 
nine-item cumshaw tolerance scale scores 
across the four occupational subgroups. The 
groups did not differ significantly in terms 
of their attitudes toward cumshaw in general 
(F = 2.44). 

An attempt was made to devise special 
scales from the original pool of items for two 
of the occupational groups (clerical and 
trades) in order to compare the tolerance of 
an occupational group for cumshaw activity 
within their own areas of work and their 
tolerance for the cumshaw activities of the 
other occupational groups. Two short Gutt- 
man scales, a five-item clerical scale and a 
three-item trades scale, were constructed. 
Because of the small number of items in 
these scales and the consequent likelihood of 
their unreliability, it is obvious that any 
relationship among the data can be considered 
only suggestive of possible trends. It did 
appear that there was some support of the 
hypothesis for groups favoring their ‘own 
kind of cumshaw” over the cumshaw prac- 
ticed by other occupational groups, particu- 
larly in the clerical area. Additional re- 
search would be necessary to support such 
a conclusion, however. 


DISCUSSION 


The fact that it was possible to develop 
an acceptable Guttman scale for cumshaw 
tolerance is of some interest in itself. Since 
the attitude-inventory items were able to meet 
the requirements of a Guttman scale, one 
may assume that tolerance-intolerance toward 
cumshaw behavior is a single variable rather 
than a composite of several variables. 

The present scale, of course, merely de- 
scribes an individual’s opinions as to the 
acceptability of cumshaw behavior. Valida- 
tion of the scale against external criteria 
has not been attempted. It is also clear that 
the scale ratings could be easily faked; therefore,
the scale would not be particularly 
useful in any type of employee-screening 
program, evaluation of training program, etc. 
Further, the relationship between an individual’s
expressed tolerance of cumshaw behavior
and the extent of his own participation
in such activity is also unknown. 

Even though the present version of the 
cumshaw tolerance scale has very limited 
value for applied uses, the scale does show 
that stated tolerance for cumshaw behavior 
is related to certain social-group variables 
(i.e., age and educational groupings). Initial 
hypotheses about the existence of such social 
group differences were thus confirmed by the 
results of the present study. 

Not all of the hypotheses about social 
group variables were supported by the data, 
however. It was expected that because of a 
lack of shared standards across group lines, 
each occupational group would tend to be 
more tolerant of the cumshaw activities within 
their own areas of work than of the cumshaw 
activities of the other occupational groups. As 
was reported above, there were indications in 
the data that the foregoing hypothesis may 
be valid, but the data are inconclusive 
because of the small size of the special 
occupational scales which could be developed 
at this time. 

The relative status of the occupational 
group was also hypothesized as an environ- 
mental factor influencing the cumshaw toler- 
ance level of the group (see also Dalton, 
1959), with the higher status occupational 
groups expecting more privilege and inter- 
preting regulations less strictly. The results 
of the study did not support the hypotheses, 
however, since the analysis of variance for 
the cumshaw-tolerance-scale scores across 
occupational groups showed no significant 
variation from group to group. 

The psychological-trait hypotheses that 
were examined were not verified by the data. 
The hypothesis that authoritarian character- 
istics might relate negatively with cumshaw 
tolerance was ruled out by the data. Intoler- 
ance of ambiguity, hypothesized as another 
possible negative correlate of cumshaw toler- 
ance, also was not supported by the data. 
The age and education variables readily accounted
for any relationship that did exist 
between the cumshaw tolerance criterion and 
the measures of authoritarianism and intoler- 
ance of ambiguity. 

The lack of significant relationship between 
the self-participation ratings and the other 
variables presented in Table 1 is of interest. 
In the case of age and education, for ex- 
ample, one may assume (at least to the 
extent that the respondents were honest in 
their self-evaluations) that the younger, 
better-educated employees are not more likely 
to participate in cumshaw activity. That is, 
expressed tolerance toward cumshaw does not 
necessarily imply extensive participation in 
such activity. 

Other-participation ratings also were not 
significantly related to the age, educational 
level, or cumshaw tolerance variables, which 
could be interpreted as further evidence of a 
separation between expressed attitudes and 
evaluations of behavior. 

It should be noted that the variables 
examined in the present study accounted for 
little more than 10% of the variance for 
cumshaw tolerance. (The multiple correlation 
for F Scale, age, education level, and intoler- 
ance of ambiguity as predictors of cumshaw 
tolerance was .337, when corrected for shrink- 
age.) Thus, the dynamics underlying cumshaw 
tolerance remain largely undefined. 
A study was conducted using 156 airmen at their base to test the hypotheses 
that wit and creativity are positively correlated; that defensiveness and 
creativity are negatively correlated; and that wits are effective leaders. The 
effects of sarcastic versus nonsarcastic wit were explored. The 1st 2 hypotheses 
were supported. Wits were not effective leaders but were associated with less 
defensiveness and more effective group problem-solving. Most of the positive
relationships with wit were found, more specifically, to be associated with
sarcastic wit. 

Wit and creativity have been linked by a 
number of writers. Koestler (1961), for ex- 
ample, called a successful witticism a creative 
act, and Getzels and Jackson (1962) found 
that children who were highly creative but 
only moderately high in IQ valued and used 
humor more than those high in IQ alone. 
Torrance (1963b) has inferred that clown- 
ing, or humor, is one of several effective 
adaptive techniques which the creative person 
uses to remain in groups, or possibly to fend
off, to some degree, group pressures toward 
conformity. 

The creative person is generally reported 
to be characteristically active (Barron, 1957; 
Buel & Bachner, 1961) with some behaviors 
desirable and some undesirable (Torrance, 
1963a). It would seem then that he would 
be likely to make many contributions to a 
group, so that he might not be seen as a 
resister, but possibly as a leader. If he were 
also successfully witty, receiving social re- 
inforcement in the form of laughter or ac- 
ceptance, he might be less defensive than 
others in the group situation and feel freer 
to make suggestions. 

Prior studies regarding the effect of wits 
in groups (Goodchilds & Smith, in press; 
Smith & Goodchilds, 1959) have tended to 
show the wit in a positive light. In these 
studies, however, wits were identified by 
observing the number of successful witticisms 
during a rather short group discussion period 
(a post hoc method) plus obtaining self-reports
of appreciation of wit. Groups which 
contained these wits generally were more 
efficient in solving a group arithmetic 
problem, reported less defensiveness in the 
situation, and were higher in group satisfaction
than groups not containing wits or in 
which there was less joking. 


1 This work was supported by the Behavioral 
Science Division, the Air Force Office of Scientific 
Research of the Office of Aerospace Research, under 
Contract No. AF 49(638)-1216. 

These results implied that the wit was 
operating as an effective leader in the group, 
not just as a person resisting group pressures 
toward conformity. Just as Torrance (1963b) 
has found in creative children, however, the 
wits were also quite likely to participate more 
highly than nonwits. 

In view of the need for experimental veri- 
fication of the results of these observational 
studies an attempt was made to preidentify 
two types of possible leaders, a wit and an 
influence agent, on the basis of sociometric 
techniques, and to compare the effectiveness 
of each in persuading his group to accept 
the correct answer to a problem. It was as- 
sumed that both types of appointed leaders 
would participate highly, and that the wits 
would be more likely to use humor as a 
persuasive technique than would the influence 
agents. 

In addition, an attempt was made to 
evaluate the relative effectiveness of the two 
types of leaders under conditions of stress 
and nonstress. 

It was hypothesized: (a) that creativity 
and wit would be positively correlated, (b) 
that creativity and defensiveness would be 
negatively correlated, (c) that the wit would 
be an effective leader, and (d) that the wit 
would be even more effective under conditions 
of stress. 
The study was also designed to obtain 
information on the effects of humor by humor 
type, i.e., sarcastic and nonsarcastic, but 
without prediction as to these effects. 


METHOD 
Subjects 


The subjects (Ss) were 156 Air Force personnel at 
Norton Air Force Base, California, recruited in 
groups of 6. The only selection requirement was 
that they know each other. The groups came from 
various phases of base operations. The sample 
included 20 Negroes. 


Procedure 


The men were seated around a table with number 
cards in front of them. Those in Conditions A and 
B, the stress conditions, were told that the experimenters
(Es) were psychologists hired by the Air 
Force to evaluate them. They were instructed to 
put their name, rank, and serial number on all 
forms and were told that results would be entered in 
their personnel records. In the C and D, or non- 
stress conditions, the men were told not to use 
their names, but the numbers on the cards before 
them; further, they were told that the purpose of 
the experiment was to evaluate some tests, not the 
men themselves. 

On the basis of sociometrics, a wit or an influence 
agent was selected for each group. In Conditions 
A and C the wit was called aside, told he had been selected 
as the wit, given the correct answer to the problem, and 
instructed to influence the group to give the right 
answer in any way that he could, joking or serious. 
In Conditions B and D the influence agent was 
called aside, told that he had been selected as such, 
given the right answer, and instructed to try to 
influence the group in any way that he could to give 
the right answer. In all conditions the rest of the 
group was led to believe that the person was being 
given instruction in the use of a group-recorder 
form. All confederates were instructed not to reveal 
the fact that they were confederates or that they 
had been told the correct answer. 

During a problem-solving discussion two ob- 
servers, the E and an assistant, recorded verbatim 
any witticism, defined as any statement which 
elicited audible laughter from at least two other 
group members. These witticisms were reproduced 
in large print for later analysis by the group mem- 
bers. The observers also rated the Ss on participation. 


Measures 





Sociometrics. (a) “Number ___ frequently says 
witty and amusing things to make us laugh.” (b) 
“Number ___ is the man in this group who is best 
at winning an argument.” One person was to be 
chosen, not themselves, and not the same man twice. 

Maier Horse Trading Problem (Maier & Solem, 
1952). Groups were given 8 minutes to agree on an 
answer to the following problem: “A man bought a 





Ewart E. SMITH AND HELEN L. WHITE 


horse for $60 and sold it for $70. Then he bought it 
back for $80 and again sold it for $90. How much 
money does he make in the horse business?” Answers 
to this problem generally range from $10 to $30, 
with $10 (incorrect) the modal answer. Answers 
were obtained from individuals both before and 
after the group discussion. 
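
The arithmetic behind the correct answer can be laid out as a simple cash ledger. The sketch below is an illustration and not part of the original test materials: each purchase is entered as a negative cash flow and each sale as a positive one, and the profit is their sum.

```python
# Cash flows in dollars: negative = purchase, positive = sale.
transactions = [-60, +70, -80, +90]
profit = sum(transactions)
print(profit)  # 20
```

Each of the two trades yields $10, so the total is $20; the modal answer of $10 counts only one of them.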

Group Satisfaction Scale (Smith, 1957). This is a 
15-item Likert scale, on which a typical item is, 
“As far as I’m concerned, this was a nearly perfect 
group.” 

Defensiveness Scale (Smith, 1957). This is a 24- 
item Likert-type scale, on which a typical item is, 
“I did not feel free to say how I really felt.” 

Creativity. The measure used was the Word As- 
sociation Test from Getzels and Jackson (1962), in 
which Ss list as many different meanings for each 
of 25 words as they can. Only legitimate, non- 
redundant answers are scored. Although Getzels and 
Jackson report a correlation of only +.37 between 
this test and Binet IQs in gifted children, Ripple and 
May (1962) have shown that higher correlations are 
generally obtained between creativity and IQ with 
more heterogeneous samples. 

Humor analysis. The Ss indicated, for each witti- 
cism which occurred during the discussion, whether it 
was sarcastic or just for fun, whether it was con- 
cerned with the horse problem and whether it did or 
did not help the group on the problem, whether it 
was concerned with the way the group got along to- 
gether and whether it did or did not help them get 
along, and whether or not S felt the witticism was 
directed at him; if the last was checked, S was asked 
to indicate whether or not he resented the witticism. 


RESULTS 


As hypothesized, creativity and the wit 
sociometric were significantly correlated, r = 
+.17 (p < .05).² The leader sociometric and 
creativity were not significantly correlated 
(r = + .03). Instead, this seemed to be a 
function of rank, with sergeants chosen sig- 
nificantly more often than airmen (Mann- 
Whitney U of 2.59, p < .01). This compari- 
son was only made in the stress conditions 
where personal data, e.g., rank, were obtained. 
The wit sociometric was also correlated sig- 
nificantly (r = + .16, p < .05) with the joke 
tally. A correlation of +.14 between the joke 
tally and creativity was not significant. When 
the two wit indices, the joke tally and the 
wit sociometric, were combined, this new wit 
measure was correlated with creativity at 
+.18 (p< .05), yielding no significant in- 
crease in level of prediction over that given by 
the wit sociometric alone. This correlation 
might have been somewhat higher but for a 
tendency (p < .20) for sarcastic wits, who 
were higher in creativity than all others, to 
be nominated as wit less frequently than
nonsarcastic wits. 


2 All statistical tests are two-tailed. 

Those 11 wits (excluding confederates and 
Negroes) whose witticisms were seen as 
sarcastic were higher on creativity than either 
the 20 nonsarcastic wits (t = 2.19, p < .05), 
or 82 nonwits (t = 2.16, p < .05). There 
were no significant differences between non- 
sarcastic wits and nonwits on creativity. 

The defensiveness and creativity measures 
were negatively correlated (r = -.24, p <
.01). There was some indication that this 
correlation might be somewhat “inflated” by 
the inclusion of the Negro Ss as they were 
higher than whites on defensiveness (t = 
2.98, p < .01) and lower on creativity (t = 
4.00, p < .01). On whites only, the correlation 
dropped to —.17, still significant at the .05 
level. 
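
The significance levels attached to the correlations reported in this section can be checked approximately by converting each r to a t statistic, t = r·sqrt(N - 2)/sqrt(1 - r²). The sketch below assumes N = 156 for the full sample and N = 136 for whites only (156 less the 20 Negro Ss); the source does not state the exact N behind each coefficient, so both figures are assumptions. The critical value of about 1.98 for a two-tailed test at the .05 level is likewise an approximation for df in this range.

```python
import math


def r_to_t(r, n):
    """t statistic for testing a Pearson r against zero, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


CRITICAL_T_05 = 1.98  # approximate two-tailed .05 value for df of roughly 130 to 155

cases = [
    ("wit sociometric vs. creativity", 0.17, 156),
    ("joke tally vs. creativity", 0.14, 156),
    ("defensiveness vs. creativity", -0.24, 156),
    ("defensiveness vs. creativity, whites only", -0.17, 136),
]
for label, r, n in cases:
    t = r_to_t(r, n)
    print(f"{label}: t = {t:.2f}, exceeds critical value: {abs(t) > CRITICAL_T_05}")
```

Under these assumptions the pattern agrees with the text: the coefficients of .17 and -.24 exceed the critical value, the .14 for the joke tally does not, and the whites-only -.17 clears it only narrowly.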

Data for 20 groups, five Ss per condition 
(excluding the confederates), were analyzed 
for purposes of comparing the effectiveness of 
wits versus influence agents in improving 
performance on the horse problem. Six groups 
were excluded from this analysis because the 
confederate revealed that he had been given 
the correct answer or did not try to influence 
the group, or advocated an incorrect answer. 

The stress and nonstress conditions did not 
differ on defensiveness, indicating that the 
stress manipulation was weak. However, 
there was less joking in the stress than in 
the nonstress conditions (χ² = 9.26, p < .01). 
There was also more joking in the wit groups 
than in the influence groups (χ² = 5.49, p 
< .05). 

The only experimental condition showing 
significant improvement on the horse prob- 
lem was the influence agent-stress condition 
(McNemar’s test, χ² = 5.89, p < .05). The 
wit confederates were apparently ineffective in 
persuading members of their groups to accept 
the correct answer. 

Not all of the wit confederates were suc- 
cessful, of course, at being witty during 
the time available, and witticisms naturally 
occurred in groups in which influence agents 
were used as confederates; in fact, the num- 
bers of wits and influence agents joking were 
not significantly different. Asking a wit to be 
humorous may lower his chances, or pos- 
sibly he does not generally use wit in persua- 
sion. When the data were analyzed in terms of 
whether or not witticisms occurred in a group 
and whether or not they were seen by mem- 
bers of the group as sarcastic, significant dif- 
ferences were found in both defensiveness 
and in improvement on the horse problem. 

On the defensiveness scale, wit groups were 
significantly lower than nonwit groups (t= 
2.04, p < .05), wit groups being those above 
the group mean of 2.76 witticisms. Groups 
containing sarcastic wits versus those con- 
taining nonsarcastic wits were not signifi- 
cantly different on defensiveness. 

On the horse problem, these wit groups 
improved significantly (McNemar’s test, χ² 
= 4.92, p < .05), whereas groups lower in 
the number of witticisms did not. Further 
analysis revealed that this improvement was 
significant only in those groups in which the 
witticisms were perceived as sarcastic (bi- 
nomial, p < .05). 

As the sarcastic wits themselves were higher 
on creativity scores, therefore probably some- 
what higher in intelligence, it was felt that 
possibly they had the correct answer and 
were bringing the groups around to their 
point of view. On examination of the data, 
however, this did not seem to be the case. 
Only 2 of the 13 sarcastic wits were con- 
federates, 1 a wit and 1 an influence agent. 
Of the remaining 11, 8 had the wrong an- 
swer; only 1 changed his answer (from 1 
wrong answer to another). Group improve- 
ment therefore did not indicate improvement 
in the sarcastic wits themselves. It seemed in- 
stead that a sarcastic witticism was some- 
thing of a signal that the person making it 
was not going to change his answer. This 
might have tended to increase opposition to 
that person’s wrong answer. 


DISCUSSION 


The correlation between wit and creativity 
supports the first hypothesis, and Koestler’s 
(1961) description of a witticism as a “crea- 
tive act.” That this variance is largely ac- 
counted for by the sarcastic wit stems no 
doubt from the greater complexity of sarcasm 
—a joke with a message. 
The negative correlation between defensiveness
and creativity is most interesting, but 
probably explains what limits creativity, 
rather than what produces it. 

The relative ineffectiveness of the wit con- 
federates in influencing their groups raises 
several questions. It is possible that in not 
allowing members of the group to choose a 
particular person as both “saying witty 
things” and “best at winning an argument” 
the most effective leader might have been 
overlooked. The influence agents, after all, did 
not make significantly fewer jokes than the 
wits. 

The fact that there was less joking under 
stress suggests that lack of joking may be 
useful as an indication of stress. 

The findings regarding the positive effect 
of sarcastic wits on group performance were 
somewhat surprising, although they are con- 
sistent with a previous study (Smith & Good- 
childs, 1959) on the positive role of the 
sarcastic wit in small groups. The sarcastic 
wit’s tendency not to change his answer may 
indicate he thought he was right, and was 
using sarcasm derisively, as a type of nega- 
tive feedback. The probability of the group 
adopting an answer other than their own, 
however, in this case usually the correct an- 
swer, seemed to increase instead of decrease 
with their sarcasm. This possibility demands 
further research for clarification, but if it 
should turn out to be the case a theory of 
humor recently advanced by Fry (1963), 
which assumes that the wit is attempting to 
establish superiority and that laughing indicates
surrender in the laugher, would not 
be supported. 
3 studies are reported which assess the validity of subtests of the Modern 
Language Aptitude Test (MLAT) modified for use with blind students of a 
foreign language. 2 studies involved blind students preselected with the test 
to study Russian at Georgetown University. Many of the validity coefficients 
were not significant; however, the distributions were so curtailed that it is 
probable that the tests would be useful predictors in situations where students 
were not selected. Results of the other study, conducted with blind high school 
students, supported these results. 3 subtests, Words in Sentences, Number 
Learning, and Spelling Clues, were generally the best predictors of achievement. 


In the fall of 1960, an intensive 2-year 
program was initiated at Georgetown Uni- 
versity to train as Russian language trans- 
lators blind students who had had little or 
no previous second-language training. This
report summarizes the results obtained in 
three studies attempting to measure language 
aptitude of blind students, using a modifica- 
tion of the Modern Language Aptitude Test 
(MLAT; Carroll & Sapon, 1959). 

Developed after a series of factor analytic 
studies of verbal ability measures (Carroll, 
1958), the MLAT consists of five subtests 
which measure different abilities necessary 
for achievement in a second language. These 
subtests were modified to avoid the use of 
printed materials and of written responses. In 
attempting such modifications, one risks 
changing basic characteristics of the tests so 
much that abilities different from those 
measured by the original tests are measured in 
the modified versions. It should be noted,
therefore, that although many of these tests
have been given the same names as the 
original MLAT subtests (and generally make 
use of the same material), it cannot be 
assumed that they necessarily measure the 
same functions. 


1 This research was supported by grants to L. E. 
Dostert, Georgetown University from the United 
States Department of Health, Education, and Wel- 
fare, Office of Vocational Rehabilitation, Division of 
Services to the Blind, and was done in consulta- 
tion with J. B. Carroll, Harvard University. The 
author expresses his appreciation to S. Armstrong, 
Principal of the Ontario School for the Blind, 
Brantford, Ontario for his cooperation, and to 
J. B. Carroll and the Psychological Corporation for 
their permission to modify the MLAT subtests. 


Since many blind students are not skilled 
in Braille, the modifications were made in 
most instances without recourse to it. Instead, 
dot-answer sheets were used for the students 
to record their answers and test items were 
presented auditorily. The typical dot-answer 
sheet consists of a series of rows of raised 
(Braillelike) dots. For a five-response multi- 
ple-choice question, five such dots are pre- 
sented in a row at $-inch intervals. The sub- 
jects (Ss) are instructed to depress the first 
dot if they believe the first alternative is 
correct, and so on. The use of the dot-answer 
sheet requires no special apparatus other than 
a stylus or pencil for depressing the dots. 
Having Ss use a pencil facilitates scoring for 
the sighted investigator who need only count 
the number of correct dots blackened. 


STUDY I


This study was conducted in 1960 to select 
15 blind students who were likely to be highly 
successful in the language program. In the 
absence of any measures of language aptitude 
standardized on blind students, and with in- 
sufficient time to perform even a token stand- 
ardization or validation of new measures, 
three subtests of the MLAT were modified. 
Since the Spelling Clues test is not readily 
modifiable without recourse to Grade 1 Braille,
which is not easily read by many blind 
students, a Vocabulary test was substituted 
for it, because of the factorial similarity of 
these two tests (Carroll, 1958). Similarly, the 
Phonetic Script test is difficult to modify since 
it uses many symbols which cannot easily be 
approximated in Braille. In its place, a Phonetic
Discrimination test was used to measure
one aspect of the Phonetic Script test— 
namely, the ability to discriminate between 
slightly different speech sounds. 


Method 


Of the 200 blind students who applied to take 
part in the Russian language program, 57 were 
selected by the Office of Vocational Rehabilitation 
to be tested with the MLAT for the Blind. 

The five tests were tape-recorded so that instruc- 
tions, test items, and time limits were administered 
identically to each of seven groups tested throughout 
the country. The tests were: 

Number Learning. In the MLAT version of this 
test, the Ss are required to learn an artificial number- 
language system which is presented auditorily, and 
retention is assessed by having Ss write the arabic 
notation of a number spoken in the new language. 
Since blind students typically cannot write, the 
dot-answer sheet was modified to allow Ss to record 
their answers. Fifteen dots were presented in each 
row in three groups of five. While learning the 
actual language system, Ss obtained practice in 
depressing one of five dots on the extreme left to 
indicate a hundreds number, one of the five dots 
in the middle to represent a tens number, and one 
of the five dots on the right to represent a units 
number. Among the dots on the left hand side of 
the page, the first dot represented “no hundreds,” the 
second dot “one hundred,” etc. This same scheme 
also applied to the five dots representing the “tens,”
and those representing the “units.” 

During the testing phase, 15 three-digit numbers
(e.g., 342) were presented in the new language at
intervals of 8 seconds. The Ss received credit for each
digit, including zeros, that they correctly indicated. 
The total possible score was 45; testing time was 
30 minutes. 
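
The recording and scoring scheme described above can be sketched in a few lines of Python. This is only an illustration, not the procedure as actually implemented: the function names are hypothetical, and the sketch assumes, as the description implies, that each group of five dots can represent only the digits 0 through 4.

```python
def digits_of(number):
    """Split a three-digit number into (hundreds, tens, units)."""
    return (number // 100, (number // 10) % 10, number % 10)


def dot_positions(number):
    """Return the 1-based dot to depress in each of the three groups of five.

    Dot 1 stands for zero, dot 2 for one, and so on (assumption: only the
    digits 0-4 can be represented with five dots per group).
    """
    digits = digits_of(number)
    if any(d > 4 for d in digits):
        raise ValueError("each digit must be between 0 and 4")
    return tuple(d + 1 for d in digits)


def score_item(target, response_dots):
    """Give one point for each group in which the correct dot was depressed."""
    return sum(t == r for t, r in zip(dot_positions(target), response_dots))


def score_test(targets, responses):
    """Total score over all items; 15 items give a maximum of 45."""
    return sum(score_item(t, r) for t, r in zip(targets, responses))


# For the number 342 the correct dots are the 4th, 5th, and 3rd.
print(dot_positions(342))
# A response that misses only the units digit earns 2 of 3 points.
print(score_item(342, (4, 5, 1)))
```

Scoring each digit separately, zeros included, is what yields the total possible score of 45 for 15 items.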

Words in Sentences. The MLAT Words in Sen- 
tences test consists of 45 items presented visually. 
For each item, a sentence is given in which one 
word or phrase is underlined. The student selects 
from five underlined alternatives in a second sentence,
or sentences, the word or phrase which performs
the same function in its sentence as does the
underlined word in the first sentence. In the MLAT 
for the Blind, 36 items were presented auditorily. 
For each item, the first sentence was read, the 
underlined word was repeated, then the sentence 
was read again. The second sentence was then read, 
the five alternatives were read, and the sentence was 
repeated. On Braille booklets provided each S, the 
underlined word from the first sentence, and the five 
alternatives from the second sentence, were pre- 
sented in Grade 2 Braille. The S indicated his 
choice by drawing a line through the alternative 
he thought was correct. The total possible score was 
36; testing time was 35 minutes. 

Paired Associates. This test is presented visually 
in the MLAT. The student memorizes 24 “Kurdish”- 
English vocabulary pairs during a 4-minute period. 
Retention is measured by a multiple-choice test. In 
the MLAT for the Blind, all material was presented 
auditorily. The Ss heard each pair presented twice 
in close succession with a 10-second interval between 
different pairs. The entire list was presented once, 
following which the students were tested for reten- 
tion, using a “dot”-answer sheet. The total possible 
score was 24. Total testing time was 25 minutes. 

Vocabulary Test. Fifty items from the Co-opera- 
tive Vocabulary test, Form Q (Davis, Beers, Pater- 
son, & Willis, 1940) were presented auditorily. The 
Ss recorded their answers on dot-answer sheets. The 
total possible score was 50; testing time was 20 
minutes. 

Phonetic Discrimination. The Ss heard three spoken 
quasi-words adapted from several unfamiliar lan- 
guages, and indicated which of the two latter quasi- 
words was identical to the first by depressing one of 
two dots on their answer sheets. The total possible 
score was 15; testing time was 10 minutes. 


Results and Discussion 


Table 1 presents the matrix of Pearson 
product-moment correlation coefficients for 
the total sample as well as the means and 
standard deviations for each test. All but one 
of the coefficients meet standard significance 
levels, and it is probable that one factor would 


TABLE 1

INTERCORRELATIONS OF APTITUDE MEASURES FOR THE ENTIRE 1960 SAMPLE

Measures                      1        2        3        4        5
1. Number learning                    .48      .45      .30      .39
2. Words in sentences                          .46      .36      .26
3. Paired associates                                    .50      .41
4. Vocabulary                                                    .42
5. Phonetic discrimination
M                             36.48    20.23    14.54    35.80    12.65
SD                            7.02     7.03     4.31     8.59     1.75

Note. N = 57.


TABLE 2

CORRELATIONS OF APTITUDE MEASURES WITH CRITERIA

                                        Aptitude measures
                      1            2            3            4            5
                      Number       Words in     Paired                    Phonetic
Criteria (grades)     learning     sentences    associates   Vocabulary   discrimination
First semester        .39          [illegible]  .16          -.24         -.22
Second semester       .24          .48*         .36          .00          -.19
Third semester        .07          [illegible]  .22          -.25         -.14
Fourth semester       -.13         [illegible]  .30          -.10         -.24
Fifth semester        .06          [illegible]  .34          -.18         -.38
Mean grade            [illegible]  .44*         .30          -.17         -.26
M                     42.33        26.27        21.33        39.73        13.60
SD                    2.67         4.68         1.40         5.97         .95

Note. Selected 1960 sample N = 15.
* p < .10.
** p < .05.


account for much of the variation described in 
this matrix. Since it has already been demon- 
strated that the MLAT subtests share con- 
siderable variance in common (Carroll, 1958; 
Gardner & Lambert, 1963) it is difficult to 
determine in the present case whether this 
common element is due to language aptitude, 
or whether it is caused by an artifact produced 
by the modifications. Unlike the original 
MLAT, each of these tests is highly dependent 
upon immediate memory; however, it is prob- 
able that because of the type of instruction 
necessary with blind students, the increased 
reliance of the tests on memory skills would 
improve their validity. 

It is clear that the Phonetic Discrimination 
test is too easy (M = 12.65), and does not
discriminate among Ss (SD = 1.75). The 
means of the other tests are slightly high, 
reflecting the caution used in providing clear, 
repetitive instructions and lengthy time inter- 
vals, but the variability is sufficient to indicate 
that the tests differentiate the Ss. 

Fifteen Ss were selected for inclusion in the 
Russian program from those obtaining the 
highest algebraic sum of standard scores on 
each of the tests. Table 2 presents the correla- 
tions of each of the aptitude measures with 
the first five semester grades in Russian and 
the mean grade for the selected sample. The 
Words in Sentences test is significantly cor- 
related with the first-semester grade, and 


shows a tendency to be related to the second- 
semester grades as well as the mean grade. 
The two other subtests modified from the 
MLAT, the Number Learning and Paired As- 
sociates tests, are generally positively cor- 
related with the criteria though the coefficients 
are not significant. The Vocabulary and Pho- 
netic Discrimination tests, on the other hand, 
correlate negatively, but not significantly, with 
the criteria, indicating that they do not contribute
usefully to selection.

It should be noted, of course, that all of 
these validity coefficients are artifactually low 
because the Ss were selected on the basis of 
the aptitude measures, and consequently the 
variance of scores on these measures is con- 
siderably reduced. 
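
The size of this attenuation can be roughly gauged with the standard correction for direct restriction of range on a predictor (often called Thorndike's Case 2). The sketch below is not part of the original analysis. It assumes selection on a single test, whereas the Ss here were selected on a composite of all five tests, so the result should be read only as a rough approximation. The inputs are the Words in Sentences figures reported in Tables 1 and 2; the function name is hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the unrestricted correlation (Thorndike Case 2).

    r_restricted    -- correlation observed in the selected group
    sd_unrestricted -- predictor SD in the full applicant group
    sd_restricted   -- predictor SD in the selected group
    """
    u = sd_unrestricted / sd_restricted
    ru = r_restricted * u
    return ru / math.sqrt(1 - r_restricted ** 2 + ru ** 2)


# Words in Sentences vs. mean grade: r = .44 in the selected sample,
# SD = 7.03 among all 57 Ss and SD = 4.68 among the 15 selected Ss.
estimate = correct_for_range_restriction(0.44, 7.03, 4.68)
print(round(estimate, 2))
```

Under these assumptions the corrected coefficient comes out noticeably higher than the observed one, which is the direction of bias the paragraph above describes.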


STUDY II


A second program was begun in 1962 to 
train additional blind students in Russian. As 
a result of the previous study, it was clear 
that the three modified MLAT subtests pre- 
dicted potential achievement but that they 
might be improved, and that the Phonetic 
Discrimination and Vocabulary tests should 
be replaced. Modifications of the remaining 
two MLAT subtests were undertaken, and in 
addition a Spelling test was developed. Study 
II is concerned with the results of a study to 
test the validity of this new battery of tests. 
The Ss were not candidates for the Russian
language program but were, instead, students
enrolled in a high school for the blind. 


Method 


Thirty-six blind high school students at the School 
for the Blind, Brantford, Ontario, Canada, acted as 
Ss. The grade levels, with the number of boys and
girls in each, were: Grade 9 (7 boys, 4 girls);
Grade 10 (13, 2); Grade 11 (3, 1); and Grade 12
(2, 4). The following six tests were administered.

Number Learning. Time limits for this test were 
decreased and the number of items was increased
from 15 to 20. The total possible score was 60; 
testing time was 20 minutes. 

Words in Sentences. This test was identical with 
the previous version except that students recorded 
their answers on dot-answer sheets. 

Paired Associates. Instructions and time limits for 
this test were slightly different from the previous 
version and Punjabi words were used instead of 
“Kurdish.” 2 The total possible score was 23; testing 
time was 16 minutes. 

Spelling Clues. For each item on the MLAT 
Spelling Clues test one word is presented in dis- 
guised (phonetic) spelling, and Ss are required to 
choose which of five alternatives, also presented on 
the test booklet, is most similar in meaning to it. 
For each item in the blind version the letters in 
the disguised word were read, and 5 seconds later,
the five alternatives were spoken. The Ss indicated 
their answers on dot-answer sheets. The total pos- 
sible score was 44; testing time was 25 minutes. 


2 It was felt that this alteration would not affect 
the validity of the test and was deemed necessary 
since many of the candidates for the Russian language 
course would have been tested in Study I. Discussions 
with students selected in that study indicated that 
they did in fact remember many of the “Kurdish” 
words.

Phonetic Memory. The form of the Phonetic
Script test could not be retained, hence this modifi- 
cation stresses only one aspect of that test, viz., 
memory for speech sounds. The Ss heard four sets 
of four word sounds each. After the four sets were 
presented, one from each set was repeated, and the 
Ss were required to indicate its position in its set 
by drawing a line through the dot which cor- 
responded to that alternative. There were seven 
sets of four or a total of 28 items in the test. Testing 
time was 14 minutes. 

Spelling Test. Twenty-three items were presented. 
A word was read aloud, spelled (correctly or in- 
correctly) and pronounced again. The Ss indicated 
their answers on dot-answer sheets. The total test- 
ing time was 9 minutes. 

Additional measures obtained for each S were: 

Intelligence. Each student’s IQ was obtained from 
the school records. Information concerning the test 
used to measure IQ is not available. 

Language Achievement. Since only the Grade 9 
French grades were available for all students, this 
was adopted as the criterion. Because these grades 
were obtained from different teachers for the dif- 
ferent years, they were standardized within each 
grade level to remove the effects of different grading 
standards, and these standardized French grades 
were used as the criterion. 
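
The within-grade standardization can be illustrated as follows. This is a minimal sketch under stated assumptions: the records and the function name are hypothetical, and the population standard deviation is used, although the source does not say which form of the standard deviation was applied.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_groups(records):
    """Convert raw grades to z scores computed separately per grade level.

    records -- list of (student, grade_level, french_grade) tuples
    Returns a dict mapping each student to a standardized grade.
    """
    by_level = defaultdict(list)
    for _, level, grade in records:
        by_level[level].append(grade)

    stats = {level: (mean(g), pstdev(g)) for level, g in by_level.items()}

    standardized = {}
    for student, level, grade in records:
        m, sd = stats[level]
        # A level with no spread carries no information; assign zero.
        standardized[student] = 0.0 if sd == 0 else (grade - m) / sd
    return standardized


# Hypothetical data: two grade levels marked on different standards.
records = [("a", 9, 60), ("b", 9, 80), ("c", 10, 70), ("d", 10, 90)]
print(standardize_within_groups(records))
```

After standardization, students "a" and "c" receive the same criterion score even though their raw grades differ, which is the intended removal of between-teacher grading differences.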


Results and Discussion 


Table 3 presents the matrix of correlations, 
as well as the means and standard deviations 
for each test. Three tests, Words in Sentences, 
Phonetic Memory, and Number Learning, are 
significantly correlated with the criterion 
while one test, Spelling Clues, just fails to 
meet standard significance levels. That the 
Words in Sentences test again is highly related
to achievement strongly suggests that
it is a consistent predictor of achievement.
The Phonetic Memory test was not used in
the first study, but its relation here suggests
that measuring students’ memory for speech
sounds could possibly aid in predicting second-language
achievement, particularly in the
earlier stages of traditional instruction. Although
the Number Learning test was not
highly related to the criteria in the first study,
it is significantly correlated with the criterion
in this study. One possible reason for these
divergent results may be the greater discriminability
of the most recent modification
of this test. Whereas scores on the first version
were lumped at the upper end of the distribution
(M = 36.48, SD = 7.02 for 45 items),
scores on the latter version are more normally
distributed (M = 37.11, SD = 13.18 for 60
items).

None of the remaining tests are related to
language achievement, although four tests,
Words in Sentences, Paired Associates, Spelling
Clues, and Phonetic Memory, are significantly
related to intelligence. The MLAT has
been shown to be relatively independent of
intelligence (Gardner & Lambert, 1963);
however, many of the subtests of the MLAT
for the Blind appear to share considerable
variance in common with intelligence tests.


TABLE 3

INTERCORRELATIONS OF APTITUDE MEASURES, INTELLIGENCE, AND LANGUAGE ACHIEVEMENT

Measure                   1           2           3           4           5           6           7           8
1. Number learning                    .34         .19         [illegible] .29         .28         .07         [illegible]
2. Words in sentences                             [illegible] .28         .44         -.02        .30         [illegible]
3. Paired associates                                          .40**       .30*        .05         .41**       [illegible]
4. Spelling clues                                                         [illegible] .53***      .36**       [illegible]
5. Phonetic memory                                                                    .10         [illegible] [illegible]
6. Spelling test                                                                                  .09         [illegible]
7. Intelligence                                                                                               .22
8. Language achievement
M                         37.11       19.67       13.72       25.94       11.42       15.78       109.61      .05
SD                        13.18       4.77        3.21        6.54        3.29        2.54        13.50       .95

Note. Brantford, Ontario sample N = 36.
* p < .10.
** p < .05.
*** p < .01.


TABLE 4

INTERCORRELATIONS, RELIABILITIES, MEANS, AND STANDARD DEVIATIONS OF THE 1962 SAMPLE

Measure                   1           2           3           4           5           6           7
1. Number learning        (.95)       [illegible] [illegible] .43         .43         [illegible] .26
2. Words in sentences                 (.80)       .34         .66         .39         .53         .40
3. Paired associates                              (.81)       [illegible] .20         [illegible] .25
4. Spelling clues                                             (.90)       .42         .45         .74
5. Phonetic memory                                                        (.45)       .26         .34
6. Spelling test                                                                      (.62)       .40
7. Vocabulary                                                                                     (.83)
M                         44.91       19.80       15.93       31.96       12.04       16.69       23.42
SD                        11.34       6.01        4.20        7.35        3.20        2.91        6.73

Note. N = 55.


STUDY III


The purpose of Study III was to select
candidates for the second program of Russian
instruction for the blind.


Method

Candidates were first screened by the Office of 
Vocational Rehabilitation, and 55 candidates were 
selected and tested with the MLAT for the Blind. 
Seven tests were administered to the students. Six 
of these were identical to those administered in the 
second study. The additional test was: 

Vocabulary. Forty items, different from those used 
in Study I, were selected from the Cooperative 
Vocabulary test, Form Q (Davis et al., 1940), and 
presented auditorily to the students. The Ss heard an 
English word, followed immediately by five addi- 
tional English words. They indicated on dot-answer 
sheets the alternative that had the same meaning 
as the first word. Testing time was 14 minutes. 


Results and Discussion 


Table 4 presents the correlation matrix for 
the total sample, as well as the means and 
standard deviations for each test. The correla- 
tion coefficients in brackets along the major 
diagonal are split-half reliability coefficients 
corrected by the Spearman-Brown formula. 
These reliabilities are satisfactory for all 
measures except the Spelling and Phonetic
Memory tests. In both cases this lack of 
reliability appears to be partially due to the 
restricted variability of the scores (SD = 2.91 
and 3.20, respectively), while the high dif- 
ficulty level of the Phonetic Memory test 
(M = 12.04 for 28 items) undoubtedly encouraged
a considerable amount of guessing.
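
For reference, the Spearman-Brown step-up from a half-test correlation to a full-test reliability is r_full = 2r / (1 + r). The short Python sketch below shows the formula and its inverse; the function names are hypothetical, and the half-test correlations themselves are not reported in the source, so the values printed are only what the tabled reliabilities would imply.

```python
def spearman_brown(r_half):
    """Step a split-half correlation up to full-length reliability."""
    return 2 * r_half / (1 + r_half)


def half_from_full(r_full):
    """Invert the formula: the half-test correlation implied by r_full."""
    return r_full / (2 - r_full)


# Half-test correlations implied by two of the reliabilities in Table 4.
for name, reliability in [("Phonetic memory", 0.45), ("Number learning", 0.95)]:
    print(name, round(half_from_full(reliability), 2))
```

The inverse makes the contrast concrete: a corrected reliability of .45 corresponds to two halves that correlate much more weakly with each other than the halves of a test whose corrected reliability is .95.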

Twenty-three students were selected for 
the Russian language program from those ob- 
taining the highest sum of standard scores on 
all but the Vocabulary test. Because of the
consistent predictive power of the Words in
Sentences test, the standard scores on this
test were doubled. Since very little informa- 
tion was available on one student’s grades, the 
validity coefficients to be discussed are based 
on a sample size of 22. 
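
The selection rule just described (sum of standard scores, with one test given double weight) can be sketched as follows. The scores, test names, and function names below are hypothetical, and the sample standard deviation is an assumption; the sketch shows only the general form of a weighted composite, not the computation actually performed.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standard scores for one test (assumes the scores are not all equal)."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def composite_scores(scores, weights):
    """Weighted sum of standard scores for each candidate.

    scores  -- dict mapping test name to a list of raw scores, one per candidate
    weights -- dict mapping test name to its weight (default weight is 1)
    """
    n = len(next(iter(scores.values())))
    totals = [0.0] * n
    for test, values in scores.items():
        w = weights.get(test, 1.0)
        for i, z in enumerate(z_scores(values)):
            totals[i] += w * z
    return totals


def select_top(totals, k):
    """Indices of the k candidates with the highest composite."""
    ranked = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return ranked[:k]


# Hypothetical scores for four candidates on two of the tests.
scores = {
    "words_in_sentences": [10, 20, 30, 40],
    "number_learning": [40, 30, 20, 10],
}
totals = composite_scores(scores, {"words_in_sentences": 2.0})
print(select_top(totals, 2))
```

In this toy example the two tests rank the candidates in opposite orders, so the doubled weight on Words in Sentences decides who is selected. A test left out of the composite, as the Vocabulary test was, is simply omitted from the `scores` dictionary.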

Table 5 presents the correlations of the 
aptitude measures with the first, second, and 
average semester grade in Russian. 

The Words in Sentences test correlated sig- 
nificantly with the second-semester grades and 
with the mean grade, emphasizing the predic- 
tive power of this test. Although none of the 
other tests evidence significant validity coeffi- 
cients, the consistently positive correlations of 
the Number Learning, Spelling Clues, and 
Vocabulary tests suggest that they might be 
useful predictors in situations where the 
variability in scores is not so restricted. Again 
it must be noted that these correlations are 
probably underestimates since the sample was 
highly selected on the basis of the aptitude 
tests (cf. the standard deviations in Tables 4
and 5). 


GENERAL DISCUSSION AND CONCLUSIONS 


Because of the nature of the different 
samples and the fact that the same tests were 
not used in each sample, direct comparisons 
of the results from the three studies are not 
possible. Nonetheless, it seems clear that the 
MLAT for the Blind measures a student’s 
probable degree of success in second-language 
study. In particular, the Words in Sentences 
test is sufficiently sensitive to predict achieve- 
ment even in situations where the samples are 
highly selected. Under these conditions (e.g.,
Studies I and III), none of the remaining
tests are significantly related to the criteria; 
however, the Ss were so highly selected, first 
by the Office of Vocational Rehabilitation on 
such factors as intelligence and academic 
achievement, and then on performance on 
the tests themselves, that it would be un- 
likely that particularly high validity coeffi- 
cients would be obtained. The consistently 
positive validity coefficients of the Number 
Learning and Spelling Clues tests suggest 
that they would be more useful predictors in 
situations where there is not as much pre- 
selection. The results of Study II, where the 
distributions were not so curtailed, tend to 
confirm this. In that study, the validity 
coefficients of the Number Learning and 
Words in Sentences tests were clearly signifi- 
cant, while that for the Spelling Clues test was 
significant at the 10% level. 

Of the remaining four tests used in Study 
III, two yielded inconsistent results. The 
Vocabulary test was positively related to the 
criteria in Study III, but negatively in 
Study I. Similarly, the Phonetic Memory test 
was significantly related to the criterion in 
Study II, but produced negative validity 
coefficients in Study III. Although it might 
be that the nature of the language courses 
taught to these samples was considerably dif- 
ferent and hence required different abilities, 
the relatively low reliability of this test argues 
against such speculation. 

TABLE 5

CORRELATIONS OF THE APTITUDE MEASURES WITH CRITERIA

                          Semester grades
                                                  Mean
Measure                   I           II          grade       M           SD
Number learning           [illegible] [illegible] [illegible] 49.62       6.51
Words in sentences        .02         [illegible] .46*        23.09       4.09
Paired associates         .07         [illegible] [illegible] 17.70       2.92
Spelling clues            [illegible] [illegible] .34         35.90       4.39
Phonetic memory           -.03        [illegible] -.09        13.86       3.20
Spelling test             [illegible] .01         .07         17.91       1.95
Vocabulary                [illegible] .32         [illegible] 26.45       5.95

Note. Selected 1962 sample N = 22.
* p < .05.
** p < .02.


Both the Paired Associates and Spelling
tests consistently yield positive, but not significant,
validity coefficients. Since the variability
in scores on these tests was particularly
low in all samples, prediction might be improved
by increasing their discriminability.
It is of particular interest to note that although
spelling ability is generally not a
predictor of second-language achievement
among sighted students, it may be with blind
ones, possibly because it indicates in the
latter case a particular interest in language
and language structure. Better prediction
might be obtained with blind students by also
assessing differential achievement in language
skills which, though normally developed in
the sighted student, are not generally required
by the blind.


AN APPLICATION OF PSYCHOLOGICAL SCALING METHODS
TO CONTENT ANALYSIS: 


THE USE OF EMPIRICALLY DERIVED CRITERION WEIGHTS 
TO IMPROVE INTERCODER RELIABILITY 1


RALPH V. EXLINE AND BARBARA H. LONG


Center for Research on Social Behavior, University of Delaware 


A method is described in which a psychological scaling technique is applied 
to the analysis of the contents of written messages in order to provide a more 
precise metric for such measurement. The attribute to be measured was the 
extent to which each message communicated an attempt on the part of the 
writer to control the group’s decision or procedures. 2 scales were developed, a
logical scale comprised of 9 categories, and an empirical scale based on the 
application of Thurstone’s successive interval technique to a set of written 
messages. The empirical scale was found to have a higher reliability than the 
logical scale with untrained coders. Possible reasons for the superiority of the 
empirical scale were discussed, and suggestions made concerning its use in future
research.


This report describes a method in which a 
psychological scaling technique was applied to 
the analysis of written message contents in 
order to improve the reliability of measurement.
Selltiz, Jahoda, Deutsch, and Cook
(1959) distinguish two sources of unreliability 
when human coders are used in measurement 
—one stemming from the individuals doing 
the coding and the other residing in the sys- 
tem of categories itself. Of these sources of 
unreliability, the latter (that relating to the 
system of categories) is the more critical, for 
while individual unreliability may be some- 
what controlled by training coders to use com- 
mon criteria for the placement of items, 
there still remain problems concerned with 
the metric of the category system and the 
clear definition of the measuring points. With 
respect to clear definition, Goldstein (1959) 
has shown that the use of concrete examples 
for each point along a dimension helps to 
explain the measuring categories. We shall 
address ourselves, in this paper, to the metric 
of the category system. 


1 This study was supported by funds from Con-
their thanks to Stanley Tabasso, David Messick, and
Charles Thornton who helped in the collection and 
analyses of the data. We also wish to thank Sarah 
Straughn, Charlotte Prickett, Janet Booker, James 
Driscoll, Edward Thompson, and Clark Eldridge, 
without whose careful work in coding the messages, 
the experiment could not have been performed.

When a system of categories represents
varying quantities of a single quality, the 
intervals between the category points may not 
correspond to distances on the underlying 
continuum of the quality being measured. 
This is true even if categories are set up in a 
logical manner. The above problem is similar 
to that which arises in the measurement of 
attitudes; therefore we reason that the 
methods of psychological scaling should pro- 
vide a more precise metric with which to 
construct such a system of categories. 

To exemplify the application of such 
methods, we will describe how we used 
Thurstone’s successive interval technique to
devise a scale to be used in analyzing the 
content of written messages. This scale is 
designated the empirical scale. In addition, 
we will compare the reliability of coding 
achieved with this scale to the reliability 
achieved by using a logical system of estab- 
lished categories and weights. Thus the present 
report will include (a) a description of both 
logical and empirical scales and (b) a report
and a comparison of their reliability. 


METHOD


Development of the Scales 


The Experimental Setting. The material to be 
analyzed consisted of messages written by subjects 
(Ss) in an experiment in which Ss were required to 
reach a group decision by communicating solely


TABLE 1

THE LOGICAL SCALE

Category                                 Example
1. Imperative statement                  We will build the house.
2. Implicit imperative statement         I am going to build the house.
3. Giving a strong suggestion            We should build the house.
4. Giving a moderate suggestion          I think we should build the house. (or)
                                         Our best bet is to build the house.
5. Giving a weak suggestion              Don’t you think the house would be the
                                         best to build?
6. Request for an opinion                Which design do you like?
7. Weak request for a suggestion         What design do you think would be
                                         best to build?
8. Moderate request for a suggestion     Which design do you think we should
                                         build?
9. Strong request for a suggestion       Which design should we build?





through the medium of written messages (Exline, 
1962a). A content analysis of these messages was 
undertaken in order to determine the extent to 
which each writer attempted to control the group’s 
decision or, conversely, desired to be controlled by 
the other members of the group. For this analysis 
two scales, one logical and one empirical, were 
developed. The experimental setting which produced 
the data from which the empirical scale was de- 
veloped was, in relevant respects, identical with 
the setting in which the scale was used (see Exline, 
1962a, 1962b). 

The Logical Scale. This scale was constructed 
deductively. It is reproduced in Table 1 and is a 
simple and logical system of nine categories ranging 
from most indicative of desiring to control (Category 
1) to most indicative of desiring to be controlled 
(Category 9). For each category one or more 
examples of messages representing the category de- 
scription were provided. In addition, conventions 
for coding difficult or ambiguous messages were 
worked out, and examples of these given. The pro- 
cedure was similar to that of Goldstein (1959), who 
found that the use of such criteria significantly in- 
creased the reliability of coding. 

The Empirical Scale. Thurstone’s attitude-scale 
techniques (Thurstone & Chave, 1929) were used to 
derive this scale. The scale was derived from 90 
messages written by Ss in an experiment (Exline,
1962b), in which the Ss were required to reach a 
group decision by communicating only via written 
messages. Only the first message written by an S 
was used, and no S received a message before he 
wrote one. Only those messages which referred to 
the decision or procedures of the group were used
to derive the empirical scale. 

Each message was printed on a small piece of 
paper; these were then combined into a pack of 
90, ordered according to random procedures, but 
with the same order holding for all packs. One 
message pack was given to each of 126 male and 
female students from the University of Delaware. 
The students were instructed to sort the messages 


into 11 piles along a continuum varying from the 
most controlling to the most desirous of control. To 
facilitate sorting, and to clarify the nature of the 
continuum, each S was given 11 numbered cards, 
each containing a short description of the particular 
category. For example, the first card read: “The 
message indicates that the writer has an EX- 
TREMELY STRONG desire to control the re- 
sponses of others to his message.” Cards 2 to 5 
varied the modifiers of “desire” in the following 
way: very strong (2), strong (3), moderately 
strong (4), and weak (5). Card 6 was neutral with 
regard to desire for control, and Cards 7 to 11 
indicated increasing degrees of “allowing others to 
respond freely” to the message. In order to make 
messages more meaningful to the persons doing the 
sorting, the experimenter (E) described the experimental
situation in which the messages were
written. 

Before the data from these sortings were analyzed, 
the sorts of 16 students who placed more than 25 


TABLE 2

EXAMPLES OF MESSAGES FROM THE
WOMEN’S EMPIRICAL SCALE

Message                                          Scale value
Build a house                                    [illegible]
Let’s do the dog                                 .49
Suggest construction of a house. It appears      .92
  most stable and suggests a more useful
  object than the others
I would like to build the house-shaped           [illegible]
  structure
I like the design resembling a house             1.72
I think the house is nice                        2.18
Which design do you think would be easiest?      3.29
Which design?                                    3.70
What shall we do now?                            4.49


messages on a single card were eliminated from the
sample as per Thurstone’s recommendation. The 
successive interval technique (Edwards, 1957) was 
applied to the remaining data in order to derive a 
scale value for each message. This was done sepa- 
rately for the 50 male and 60 female sorters. Mes- 
sages on which there was little agreement (less than 
50% of the choices in two categories, or 67% in 
three) were discarded. 
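
The agreement criterion just described can be sketched in code. This is only an illustration under stated assumptions: the text does not say whether the "two categories" and "three" must be adjacent piles, nor whether failing either threshold was enough to discard a message. The sketch assumes adjacent piles and requires both thresholds; the function names and the pile counts are hypothetical.

```python
def max_window_share(counts, width):
    """Largest share of all sorts that falls in any `width` adjacent piles."""
    total = sum(counts)
    best = max(sum(counts[i:i + width]) for i in range(len(counts) - width + 1))
    return best / total


def keep_message(counts, two_pile_min=0.50, three_pile_min=0.67):
    """Retain a message only if its sorts are concentrated enough.

    Assumption: the piles counted together are adjacent, and a message is
    discarded when it falls short of either threshold.
    """
    return (max_window_share(counts, 2) >= two_pile_min
            and max_window_share(counts, 3) >= three_pile_min)


# Hypothetical pile counts for 60 sorters across the 11 piles.
concentrated = [0, 0, 20, 25, 15, 0, 0, 0, 0, 0, 0]
scattered = [6, 5, 6, 5, 6, 5, 6, 5, 6, 5, 5]

print(keep_message(concentrated))  # True
print(keep_message(scattered))     # False
```
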

Examples of the remaining messages with their 
scale values are listed in Table 2. Fifty messages 
for the women and 51 messages for the men formed 
scales which ranged from .07 to 4.49 for the women, 
and from .03 to 4.62 for the men. The low value in 
each case indicates that the sorters perceived the 
message to communicate a greater desire to control 
the group’s decision or procedure. 


Use of the Scales 


The Logical Scale. Coders use the logical scale to 
categorize messages by finding the category descrip- 
tion that seems most appropriate and assigning the 
number of this category to the message. Each of the 
nine categories is described in such terms as “im- 
perative statement” or “a moderate suggestion”; 
therefore, the categorization process requires that 
the coder abstract from the message certain gram- 
matical or affective qualities. Since examples of 
typical messages are provided for each category, 
coders also compare the message to be coded with 
these examples. 

The Empirical Scale. In using the empirical scale 
to code new messages, coders find the message on 
the scale which they judge to be most similar to the 
message being analyzed. The scale value of the 
criterion message is then assigned to the new message. 
The empirical scales are thus “standard scales” in 
Guilford’s terms (1954), with the messages on the 
scales being used as standards for the coding of 
new messages. 


RESULTS 


The Equivalence of the Logical and Empirical 
Scales 


Before developing the empirical scale, three 
psychologists had used the logical scale to 
rate the messages that were later used to de- 
velop the empirical scale. Each coder rated 
the messages separately, after which they met 
to resolve their differences. The composite 
coding thus achieved was correlated with each 
of the empirical scales with the following 
results: logical versus men’s empirical, r = 
.901; logical versus women’s empirical, r = 
.903. 

Next, two of the psychologists (C-1 and 
C-2) independently rated 104 messages ob- 
tained from a second experiment. In general, 
the experimental conditions in this experiment 
(described in Exline, 1962a) were the same 
as those in the experiment from which the 
messages used to derive the empirical scales 
were obtained. These messages were rated 
twice by each coder, once using the logical 
scale, and once using the women’s empirical 
scale (since they were written by women). 
Comparing the ratings from the two scales by 
the same individual, we obtained the follow- 
ing results: C-1, logical versus empirical, r 
= .917; C-2, logical versus empirical, r = 
.914. 

The second set of messages was also in- 
dependently rated by six relatively untrained 
coders, three men and three women. These 
coders were all college graduates, and were 
research assistants of the Center for Research 
on Social Behavior. All six coders rated the 
104 messages twice, once with each scale. 
The order of the use of scale was controlled, 
half of the Ss using the logical scale first, and 
half the empirical. The coders were briefly 
instructed in the use of the scales, but were 
given no practice. When the two ratings from 
each coder were correlated with each other, 
the results shown in Table 3 were obtained. 

The average ratings of the six coders with 
the logical scale correlated highly (r = +.934) 
with their average ratings for the 
empirical scale. 

The relatively high degree of agreement 
between the two scales suggests that they are 
measuring the same attribute. The empirical 
scale thus contributes a consensual type of 
evidence about the validity of the logical scale. 


A Comparison of the Reliability of the Logical 
and Empirical Scales 


The high correlation of the two scales shows 
only that the two scales apparently measure 
the same quality, or that they are predictable, 
one from the other. They provide no direct 
evidence about the reliability of each scale. 
In order to compare the scales’ reliabilities, 
several analyses were carried out. 

TABLE 3 

LOGICAL VERSUS EMPIRICAL SCALE FOR THE SIX 
UNTRAINED CODERS 

Coder     r 

A         .849 
B         .892 
C         .930 
D         .195 
E         .825 
F         .640 

First, a Pearson r was computed to cor- 
relate the ratings of each coder with those of 
every other coder. This was done separately 
for each scale and for the two trained and 
the six untrained coders. In addition, each of 
the 30 r’s from the untrained coders was 
transformed into a z score, and the difference 
between each set of two tested by the method 
reported in Guilford (1956). Student’s t was 
used to test the overall difference between 
the two sets of z’s. The appropriateness of 
the use of these statistics may be questioned, 
however, since the 15 z’s from each scale are 
not independent, being derived from just six 
sets of codings. Also, as Guilford suggests 
(1956), the z test of differences in r is limited 
to the case where the correlations being com- 
pared arise from rather independent variables. 
Since the stimuli and the Ss are the same for 
each of the rating methods, these correlations 
are not independent. 
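
The r-to-z comparison described in this paragraph can be sketched as follows. This is a minimal illustration of the standard Fisher transformation and the normal-deviate test for two independent correlations, not a reconstruction of the authors' computations. The input values are hypothetical, and the test assumes independence, which the text itself notes does not hold for these codings.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation (the inverse hyperbolic tangent)."""
    return math.atanh(r)


def z_test_independent(r1, n1, r2, n2):
    """Normal deviate for the difference between two correlations.

    Assumes the two r's come from independent samples of sizes n1 and n2.
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se


# Hypothetical correlations, each treated as based on 104 coded messages.
deviate = z_test_independent(0.80, 104, 0.60, 104)
print(round(deviate, 2))
```

A deviate beyond about 1.96 in absolute value would be significant at the .05 level (two-tailed) if the independence assumption held.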

Peters and Van Voorhis (1940) state that 
tests of the difference between r’s which 
ignore possible correlations between the two 
r’s will, given a true correlation, result in 
standard errors that will be too high. Thus, 
by ignoring this factor, one errs in a conserva- 
tive direction. 

In order to avoid some of these difficulties, 
however, and to arrive at an overall estimate 
of the reliability of each scale from each 
group of Ss, intraclass intercorrelations were 
also computed for each group and each scale. 
Fisher has shown (1954) that this statistic 
is somewhat more accurate than the Pearson r. 
In addition to evidence about the degree of 
proportionality between coders, intraclass in- 
tercorrelation has the advantage of providing 
evidence about the actual amount of agree- 
ment between coders. 

Through analysis of variance one can com- 
pute the intraclass intercorrelation in two 
ways (Ebel, 1951). In the first method, the 
mean square for coders is included in the 
error term and an index of agreement results. 
With the second method the between-coder 
variance is left out of the error term, and the 
intraclass correlation is an index of propor- 
tionality (as is the Pearson r). Since both 
kinds of information seem to be relevant to 
the comparison of the reliability of the scales, 
the intraclass correlations were computed 
twice in each case—once using each method. 
Intraclass correlations were also computed 
for the average coding of the six untrained Ss. 
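
The two versions of the intraclass correlation can be sketched from a messages-by-coders table. The formula used here, (MS rows - MS error) / (MS rows + (k - 1) MS error) for the reliability of a single coder's ratings, is the form commonly attributed to Ebel (1951); treating it as the computation used in this study is an assumption. The function names and the tiny data set are hypothetical.

```python
def anova_mean_squares(ratings):
    """Two-way ANOVA without replication: rows = messages, columns = coders."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    return {
        "k": k,
        "ms_rows": ss_rows / (n - 1),
        # Interaction only: between-coder variance left out of the error term.
        "ms_err": ss_err / ((n - 1) * (k - 1)),
        # Between-coder variance pooled into the error term.
        "ms_err_pooled": (ss_cols + ss_err) / (n * (k - 1)),
    }


def intraclass_r(ms_rows, ms_error, k):
    """Reliability of a single coder's ratings, given k coders."""
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical data: the second coder always rates one point higher.
ratings = [[1, 2], [2, 3], [3, 4]]
ms = anova_mean_squares(ratings)
proportionality = intraclass_r(ms["ms_rows"], ms["ms_err"], ms["k"])
agreement = intraclass_r(ms["ms_rows"], ms["ms_err_pooled"], ms["k"])
print(proportionality, agreement)  # 1.0 0.6
```

In this example the coders order the messages identically, so the proportionality index is perfect, while the constant one-point offset between coders lowers the agreement index.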

A final comparison of the reliability of the 
two scales was made by using Garner’s (1960) 
method of using variance ratios to estimate 
the amount of information transmitted by a 
system of categories. As Garner explained, 
the scale upon which the stimuli are more 
spread out will be the scale that is most dis- 
criminating, since the greater uncertainty 
among the stimuli will permit more informa- 
tion to be transmitted about the stimuli. Al- 
though different scale metrics do not permit 
a direct comparison of the spread or variance 
of codings on the two scales, Garner’s formula, 

\[ \text{est } U(R:S) = -\frac{1}{2} \log_2 \left( 1 - \frac{\text{variance between units coded}}{\text{total variance}} \right) \] 


permits the uncertainty of the response, given 
the stimuli [est U (R:S)] to be computed 
from the ratio of the between-unit variance 
and total variance. [Est U (R:S)], or the 
amount of information transmitted, is thus 
another way of expressing the reliability of 
the scales. 
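
As a worked illustration of Garner's formula, the sketch below computes the estimated information transmitted, in bits, from a between-unit variance and a total variance. The function name and the input values are hypothetical and are not taken from the study's data.

```python
import math


def information_transmitted(var_between, var_total):
    """est U(R:S) = -1/2 * log2(1 - between-unit variance / total variance)."""
    if not 0 <= var_between < var_total:
        raise ValueError("need 0 <= var_between < var_total")
    return -0.5 * math.log2(1.0 - var_between / var_total)


# Hypothetical case: three quarters of the total variance lies between messages.
print(information_transmitted(3.0, 4.0))  # 1.0
```

The larger the share of total variance attributable to differences among the messages, the more information the codings transmit about them.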

Results of the Analyses. The results of the 
various correlations will be reported first for 
the psychologists (C-1 and C-2), who, since 
they had worked on the development of the 
scales, had had both experience and training 
in their use. The Pearson r and the intra- 
class r’s for each scale are shown in Table 4. 


TABLE 4 

RELIABILITY COEFFICIENTS FOR TWO TRAINED CODERS 

Correlation                        Logical scale    Empirical scale 

Pearson r                          .943             .939 
Intraclass r (proportionality)     .945             .938 
Intraclass r (agreement)           .832             .885 

Note. C-1 versus C-2. 


TABLE 5 

PEARSON r FOR EACH PAIR OF 
UNTRAINED CODERS 

Pair of       Logical        Empirical 
subjects      scale          scale 

AB            .18            .82 
AF            [illegible]    [illegible] 
BF            .62            [illegible] 
CF            .43            [illegible] 
AC            .63            [illegible] 
BC            [illegible]    [illegible] 
BD            [illegible]    .84 
CD            .67            [illegible] 
AD            .19            .85 
DF            .95            [illegible] 
DE            .68            [illegible] 
BE            [illegible]    .82 
CE            .62            [illegible] 
EF            .65            .69 
AE            [illegible]    [illegible] 

*p < .05. 
**p < .01. 
***p < .001. 


From these data it appears that experienced 
raters may use both scales with confidence. 
In addition, there is no appreciable difference 
in the intercoder reliability of the two scales 
with both the Pearson r and the proportional- 
ity version of the intraclass r. With the agree- 
ment version of the intraclass r, however, the 
reliability of the empirical scale appears to 
be slightly higher. 

The results listed in Table 5, however, 
show that for the relatively inexperienced and 
untrained coders strikingly different results 
were obtained. 

For each pair, the Pearson r was higher for 
the empirical scale, with 7 of the 15 dif- 
ferences significant at the .05 level or better. 
When the Pearson r’s are transformed to z’s, 
the mean z for the empirical scale is signifi- 
cantly higher than that of the logical scale 
(t = 3.935, p < .001). The average r for the 
empirical scale is .785, while that for the 
logical scale is .675. Thus, the evidence from 
the Pearson r’s seems to indicate that rela- 
tively untrained coders are significantly more 
in agreement with one another when they use 
the empirical scale as a basis for coding the 
messages. It should also be noted that the 
reliability coefficients for all the untrained 
coders are lower than those for the trained 
psychologists, indicating the benefits of train- 
ing and experience with the scales. 

Both kinds of intraclass correlations for 
the untrained coders show the superiority of 
the empirical over the logical scale. The pro- 
portionality index for the logical scale is .657, 
while that for the empirical scale is .776. The 
agreement index derived from the empirical 
scale was strikingly higher than that derived 
from the logical scale (logical, r = .051; 
empirical, r = .559). 

Intraclass correlations for the average 
rating of the six coders showed similar results. 
Proportionality index: empirical, r = .954; 
logical, r = .915. Agreement index: empirical, 
r = .884; logical, r = .243. Here again the 
empirical scale is distinctly superior. 

When intraclass r’s are transformed to z’s 
(Fisher, 1954), the difference between each 
pair (logical versus empirical) is significant at 
the .05 level or better. However, since both 
r’s in each set are based upon the ratings of 
the same messages, and since both ratings 
were done by the same coders, the appli- 
cability of the z test of differences may be 
questioned. 

This objection can be overcome, however, 
when an analysis of variance is applied to 
each of the two sets of data. When this is 
done, with the rows referring to messages 
coded, the columns to coders, and row by 
column interaction used as an error term 
(since there is but a single entry in each cell), 
the results shown in Table 6 clearly indicate 
the superiority of the empirical scale. 


TABLE 6 

ANALYSES OF VARIANCE FOR EACH SCALE: 
SIX UNTRAINED CODERS 

Source                  df     SS            MS       F 

Empirical scale 
  Rows (messages)       103    717.27        6.96     21.75* 
  Columns (coders)      5      2.43          .49      1.53 ns 
  R X C (error)         515    [illegible]   .32 
  Total                 623 
Logical scale 
  Rows (messages)       103    2,298.00      22.30    11.86* 
  Columns (coders)      5      75.00         15.00    7.98* 
  R X C (error)         515    967.00        1.88 
  Total                 623 

*p < .0005. 

Table 6 indicates that for the empirical 
scale, the messages are significantly different 
from one another, and the coders are not, 
while for the logical scale, both coders and 
messages are significantly different from one 
another. 

One final comparison of the reliability of 
the two scales was made by computing the 
information transmitted by each. Here again 
the empirical scale appears to be superior—est 
U (R:S)= 1.2155 for the empirical scale, 
est U (R:S) = .8498 for the logical. In sum- 
mary, we can say that for inexperienced 
coders, all the measures used to compare the 
scales support our general hypothesis that 
psychological scaling techniques will permit 
data to be coded more reliably. 


Additional Findings 


A Comparison of Empirical Scale Values of 
Male and Female Sorts. Since the empirical 
scales were derived separately for the men 
and women students who acted as judges, a 
comparison of these two forms of the empirical 
scale is possible. This comparison is based on 
the 42 messages that the two scales have in 
common (8 messages were retained in the 
women’s scale but not in the men’s, and 9 
in the men’s scale but not in the women’s). 
Although the mean value of the 42 messages 
was .113 points lower for the women, indicat- 
ing that the absolute placement was lower 
(or more controlling), there was considerable 
agreement in regard to relative placement 
(Pearson r = .976). On the other hand, 31 of 
the 42 messages were coded as more control- 
ling by the women, 3 had less than .01 points 
difference, and 8 were seen as more con- 
trolling by the men. A t test of the difference 
between male and female mean placement of 
messages was highly significant (t = 3.89, p 
< .001). Thus, women tended to judge the 
same message as indicating more desire to ex- 
ercise control over the group’s decision or 
procedures than did the men. 

Construct Validity of Empirical Scale. 
Ebel (1961) suggests that the “meaning” of a 
measure is revealed by the operations used to 
develop the test, the reliability of the test, 
and the relationship between the test and 
other variables. Preliminary evidence about 
one such relationship has been found for 
the empirical scale. As reported by Exline 
(1962a), Ss, both men and women, having 
relatively low n affiliation as revealed by 
French’s Test (1955), were found to express 
a significantly higher desire for control in 
their first message to other members of the 
group, as measured by a content analysis 
making use of the empirical scale, than did 
Ss with high n affiliation. 


DISCUSSION 


It is evident that all differences between 
the logical and empirical scales, whether from 
the Pearson correlations, the intraclass cor- 
relations or the information measure, favor 
the empirical scale. With untrained coders, 
the logical scale is clearly deficient, whereas 
the empirical scale exhibited adequate reli- 
ability. While these findings are limited to a 
single set of scales used for the analysis of 
just one attribute, and as such should not be 
interpreted as demonstrating the superiority 
of all empirical scales, it is interesting to 
speculate about possible reasons for the 
present results. Two factors may help to ex- 
plain the superiority of the empirical scale. The 
first concerns the larger number of categories 
which comprise the empirical scale, and the 
second refers to the greater concreteness of 
the standards which it provides the coders. 

The number of categories comprising a 
scale is relevant to its reliability; Garner 
(1960), for example, found that rating scales 
up to and including 20 categories transmit 
more information than those with fewer cate- 
gories. Bendig (1954) and Bendig and Hughes 
(1953) obtained similar results with cate- 
gories up to and including 11. Although the 
empirical scale used in the present study ap- 
pears to be more continuous than does the 
logical scale, it may in fact be thought of as 
being composed of 50 categories as opposed 
to the 9 of the logical scale. In physical meas- 
urement, a finer scale produces more precise 
measurement (up to the capacity of the ob- 
server to make distinctions). A ruler marked 
every sixteenth of an inch, for example, will 
result in more accurate measurement than one 
marked every inch. Thus, the superior reli- 
ability of the empirical scale may be due 
to the greater number of categories compris- 
ing it.2 

Let us next consider the effect of concrete 
coding criteria. It is likely that the empirical 
scale is more concrete than the logical. In 
place of abstract descriptions buttressed by a 
few examples for each category, the empirical 
scale provides only concrete standards for 
comparison. When the coder uses the logical 
scale, he must abstract certain grammatical 
and affective qualities from the message to 
be coded. These qualities must then be com- 
pared to the category descriptions in order 
to achieve the closest possible fit. In the 
empirical scale, on the other hand, the coder 
compares the message itself to the actual mes- 
sages of the standard. The variability of re- 
sponse may be reduced because the inter- 
mediate step of abstraction, for which each 
coder might use slightly different cues, is 
either eliminated or greatly reduced. The 
results of this study suggest that the more 
concrete process, characteristic of the empir- 
ical scale, can be carried out with more agree- 
ment between and among untrained coders. 
The fact that untrained coders agreed more 
among themselves as to the placement of an 
item in the empirical scale than in the logical 
scale suggests that standard scales in which 
concrete comparisons are made have certain 
advantages over the usual rating scale which 
requires the coder to make a greater number 
of inferences. 

The evident superiority of the empirical 
scale with untrained coders suggests that with 
a more precise scale, less training would be 
necessary to bring the coders to an adequate 
degree of proficiency. The savings in time and 
money would seem to be considerable because 
the results obtained by the psychologists 
identified as experienced coders were achieved 
only after many weeks of intensive work with 
both scales. Such savings of course must be 
balanced against the costs of scale construc- 
tion. The method would seem to be most 
economical when extensive coding is required, 
or when the basic stimulus situations can be 
used a number of times. 

2 In addition, it is likely that the number of tied 
ranks is much less when more categories are used. 
A more powerful tool is thus constructed, because, 
with fewer tied scores, many more paired com- 
parisons are possible. 

This study also demonstrates the advantages 
of the intraclass correlation and analysis of 
variance in studies of reliability. Not only is 
the computational labor considerably reduced 
when these methods are used, but the results 
are more easily interpreted, particularly those 
derived from the analysis of variance. 

Since the empirical scale seems to possess 
adequate reliability, it would seem to have 
merit as a tool in future research. Because the 
scale is used with Ss who are in an actual 
group decision-making situation, the method 
avoids the weaknesses of both self-report and 
projective devices for studying the desire for 
power. While it may be of interest to know 
how much a person says he dominates others 
in decision making, how much he says he 
would like to do so, or how much of such a 
need or desire is expressed in his stories or 
picture interpretations, the study of his actual 
dominance behavior seems to be of greater 
relevance than any of these. 

Of additional interest to the user of the 
empirically constructed scale is the possibility 
that the process of scaling itself may uncover 
interesting psychological phenomena. For ex- 
ample, men and women were observed to 
differ systematically in their placement of 
messages in the 11 piles representing the 
central dimension. The women to a slight, but 
consistent and significant degree, interpreted 
the same messages as being more controlling 
than did the men. The implications of this 
finding are provocative. Is it related to 
women’s role in our society? Do women per- 
ceive themselves as less influential than men? 
If so, should we interpret the finding to in- 
dicate that women, desiring more power, im- 
pute this motive to others? Do men who per- 
ceive themselves as relatively impotent in in- 
fluence situations show similar tendencies? 
The answers to these questions are not to be 
found in the data herein presented, but the 
existence of such a set of weighted messages 
opens up the possibility of empirical testing 
of explanatory hypotheses. 


SUMMARY 


This report describes a method in which a 
psychological scaling technique is applied to 
the analysis of the contents of written mes- 
sages in order to provide a more precise metric 
for such measurement. The attribute to be 
measured was the extent to which each mes- 
sage communicated an attempt on the part 
of the writer to control the group’s decision or 
procedures. In order to rate messages on this 
continuum two scales were developed. The 
first of these was the logical scale described 
in terms of 9 categories for which examples 
were provided. The second scale was de- 
veloped empirically by having 126 students 
each sort 90 messages into 11 categories. The 
method of successive intervals was applied 
to these data and a scale value obtained for 
each message. Separate scales were developed 
for male and female Ss; only the messages 
on which there was most agreement were 
retained (51 in the men’s scale; 50 in the 
women’s). 

Two experienced and six untrained coders 
rated a second set of messages twice, using 
each scale once. High correlations resulted 
when each S was compared with himself, in- 
dicating a similarity between the two scales. 
When two psychologists, who were experi- 
enced and trained coders, used the scales, 
scale reliability did not differ whether meas- 
ured by the Pearson r, the proportionality 
form of the intraclass 7, or the agreement form 
of the intraclass r. In the case of untrained 
coders, however, the empirical scale was more 
reliable for each comparison. Analysis of vari- 
ance showed messages coded with the empir- 
ical scale to be significantly different from 
each other, while the raters were not signifi- 
cantly different. When the logical scale was 
used to code the messages, both raters and 
messages differed significantly. The amount 
of information transmitted was found to be 
greater for the empirical scale. Possible rea- 
sons for the superiority of the empirical scale 
were discussed, and suggestions were made 
concerning its use in future research. 


GROUP PERFORMANCE AS A FUNCTION OF TASK 
DIFFICULTY AND THE GROUP’S AWARENESS 
OF MEMBER SATISFACTION 1 


MARVIN E. SHAW AND J. MICHAEL BLUM 


University of Florida 


This experiment investigated the hypothesis that group effectiveness increases 
with increased member awareness of group satisfaction, and that this effect is 
greater for difficult than for easy tasks. 5-person groups attempted 3 tasks 
differing in difficulty, under 3 conditions of satisfaction feedback: no feedback, 
overt feedback, and covert feedback. In the overt condition, Ss publicly in- 
dicated their satisfaction with the problem-solving process, whereas in the 
covert condition their satisfaction was indicated anonymously. The results 
supported the hypothesis. It was suggested that valid communication of 
satisfaction leads to more complete use of members’ contributions, and hence 
improves performance. 


In a classical study of public versus private 
opinion, Schanck (1932) found that citizens 
of a community publicly expressed both strong 
disapproval of card playing and the conviction 
that all others in the community held the 
same opinion. In fact, however, card playing 
was common throughout the community, and 
relatively few citizens privately expressed 
disapproval of this activity. This lack of 
awareness of the true feelings of others in the 
community was referred to as “pluralistic 
ignorance.” This concept has been invoked 
more recently to describe a similar lack of 
awareness among members of small problem- 
solving groups (Bonner, 1959; Thibaut & 
Kelley, 1959). It is often the case that a 
group accepts a decision that only a few 
members believe to be the best one, each 
dissatisfied member thinking that he alone is 
displeased with the decision. This situation is 
assumed to interfere with effective group 
functioning. If so, increasing members’ aware- 
ness of the feelings of others in the group 
should increase group effectiveness. 


1 This research was supported by the Office of 
Naval Research, Contract NR 170-266, Nonr-580(11). 
A brief report of this study appeared in Psychonomic 
Science (Shaw & Blum, 1964). 


But pluralistic ignorance can arise only 
when there is a discrepancy between the 
publicly stated position of the group and the 
private position held by individual members. 
In the small decision-making group such a 
discrepancy is most likely to occur when the 
task is relatively difficult. When the task is 
easy, it is probable that the best (or correct) 
decision will be quickly determined and ac- 
cepted by all members, both publicly and 
privately. When the task is more difficult, 
however, the best decision is not so readily 
determined, and a decision may be proposed 
that has the support of only one or two 
persons in the group. If each member who 
disagrees believes that all others are in agree- 
ment (i.e., pluralistic ignorance exists), he 
may accept the decision rather than be cast 
into the role of a deviant. Such premature ac- 
ceptance of a privately unacceptable decision 
precludes further consideration, expression of 
new ideas, application of additional informa- 
tion, and so on. In general, less adequate de- 
cisions would seem to be an inevitable con- 
sequence. 

Since these considerations suggest that 
pluralistic ignorance has its greatest effect on 
group performance when the task is relatively 
difficult, we would expect increased awareness 
of the feelings of others to improve perform- 
ance relatively more on difficult than on easy 
tasks. The purpose of the present experiment, 
then, was to test the hypothesis that group 
effectiveness improves with increased knowl- 
edge of member satisfaction, and that this 
effect is greater on difficult than on easy tasks. 


METHOD 


A mixed experimental design was used, involving 
three satisfaction-feedback conditions (no feedback, 
overt feedback, and covert feedback) and three 
degrees of task difficulty. 

Subjects. The subjects (Ss) in this experiment 
were 135 male undergraduates at a state university. 
Nine groups of five persons each were assigned 
randomly to each of the three feedback conditions. 
All groups were subjected to all task difficulty 
conditions. 

Apparatus. The apparatus consisted of a square 
panel of lights placed in the center of an oval work- 
table. The panel contained 5 rows of 5 lights each. 
A 5-position switch was located under the table at 
each S’s position. Each switch controlled 5 of the 25 
lights, and the wiring of the apparatus was such 
that a given switch setting could be connected with 
any light on the panel. Switch settings were cumula- 
tive so that moving the switch control to the first 
position turned on 1 light, moving it to the second 
position turned on an additional light, and so on, 
with the fifth position turning on all lights controlled 
by that switch. Thus each S could signal his satisfac- 
tion to the group by setting his switch at the first 
position to indicate minimum satisfaction, at the 
second position to indicate somewhat greater satis- 
faction, and so on, with the fifth position indicating 
maximum satisfaction with the group’s progress. 

This “moraleometer” was used to manipulate 
satisfaction feedback. By changing the color and 
arrangement of the lights on the panel, feedback 
could be “signed” (overt) or anonymous (covert). 
For the overt condition, each S’s switch was con- 
nected to lights in a single row and the color of 
his lights corresponded to his experimental “name.” 
For the covert condition, each S’s switch was con- 
nected to lights randomly distributed over the 
panel, and all lights were red. It was believed that 
both satisfaction-feedback conditions would reduce 
pluralistic ignorance, but that this reduction would 
be greater in the covert condition since Ss would 
be less influenced by others than in the overt con- 
dition. 


TABLE 1 

CHARACTERISTICS OF THE GROUP TASKS 

Dimension                                 Task 1         Task 2         Task 3 

Difficulty                                [illegible]    4.2            6.1 
Solution multiplicity                     0.9            0.9            0.9 
Cooperation requirements                  2.8            3.1            2.9 
Population familiarity                    3.9            4.4            3.6 
Intrinsic interest                        2.3            [illegible]    3.5 
Intellectual-manipulative requirements    6.9            6.4            6.1 

Tasks. The task-difficulty variable was manipulated 
through the selection of tasks assigned to the groups. 
The three tasks used had been scaled on six dimen- 
sions (Shaw, 1963), and were selected to vary 
primarily on the Difficulty dimension. The first 
problem required a group ranking of six United 
States cities in order of population, as indicated by 
the 1960 census. This task is similar to those used 
by Gaier and Bass (1955). The second task was 
essentially the change of work task formulated by 
Maier (1953), adapted for use with five-person 
groups. The third task was a puzzle problem con- 
cerning dormitory room assignments, given certain 
limiting conditions. These tasks are described in 
full in a recent report (Shaw, 1963; Tasks 80, 23, 
and 97, respectively). The dimensional character- 
istics of each task are given in Table 1. 

Procedure. Groups were assembled around the 
worktable and instructed as follows: 


We are interested in observing how groups at- 
tempt to solve problems. On each of several trials 
you will be given a problem to solve working 
together as a group. After reading each problem, 
you will have 30 minutes to discuss it and arrive 
at a decision. When you have agreed upon a solu- 
tion the discussion will be terminated. At that time, 
please write your decision on one of the cards on 
the table and hand it to me. 


The remainder of the instructions varied with the 
feedback condition to which the group was as- 
signed. No-feedback groups were told only that the 
display box would not be used and should be 
ignored. For groups assigned to the feedback con- 
ditions the purpose and operation of the apparatus 
was carefully explained. Then groups in the overt 
feedback condition were told that the color of the 
lights corresponded to the group member’s name (i.e., 
that Red’s satisfaction would be indicated by red 
lights, Blue’s satisfaction by blue lights, etc.), 
whereas groups in the covert condition were told 
that the signaling of satisfaction was to be anony- 
mous and were cautioned not to make switch adjust- 
ments obvious. After the experimenter was sure that 
the instructions were understood, each group was 
given an 8-minute trial on a discussion problem to 
“warm up” and, for the feedback groups, to get 
accustomed to setting the switches. The first problem 
to be attempted was then given to the group, along 
with instructions specific to the task. The three tasks 
were given in different orders such that each task 
was attempted first, second, and third an equal 
number of times, matched across feedback conditions. 
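
The counterbalancing described above can be sketched with a cyclic Latin square, in which each task occupies each serial position once per set of three orders. This is only an illustration: the report does not list the orders actually used, and the function name and the number of groups per condition below are assumptions.

```python
from collections import Counter


def cyclic_orders(tasks):
    """Return one order per rotation, so every task occupies every position once."""
    n = len(tasks)
    return [tuple(tasks[(start + pos) % n] for pos in range(n)) for start in range(n)]


tasks = ["Task 1", "Task 2", "Task 3"]
orders = cyclic_orders(tasks)

# Hypothetical assignment: six groups per feedback condition, two per order.
groups_per_condition = 6
assignment = [orders[g % len(orders)] for g in range(groups_per_condition)]

# Count how often each task lands in each serial position.
position_counts = Counter(
    (task, pos) for order in assignment for pos, task in enumerate(order)
)
for (task, pos), count in sorted(position_counts.items()):
    print(task, "in position", pos + 1, "x", count)
```

Reusing the same assignment in every feedback condition keeps the orders matched across conditions.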

At 5-minute intervals during the problem solving
and at the end of each trial, E recorded the
number of lights that were turned on by each group
member. When all three trials had been completed,
Ss were given a number of bipolar scales designed
to measure overall satisfaction, attitude toward the
experiment, and S’s perceptions of his own behavior
and the behavior of others in his group.


TABLE 2

MEAN TIME SCORES FOR TASK AND SATISFACTION
FEEDBACK CONDITIONS

Condition          Task 1   Task 2   Task 3   M
No feedback        Ved      21.5     29.7     19.5
Overt feedback     8.1      16.4     29.1     17.9
Covert feedback    Tek:     17.3     28.7     17.9

M                  Teh      18.4     29.2     18.4


RESULTS 


Time to solve is the only performance score 
that is directly comparable across tasks. 
Therefore, it is probably the best criterion 
for difficulty, although not necessarily the best 
criterion for group effectiveness. Mean time 
scores are given in Table 2. Both feedback 
conditions required less time than the control 
condition, but analysis of variance indicated 
that only differences attributable to task dif- 
ficulty were statistically reliable (F = 76.66, 
df = 2/30, p < .001). The mean time scores
per task are very nearly linearly related to 
Difficulty scale values (see Table 1), thus 
providing further evidence regarding the 
validity of the scale values. 

The basic performance score for Task 1 
was the rank-order correlation between the 
group’s ranking of the cities and their true 
populations as given in the 1960 United States 
census report. The score for Task 2 was the 
reduction in time required for the work team 
to produce one unit by the procedure decided 
upon by the group as compared with the 
time required by the old operating procedure. 
Since the old time was 15 minutes per unit 
and the shortest possible time was 10 minutes 
per unit, scores range from zero to five. 
Task 3 required answers to four questions, 
and the performance score was the number 
of questions answered correctly. The means 
of these scores for the three tasks in each 
of the three feedback conditions are given 
in Table 3. Significance of differences among 
satisfaction-feedback conditions were tested
for each task separately by means of Wilcoxon’s
(1945) T test. Performance scores
were higher with feedback than with no feed- 
back and on Tasks 1 and 3 with covert 
than with overt feedback. The differences 
were significant, however, only for Task 3 
(p < .05). 
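
As a rough illustration of the Task 1 and Task 2 scores described above, the sketch below computes a rank-order correlation for untied rankings and a time-reduction score. The function names and the example ranking are hypothetical; the report does not give the groups' actual rankings.

```python
def spearman_rho(rank_a, rank_b):
    """Rank-order correlation for two untied rankings of the same items."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def task2_score(new_minutes_per_unit, old_minutes_per_unit=15):
    """Reduction in minutes per unit relative to the old operating procedure."""
    return old_minutes_per_unit - new_minutes_per_unit


# Hypothetical data: a group's ranking of six cities against the census ranking.
true_ranks = [1, 2, 3, 4, 5, 6]
group_ranks = [2, 1, 3, 4, 6, 5]

print(spearman_rho(group_ranks, true_ranks))
print(task2_score(10), task2_score(15))  # best possible and no improvement
```

With a 15-minute old procedure and a 10-minute minimum, the Task 2 score runs from zero to five, as stated in the text.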

There were no statistically reliable dif- 
ferences among feedback conditions in mem- 
ber satisfaction as indicated by ratings of 
overall satisfaction or by number of lights 
turned on during the problem-solving inter- 
action. However, the average correlation be- 
tween ratings and number of lights was .31 
in the overt feedback condition and .50 in 
the covert feedback condition, suggesting that 
satisfaction feedback was more valid in the 
covert than in the overt condition. Further- 
more, the range of indicated satisfaction (by 
lights) was significantly less (p < .01) in the 
overt (M = 1.9) than in the covert condition
(M = 2.9). 

There was a tendency for Ss to evaluate 
their own behavior and contributions more 
favorably than that of others in the group, 
but differences were not statistically reliable. 


DISCUSSION


The data suggest that group members are 
more likely to signal their true feelings to the 
group when this can be done anonymously 
than when signals are public. This is indicated 
by the relatively lower correlation between 
signaled satisfaction and rated satisfaction in 
the overt feedback condition as compared 
with the covert feedback condition. It may 
be concluded then that S’s awareness of the 
true feelings of others in the group increased 
in the following order: no feedback, overt 
feedback, covert feedback. The data also 
demonstrate quite clearly that the Difficulty
scale values derived from dimensional analysis
are valid indicators of task difficulty. Thus,
the finding that feedback conditions influenced
performance most when the task was
difficult supports the hypothesis that group
effectiveness increases with increased awareness
of members’ satisfaction and that this
effect is greater with difficult than with easy
tasks.


TABLE 3

MEAN PERFORMANCE SCORES FOR TASK AND SATISFACTION
FEEDBACK CONDITIONS

Condition          Task 1   Task 2   Task 3
No feedback        .66      3.02     .89
Overt feedback     .83      3.65     LAt
Covert feedback    .88      3.39     2.44

These results are interpreted in terms of 
pluralistic ignorance and its effects upon the 
group problem-solving process. A state of 
pluralistic ignorance exists when group mem- 
bers mistakenly believe that others in the 
group hold a position different from their 
own. In group problem solving, such a state 
arises when a proposed solution is unaccept- 
able to some members, each of whom believes 
that all others in the group find it acceptable. 
This is most likely to occur when the task 
is difficult, since the correct solution to easy 
tasks is quickly determined by all members. 
When this happens, individuals may hesitate 
to disagree, to suggest alternative solutions, 
etc., rather than risk being perceived as a 
deviant. This inhibition of member participa- 
tion reduces the range of information avail- 
able to the group, prevents the exploration 
of alternatives, and generally reduces the 
involvement of individual members in the 
problem-solving process. That these condi- 
tions decrease the effectiveness of the group 
is well known (Darley, Gross, & Martin, 
1952; Maier & Solem, 1952). 

If the above interpretation is sound, the 
findings of this study have practical implica- 
tions. The effectiveness of small groups, such 
as committees, work teams, and the like, 
could be improved if the group members 
could be encouraged to express their satis- 
faction with the group process. Some device 
such as the moraleometer used in this study 
might be adapted for this purpose, although
such a device would probably be unacceptable
to operating groups. Training procedures,
such as those developed by the National 
Training Laboratory, also might be effective, 
but this approach is costly both in time 
and money. It is of course possible that 
instruction concerning the facilitating effects 
of valid expressions of satisfaction and/or 
dissatisfaction would be effective. At present, 
however, there is no demonstrably satisfac- 
tory method of insuring that group members 
validly communicate their satisfaction (or 
lack of it) to their fellow group members.


MASCULINITY OF SMOKERS AND THE MASCULINITY
OF CIGARETTE IMAGES 


PAUL C. VITZ 1 AND DONALD JOHNSTON


Pomona College 


It was hypothesized that the more masculine a smoker’s personality, the more 
masculine the image of the smoker’s regular brand of cigarette. The masculinity 
of 40 male and 40 female college-age smokers was measured with the Fe 
scale (CPI) and Mf scale (MMPI). Ss then stated their regular brand of 
cigarette and rated the masculinity of 13 top-selling cigarettes, including their 
own. Both groups had low but statistically significant positive correlations
between their masculinity, on the Fe scale, and the rated masculinity of their 
cigarette. The results are interpreted as providing moderate support for the 
belief that product preference is a predictable interaction between the con- 
sumer’s personality and the product’s image. 


It is a common assumption that much 
consumer behavior is determined by the inter- 
action between the buyer’s personality and 
the product’s image. Faced with products 
which are identical, or very similar, the buyer 
presumably selects the one whose image or 
personality best satisfies his psychological 
needs. A strong proponent of this position, 
Pierre Martineau (1957), in discussing image 
advertising also argues that a product image 
is a symbol of the buyer’s personality and 
that product choices, being expressions of the 
self, are important indices of the buyer’s 
personality. 

Based in part on the above position, this 
study tests the hypothesis that the consumer 
has a psychological need for a product image 
which expresses the valued characteristics of 
his own personality. For example, the more 
sophisticated, intellectual, or economical a 
person, the more he prefers a product with 
an image having the same characteristics, 
assuming that he values the characteristic 
in question. The specific experimental hy- 
pothesis is: the more masculine the per- 
sonality of a smoker, the more masculine the 
image of the smoker’s regularly smoked 
cigarette. That is, masculine men smoke 
masculine-image cigarettes and feminine men 
smoke feminine-image cigarettes. Likewise, 
masculine women smoke masculine-image
cigarettes and feminine women smoke
feminine-image cigarettes. In addition, if a
person does not smoke but is asked what
cigarette he or she might smoke, then it is
predicted that a masculine nonsmoker will
prefer a cigarette with a masculine image and
a feminine nonsmoker will prefer a cigarette
with a feminine image.


1 Now at Stanford University, Stanford, California.

The images of cigarettes were chosen for 
study because cigarette manufacturers engage 
in a great deal of image advertising, presum- 
ably because differences between many brands 
are small. In addition, cigarettes and their 
images are well-known to our subjects (Ss), 
college students. The personality character- 
istic of masculinity-femininity was selected 
because Martineau (1957) and others men- 
tion it as an important dimension of mean- 
ing for cigarette brands and because existing 
personality tests include well documented 
masculinity-femininity scales. Preliminary 
work also showed that Ss had little trouble 
in rating the images of cigarettes on this 
dimension.

Although the private files of advertising 
agencies may contain data supporting the 
hypothesis, none could be found in published 
form. The only related research which the 
authors were able to locate was a study by 
Koponen (1960). Koponen administered the 
Edwards Personal Preference Test to a large 
sample representing a cross section of the 
nation’s consumers. The scores on the psycho- 
logical needs measured by this test were then 
correlated with information on the S’s past 
purchases of a large number of different 
products. A few personality characteristics
were found to be related to purchases of
different types of products, e.g., male smokers 
of nonfilter cigarettes had higher aggression 
scores than male smokers of filter cigarettes. 
However, of the total buying behavior vari- 
ance, only a small percentage was associated 
with the personality scores. A follow-up study 
was run to test more directly the hypothesis 
that the consumer’s personality affects his 
response to advertising. Two groups of Ss 
were selected: one with high scores, the other 
with low scores on one of the Edwards per- 
sonality needs. The two groups were matched 
on such variables as age, geographic location, 
etc. Mail-order material advertising the prod- 
uct in a manner to stress the relevance of the 
need was sent to both groups. The sales re- 
turns showed a small but statistically sig- 
nificant greater return from the group with 
the high need scores. Unfortunately, it is 
difficult to interpret the results in much detail 
since the psychological need, the product 
advertised and the advertising copy were not 
identified or described in the article. The 
results do give some support to the idea that 
personality interacts with advertising material 
to affect the consumers’ buying, but they 
shed no light on the specific hypothesis that 
a consumer prefers a product image with 
psychological characteristics that match his 
own. 


METHOD


Subjects. All Ss were West Coast college students, 
between the ages of 18 and 22 (inclusive). The 
median age was 19. Approximately two-thirds at- 
tended state colleges and junior colleges, and the 
remainder attended small private colleges. 

Cigarettes. Thirteen common brands of cigarettes 
were used. According to a survey completed in the 
Fall of 1962, a few months before this study, they 
were the 13 top-selling brands in the Los Angeles 
metropolitan area and accounted for 88% of the 
area’s cigarette sales.2 On the basis of major product 
differences, they fall into three categories: (a) nonfilter
cigarettes (A, B, C, D), (b) filter cigarettes
(E, F, G, H, I, J, K), and (c) filter cigarettes containing
additives, such as menthol and mint (L, M).

Masculinity-Femininity Scales. Two measures of
masculinity-femininity were used: the Fe (femininity)
scale of the California Psychological Inventory
(CPI) and the Mf (feminine interests) scale of
the Minnesota Multiphasic Personality Inventory
(MMPI). The Fe scale of the CPI is designed “to
assess the masculinity or femininity of interests
[Gough, 1957, p. 13].”3 The CPI manual (1957)
describes masculine Ss on the Fe scale as “out-going,
hard-headed, ambitious, masculine, active, robust,
and restless; as being manipulative and opportunistic
in dealing with others; as blunt and direct in thinking
and action; and as being impatient with delay,
indecision and reflection.” Feminine Ss on the Fe
scale are described as “appreciative, patient, helpful,
gentle, moderate, persevering, and sincere; as being
respectful and accepting of others; and as behaving
in a conscientious and sympathetic way.” The Fe
scale, as part of the CPI, was developed to measure
personality characteristics which “are related to the
favorable and positive aspects of personality, rather
than to the morbid and pathological.” The CPI
was standardized primarily on college and high
school subjects.


2 The authors are grateful to the Los Angeles
Times Continuing Home Audit Service for furnishing
this information.

In contrast to the CPI, the MMPI was developed 
primarily to assess pathological characteristics of 
adult personalities. Specifically, the Mf scale attempts 
“to identify the disorder of male sexual inversion 
[Dahlstrom & Welsh, 1960, p. 63].” In this study, 
on both the CPI and the MMPI scales, high scores 
will equal high masculinity. 

Procedure. Each S was given a three-part test 
booklet. Information on the outside cover gave 
directions for taking the test and explained that the 
experimenter (E) was interested in the attitudes and
characteristics of cigarette smokers. The Ss who did 
not smoke were asked to take the test, as the com- 
parison of smokers and nonsmokers was also of 
interest. Part I of the booklet contained the CPI 
and MMPI items making up the Fe and Mf scales. 
A few buffer items from other scales of the MMPI 
were also included. Part II of the test asked S to 
write in the brand name of his regular cigarette. 
If S did not smoke, he was asked to answer the 
question: “If I did smoke, I would probably smoke
________ brand of cigarettes.” Part III asked
S to rank order the 13 different cigarette brands from
most masculine to most feminine. The specific
directions were:


Please rank the thirteen (13) brands of cigarettes
listed at the bottom of this page in terms of how 
masculine or feminine you think each brand is. 
That is, how masculine or feminine do you find 
the image of each cigarette to be? (Consider its 
advertising, the people who smoke it, and any 
other associations that you may have.) The rank- 
ing is usually easier if you start at the two ends 
of the scale and then work in toward the middle. 
When you are through, please check to make 
certain that you used each brand name once and 
only once. 


The entire procedure lasted about 25 minutes.
Most Ss were run in groups of 20 or more; however,
a few were given the test booklet, allowed to fill
it in at their own convenience, and told to return
it later. The Ss who smoked or said they would
smoke brands of cigarettes other than the 13 used in
the study were discarded.


3 Special permission to use the items comprising
the Fe scale of the CPI was granted by Harrison
G. Gough and the Consulting Psychologists Press,
Palo Alto, California.


RESULTS AND DISCUSSION


Four categories of Ss were used in analyzing
the data: male smokers (N = 40),
female smokers (N = 40), male nonsmokers
(N = 57), and female nonsmokers (N = 57).
Table 1 presents each of these groups’ mean 
masculinity rating of the 13 brands of ciga- 
rettes (high ratings = high masculinity). The 
ratings were obtained by transforming the 
ranks into estimates of scores from a normal 
distribution (Hull, 1928, p. 491). The change 
maintains the ordinal relationships but gives 
more weight to the extreme ranks, e.g., 
Rank 1 = rating of 84, Rank 2 = 73, ...,
Rank 7 = 50, Rank 8 = 46, ...,
Rank 12 = 27, Rank 13 = 16.
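
To show how such a rank-to-score conversion behaves, the sketch below places each rank at the midpoint of its slice of a normal distribution and rescales to a mean of 50. The midpoint rule and the SD of 20 are assumptions chosen for illustration; they approximate, but do not exactly reproduce, the tabled values cited from Hull (1928).

```python
from statistics import NormalDist


def rank_to_score(rank, n_items, mean=50.0, sd=20.0):
    """Approximate normalized score for a rank (1 = most masculine).

    Assumption: the rank sits at the midpoint of its 1/n slice of a normal
    distribution, rescaled to the given mean and SD.
    """
    proportion_above = (rank - 0.5) / n_items
    z = NormalDist().inv_cdf(1 - proportion_above)
    return mean + sd * z


scores = [rank_to_score(r, 13) for r in range(1, 14)]
for rank, score in enumerate(scores, start=1):
    print(rank, round(score, 1))
```

The resulting scores fall within a point or two of the values quoted in the text, keep the rank order, and are spaced more widely at the extremes than near the middle.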

The ratings in Table 1 suggest that a filter 
and an additive, such as menthol, are major 
determinants of a cigarette’s masculinity- 
femininity rating. Both the male and female 
smokers rated the four nonfilter cigarettes 
(A, B, C, and D) as the four most masculine 
and the filter cigarette containing menthol 
and mint (M) as the most feminine. The 
other menthol cigarette (L) is also rated as
quite feminine. The ratings of the nonsmokers
are similar to those of the smokers,
with the exception of Cigarette E. The male 
nonsmokers rate this cigarette as the most 
masculine; the female nonsmokers also rate 
E as more masculine than do the female 
smokers. This difference between the smokers’ 
and the nonsmokers’ ratings of Cigarette E, 
along with the additional findings presented 
below, is interpreted as evidence that in 
addition to product differences, such as 
filters, the advertised image of a cigarette 
also affects its masculinity rating. In a posttest
interview with 10 smokers and 10 nonsmokers,
Cigarette E was unanimously selected
as the cigarette with the most masculine
emphasis in its advertising. An analysis
of the content of cigarette ads appearing 
from December 1961 to June 1963 in two 
large-circulation national magazines showed 
that the ads of Cigarette E contained the 
most frequent masculine reference of any of 
the 13 brands, e.g., phrases such as “man’s 
world,” etc. Cigarette E was also the only 
cigarette of the 13 during that period which 
never had a female figure in its ads. The 
interpretation offered here is that the smokers 
rated E as less masculine than the nonfilter 
cigarettes because the filtered and presumably 
milder tobacco taste placed a limit on the
effect of the heavy masculine advertising;
the nonsmokers, being unfamiliar with the
taste, are assumed to have based their masculinity
rating primarily on the cigarette’s
advertised image, since this was about the
only information available to them. Regardless
of the factors underlying the masculinity
of a cigarette, the correlations presented in
Table 2 show that between-group agreement
on the masculinity of these 13 brands is
quite high.


TABLE 1

MEAN MASCULINITY RATINGS OF THE THIRTEEN BRANDS OF CIGARETTES

                                            Mean rating
                           Letter    Smokers a         Nonsmokers b
Cigarette type             code      Male    Female    Male    Female
Nonfilter                  A         79.6    78.2      69.9    71.4
                           B         67.2    70.9      59.0    60.6
                           C         62.8    60.2      SUAS    56.6
                           D         62.0    66.9      61.6    64.0
Filter                     E         61.7    55.6      72.6    62.6
                           F         53.0    49.2      51.0    47.6
                           G         44.4    51.9      50.0    52.1
                           H         44.0    47.8      48.7    50.0
                           I         43.0    47.2      42.4    44.2
                           J         36.7    33.9      42.2    42.4
                           K         32.6    29.5      41.5    42.0
Filter, menthol            L         33.1    32.9      28.8    28.0
Filter, menthol and mint   M         29.7    26.0      29.2    28.6

a N = 40.
b N = 57.


TABLE 2

BETWEEN-GROUP CORRELATIONS OF THE MEAN MASCULINITY
RATINGS SHOWN IN TABLE 1

                     Smokers    Nonsmokers
                     Female     Male    Female
Male smokers         .98        .91     .93
Female smokers       --         .87     .93
Male nonsmokers      --         --      .97

The intercorrelations of the Ss’ ratings 
within each group generally were lower than 
the correlations between the mean ratings. 
The median intercorrelation was .74 for the 
male smokers and .76 for the female smokers 
(40 Ss and n(n - 1)/2 or 780 intercorrelations
in each group); the median intercorrelation 
was .57 for the male nonsmokers and .55 for 
the female nonsmokers (57 Ss and 1,596 
intercorrelations in each group). 
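
The pair counts above follow from n(n - 1)/2, and a median intercorrelation can be computed by correlating every pair of raters. The sketch below assumes Pearson correlations on the transformed ratings, which the text does not state explicitly; the three-rater data set is invented for illustration.

```python
from itertools import combinations
from math import comb
from statistics import median


def pearson_r(x, y):
    """Product-moment correlation between two equal-length rating vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def median_intercorrelation(ratings_by_subject):
    """Median of the correlations between all pairs of raters."""
    pairs = combinations(ratings_by_subject, 2)
    return median(pearson_r(a, b) for a, b in pairs)


print(comb(40, 2), comb(57, 2))  # 780 1596

# Hypothetical ratings of four brands by three raters.
toy = [[84, 60, 40, 16], [84, 40, 60, 16], [60, 84, 40, 16]]
print(median_intercorrelation(toy))
```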

Assuming that men are more masculine 
than women, either by definition or by the 
fact that the male Ss had higher masculinity 
scores on the Fe and Mf scales, it follows 
from the hypothesis that the cigarettes 
smoked by men should be more masculine 
than those smoked by women.4 Data strongly
supporting this prediction are presented in 
Tables 3 and 4. 

4 Male subjects had a mean Fe score of 15.59 and
a mean Mf score of 33.7. Females had corresponding
scores of 22.62 and 38.0. These mean scores are in
close agreement with the published norms. There
were no significant differences in masculinity between
male smokers and nonsmokers or between female
smokers and nonsmokers.


The four groups of Ss rated the masculinity
of their own cigarette along with the other
brands, and the means of these self-ratings
are shown in Table 3. Table 3 shows that
male smokers rate their cigarettes as being
very much more masculine than do the
female smokers. The mean difference of 21.1
is highly significant, t (78) = 6.55, p < .001.
The group ratings also shown in Table 3
represent the mean masculinity of S’s cigarette
as rated by the others of the same sex
and same smoking category.
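
The reported t for the self-ratings can be checked approximately from the summary statistics in Table 3. The sketch below assumes an ordinary pooled-variance t test for independent groups, which the text does not name; the function name is illustrative.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-groups t from summary statistics (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Self-rating means and SDs for male and female smokers, from Table 3.
t, df = pooled_t(62.3, 14.8, 40, 41.2, 13.9, 40)
print(round(t, 2), df)
```

The result is close to the reported t (78) = 6.55; the small gap presumably reflects rounding in the published means and SDs.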

The group means were obtained as follows: 
each S’s cigarette was given a group score. 
This score was the group mean rating of the 
S’s cigarette, not including S’s own rating. 
This new mean was treated as a score and 
the mean of these scores is shown in Table 3. 
The group ratings, like the self-ratings, show 
that the cigarettes of the male smokers are 
rated as much more masculine than those 
of the female smokers. The difference be- 
tween the two means is not as great as be- 
tween the mean self-ratings, but still it is 
very significant (mean difference = 13.0, 
t (78) = 4.03, p< .001). The nonsmokers’ 
ratings of their hypothetical cigarettes are 
similar to the smokers’, with one exception: 
their self-ratings are even more sex-typed 
than those of the smokers. 
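
The group score described above is a leave-one-out mean: each S’s brand is scored by the other members of S’s group, excluding S’s own rating. A minimal sketch, with hypothetical data structures and invented ratings:

```python
from statistics import mean


def group_rating_scores(own_brand, ratings):
    """For each S, mean rating of S's own brand by the other group members.

    own_brand: list giving each S's regular brand code.
    ratings:   list of dicts; ratings[i][brand] is S i's rating of that brand.
    """
    scores = []
    for i, brand in enumerate(own_brand):
        others = [r[brand] for j, r in enumerate(ratings) if j != i]
        scores.append(mean(others))
    return scores


# Hypothetical three-person group rating two brands.
own_brand = ["A", "A", "M"]
ratings = [
    {"A": 84, "M": 16},
    {"A": 73, "M": 27},
    {"A": 60, "M": 40},
]
scores = group_rating_scores(own_brand, ratings)
print(scores, mean(scores))
```

The mean of these per-subject scores corresponds to the group-rating mean entered in Table 3.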

The self-ratings of the male and female Ss 
were also analyzed separately by brand. Nine 
brands were given as the regular cigarette 
by at least one male and one female smoker. 
(Brands A, B, D, and G were not smoked by 
any female S.) For every one of these 9 ciga- 
rettes, the median masculinity rating of the 
males was higher than the median masculinity 
rating of the females who smoked the same
brand. All 13 brands were given as the cigarette
S would smoke by at least one male and
one female nonsmoker. In all 13 comparisons,
the median masculinity rating of the male
nonsmokers was higher than the median
masculinity rating of the female nonsmokers
who would smoke the same brand. In short,
males rate their cigarette as more masculine
than females, even when both sexes are rating
the same brand.


TABLE 3

MASCULINITY RATINGS OF SUBJECT’S CIGARETTE AS
RATED BY SELF AND BY SUBJECT’S GROUP

                 Self-Rating       Group Rating a
Group            M       SD        M       SD
Smokers b
  Male           62.3    14.8      55.1    lod!
  Female         41.2    13.9      AZA     ARG
Nonsmokers c
  Male           65.0    15.6      54.56   Sez
  Female         35.1    20.0      40.0    13.2

a Subjects of same smoking category and sex.
b N = 40.
c N = 57.

Although the preceding findings are in 
agreement with the hypothesis, stronger sup- 
port of the hypothesis requires that within 
each sex the personality measures of S’s mas- 
culinity correlate positively with the mascu- 
linity ratings of S’s cigarette. The correla- 
tions in Table 4 show that the masculinity 
of both male and female smokers is positively 
correlated with the masculinity rating of their 
cigarette. Though the correlations are not 
high, they are all in the predicted direction 
and three of the four involving the CPI Fe 
scale reach statistical significance. A higher 
correlation with the Fe scale is understand- 
able since it was developed on a student 
population. Correlations analogous to those in
Table 4 were also computed for the non- 
smokers. All of these correlations were posi- 
tive but their magnitudes were considerably 
smaller than those of the smokers and none 
reached statistical significance. (A check of 
the scatter diagrams of all above mentioned 
correlations revealed no apparent deviations 
from linearity.) 

TABLE 4

CORRELATION BETWEEN THE MASCULINITY OF A
SMOKER AND THE MASCULINITY OF SUBJECT’S
REGULAR BRAND OF CIGARETTE

                                        Masculinity of subject’s
                                        regular cigarette rated by:
Sex of      Masculinity of subject
smoker      measured by:                Self     Group a
Male        Fe scale-CPI b              ose      coho
(N = 40)    Mf scale-MMPI b             .24      .19
Female      Fe scale-CPI b              .28      oon
(N = 40)    Mf scale-MMPI b             mi       5)

a Other smokers of same sex.
b High scores = high masculinity.
* p < .05, two-tailed, df = 38.


The correlations in Table 4 are interpreted
as moderate support for the hypothesis that
within each sex a person’s masculinity is positively
related to the masculinity of the image
of the cigarette smoked. However, because
the largest correlation of .35 accounts for
only 12% of the variance, it is clear that
many other factors must operate in S’s choice
of a brand of cigarette.
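
The variance figure is simply the squared correlation, and the significance of a correlation with 38 degrees of freedom can be gauged from the usual t transformation. A short sketch, with illustrative function names:

```python
from math import sqrt


def variance_explained(r):
    """Proportion of variance shared by two variables correlated r."""
    return r ** 2


def t_for_r(r, df):
    """t statistic for testing a Pearson r against zero, with df = N - 2."""
    return r * sqrt(df / (1 - r ** 2))


r, df = 0.35, 38
print(100 * variance_explained(r))  # percent of variance, about 12
print(t_for_r(r, df))
```

For r = .35 the t value is about 2.30, above the two-tailed .05 critical value of roughly 2.02 for 38 degrees of freedom, so the correlation is reliable while still leaving most of the variance unexplained.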
RELATIONSHIPS BETWEEN THE IMPORTANCE AND THE
SATISFACTION OF VARIOUS ENVIRONMENTAL FACTORS 


FRANK FRIEDLANDER


The relationship between the importance of 73 environmental factors and the
satisfaction or dissatisfaction which these elicit was investigated for 1,935
government employees. Results indicate (a) a V-shaped distribution between
satisfaction-dissatisfaction and importance; (b) a positive correlation between
satisfaction and importance, but a negative correlation between dissatisfaction
and importance; and (c) factors of extreme satisfaction or dissatisfaction are
more important than mild factors. Findings support a dual theory of
self-actualizing and deficiency motivations.


The relationship between the importance of 
various job characteristics and the satisfac- 
tion these same characteristics provide has 
been a subject approached indirectly in the 
literature from time to time. One group of 
studies has attempted to utilize, simultaneously,
both of these or similar types of items
in a combinatorial manner by merely subtracting
a satisfaction score (the degree to
which the need is met) from the importance
score (the strength of the need) (Morse,
1953; Ross & Zander, 1957). In a second
group of studies, attempts have been made 
to examine the relationship between the satis- 
faction and importance dimensions. Schaffer 
(1953), for example, found that the rank- 
order correlation between satisfaction and im- 
portance within individuals ranged from .71 
to —.45. He concluded that the stronger the 
need, the more job satisfaction will depend 
upon its fulfillment. Froehlich and Wolins 
(1960) found that items with low satisfaction 
and high importance means were the best 
predictors of mean overall satisfaction. This 
latter finding has been reiterated in a dif- 
ferent form by Hulin (1963), who suggests 
that people tend to rank highest those things 
they both value and lack. 

The principal questions examined in this 
study were: (a) are the most dissatisfying 
factors in an employee’s environment neces- 
sarily those which are most important to him, 
and (b) does the nature of the environmental
factor bear on the importance-satisfaction re- 
lationship? In addition, how do the above 
findings relate to current motivation theory? 


METHOD


The 146-item questionnaire utilized to obtain the 
data for this study was divided into two identical 
73-item sections. The directions for the first section 
were: 

In this section we would like some measure of 
your current satisfaction or dissatisfaction with a 
number of things. For each item, please check one 
of the following: (1) extremely satisfied, (2) moder- 
ately satisfied, (3) neutral, (4) moderately dis- 
satisfied, (5) extremely dissatisfied. 

The directions for the second 73-item section were: 

Now that you have indicated how satisfied or 
dissatisfied you are with various things, we would 
like some indication of how important each of these 
same things is to your feeling of satisfaction or 
dissatisfaction. For each item, please check one of 
the following: (1) of extreme importance to me, 
(2) very important to me, (3) of moderate im- 
portance to me, (4) of little importance to me, 
(5) of no importance to me. 

The 73 items in the questionnaire represented a 
wide variety of environmental factors that were 
postulated to be relevant to the employee. Among 
the main categories of items, 15 were concerned 
with work and were similar to those used in an 
earlier study (Friedlander, 1963); 8 were related 
to the quality, the reputation, and the facilities of 
the local school and college systems; 2 were con- 
cerned with the availability and the adequacy of 
local churches; 3 with the recreational facilities; 
8 with the marketing facilities; 6 with the housing 
facilities; and 8 with the quality and the adequacy 
of the medical facilities in the area. A complete 
list of all 73 items has been deposited with ADI.1

The study was conducted in an isolated com- 
munity of about 12,000 people. The 4,200 primary 
wage earners (the population for this study) all 


1 Order Document No, 8158 from ADI Auxiliary 
Publications Project, Photoduplication Service, Li- 
brary of Congress, Washington, D. C. 20540. Remit 
in advance $1.25 for microfilm or $1.25 for photo- 
copies and make checks payable to: Chief, Photo- 
duplication Service, Library of Congress, 
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work directly for a branch of the government. Since 
the government owns and operates the majority of 
the housing, retail outlets, and various support and 
service facilities in the area, the sample upon which 
this study is based represents a variety of occupa- 
tions and socioeconomic levels. This type of total 
community (in which the spheres of work and of 
social-community relations are an almost indistin- 
guishable blend) offers a unique Opportunity to 
analyze concurrently the attitudes of those who 
live and work in the community toward the many 
experiences that are of importance to them. Such 
an approach contrasts with typical studies where the 
individual’s perception of only his industrial environ- 
ment is tapped. This latter type of study would 
seem to be based upon an artificial segmentation; 
while conclusions might be made concerning the 
worker’s motivation toward some segment of his 
work, little can be said concerning the relative im- 
portance of this segment to the worker’s total array 
of motivations. On the other hand, the very nature 
of a total community which permits a broad array
of motivations to be examined concurrently, may
imply some unique characteristics of this population.

Completed and usable questionnaires were returned 
from 1,935, or about 45% of the working popula- 
tion; control data available indicated minor distor- 
tions in returns—in the direction of greater participa- 
tion from scientists and engineers, and from those 
at the higher, white collar levels. 


RESULTS 


Figure 1 represents the scatterplot of the 
satisfaction and dissatisfaction means. The 
overall linear correlation of the 73 means is 
a nonsignificant .11. Thus, on an overall basis,
it is apparent that satisfaction and impor- 
tance of environmental factors are unrelated 
phenomena. 

Further tests on the total distribution
indicate a significant departure from linearity
(p < .05). However, since the correlation
ratio for predicting importance from a knowledge
of satisfaction is significant at only
about the .06 level (η = .51), the evidence
for conclusions concerning a nonlinear relationship
between importance and satisfaction
is considered inadequate.


Fig. 1. Scatterplot of satisfaction-dissatisfaction means and
importance means for 73 environmental factors. (Legend:
O = negative correlations; + = positive correlations;
● = nonsignificant correlations. Axes: importance, vertical;
dissatisfaction to satisfaction, horizontal.)


Observable characteristics of this scatter- 
plot suggested that two separate bivariate 
distributions might be present in the data, 
which could possibly be obscuring otherwise 
meaningful results. Each point in the scatter- 
plot represents the bivariate importance- 
satisfaction mean, around which each of the 
1,935 individuals were scattered. Importance- 
satisfaction correlations between individuals 
were computed for each of the 73 items, and 
ranged from .45 to —.39. Those items in 
Figure 1, in which the correlation exceeds 
the .001 level of significance (r = ±.08), are
signified by a + for items with a positive cor- 
relation and a O for those items with a nega- 
tive correlation. Conceptually, items with a 
positive correlation can be described as en- 
vironmental factors that are of importance 
to those individuals who are particularly satis- 
fied with this factor, while items with a 
negative correlation are those in the environ- 
ment that are of great importance to those 
who are particularly dissatisfied with the 
factor. A casual glance at the scatter of these 
two types of environmental factors suggests 
the possibility of two separate distributions. 
In addition, there is an apparent V shape 
of the scatter of all 73 items. 

These observations were tested in two ways. 
First, since a best-fit line through all 73 plots 
failed to provide an adequate representation 
of these plots (r = .11), and since an observation
of the plots suggested two distributions,


TABLE 1

CORRELATIONS BETWEEN THE IMPORTANCE AND THE
SATISFACTION-DISSATISFACTION OF ENVIRONMENTAL FACTORS

Type of factors correlated                    r        η             N
Most satisfying environmental factors        .51**    [illegible]    36
Most dissatisfying environmental factors    -.36*     .46            37
All environmental factors                    .11      .51a           73

a p > .05 < .07; however, departure from linearity was
significant beyond the .05 level.
* p < .05.
** p < .01.


TABLE 2

COMPARISON OF THE IMPORTANCE OF FACTORS OF
EXTREME SATISFACTION-DISSATISFACTION WITH
FACTORS OF MILD SATISFACTION-DISSATISFACTION

                                                  Importance scores
Factor                                            M        SD      N
Factors of mild satisfaction-dissatisfaction      2.343    .594    37
Factors of extreme satisfaction-dissatisfaction   1.890    .322    36
Difference                                         .453*

* p < .01.


the 73 items were dichotomized at the median 
satisfaction mean (represented in Figure 1 
by the center vertical line at the satisfaction 
score of 2.69). Separate correlations were 
then computed for items to the left of the 
median line (environmental factors important 
to dissatisfaction) and to the right of the 
median line (environmental factors important 
to satisfaction). As indicated in Table 1, 
significant correlations of —.36 and .51 sug- 
gest that this dichotomization is meaningful 
in that two separate best-fit lines now account 
for a significant amount of the variance of the 
plots that previously was unexplained. Although
a casual observation of these two separate
plottings suggests some curvilinearity (two
hyperbolic distributions), tests of the correlation
ratio (η) and of curvilinearity indicated
these to be nonsignificant. Regression lines
for predicting importance from a knowledge of 
dissatisfaction and satisfaction are drawn on 
Figure 1. It is apparent that the plots about 
each of these two regression lines are not dis- 
tributed in a homoscedastic manner; better 
predictions could probably be made of the 
importance of an extremely satisfying or ex- 
tremely dissatisfying factor than a factor of 
mild satisfaction or dissatisfaction. 
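
The median-split analysis described above can be sketched as follows. This is only an illustration under stated assumptions: the 73 item means are synthetic values laid out in a V shape, not the study's data, and the names `pearson_r` and `split_correlations` are hypothetical helpers introduced here.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def split_correlations(satisfaction, importance):
    """Correlate over all items, then separately on each side of the median."""
    cut = median(satisfaction)
    pairs = list(zip(satisfaction, importance))
    left = [p for p in pairs if p[0] <= cut]   # items at or below the median
    right = [p for p in pairs if p[0] > cut]   # items above the median
    r_all = pearson_r(satisfaction, importance)
    r_left = pearson_r(*zip(*left))
    r_right = pearson_r(*zip(*right))
    return r_all, r_left, r_right


# Synthetic (hypothetical) item means: a V-shaped scatter plus a small
# deterministic perturbation so the points do not lie exactly on two lines.
sat = [1.5 + 0.03 * k for k in range(73)]
center = median(sat)
imp = [abs(s - center) + 0.05 * ((k * 7) % 5) for k, s in enumerate(sat)]

r_all, r_left, r_right = split_correlations(sat, imp)
print(f"all items: {r_all:.2f}, left half: {r_left:.2f}, right half: {r_right:.2f}")
```

With a V-shaped scatter the overall coefficient stays near zero while the two halves show correlations of opposite sign, which is the pattern the dichotomization is meant to expose.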

A more direct test of the first question, 
mentioned earlier, was made by a comparison 
of the mean importance of the most satisfying 
items with that for the most dissatisfying 
items. Dichotomizing again at the median- 
satisfaction score, such a comparison indicated 
no significant differences between the most 
satisfying 36 items and the most dissatisfying 
items. However, the V-shaped distribution of
all 73 items suggested that the extremely
satisfying and the extremely dissatisfying environmental
factors might be of far greater
importance than items of moderate satisfaction.
For this purpose, items were divided into
quartiles on the satisfaction dimensions (see 
dotted vertical lines in Figure 1), and the ex- 
treme quartiles were pooled. A comparison of 
the items of extreme satisfaction and dissatis- 
faction with the items of moderate satisfaction 
indicated that the former were significantly 
more important, as indicated in Table 2. Thus, 
while satisfying and dissatisfying factors were 
of similar importance, factors of extreme 
satisfaction and dissatisfaction were of greater 
importance than items of moderate satisfac- 
tion and dissatisfaction. 
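
As a rough check on the comparison in Table 2, the difference between the two mean importance scores can be expressed as a t ratio computed from the summary statistics alone. The article does not say which test was used; the unequal-variance (Welch) form below is an assumption, and `welch_t` is a hypothetical helper.

```python
from math import sqrt


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """t ratio for two independent means without assuming equal variances."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / standard_error


# Summary statistics as printed in Table 2 (importance scores; a lower
# score means greater importance on the 5-point scale).
t = welch_t(2.343, 0.594, 37, 1.890, 0.322, 36)
print(f"t = {t:.2f}")
```

A ratio of about 4 is in line with the reported significance beyond the .01 level.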

The nature of the environmental factors 
seemed to provide further meaning to the 
varying satisfaction-importance relationship. 
Of the 12 factors of extreme importance and 
satisfaction (upper right portion of Figure 1), 
10 were concerned with the content or context 
of the individual’s work. The mean correlation 
across the 1,935 individuals of the 12 items 
was .24, indicating that for those for whom 
work was important, it tended to be highly 
satisfying. Of the 12 factors of extreme im- 
portance and dissatisfaction (upper left por- 
tion of Figure 1), 6 were concerned with the 
quality, availability, or fees of medical facili- 
ties and physicians in the area; the remaining 
were concerned with the prices and avail- 
ability of goods and services in the local 
shopping area. The mean correlation of these 
items was —.19, indicating these factors were 
of particular importance to those individuals 
who were highly dissatisfied with them. 


SUMMARY AND CONCLUSIONS 


The findings of this study may be sum- 
marized as follows: 

1. The satisfaction and the importance attributed
to various environmental factors are
unrelated when mean satisfaction-importance 
scores are correlated across all factors. 

2. Satisfaction and importance are signifi- 
cantly related if environmental factors are 
dichotomized into satisfying and dissatisfying 
experiences. The relationship (across both 
people and items) is positive in the case of 
satisfying factors and negative for dissatisfy- 
ing factors. 

The positive items are concerned almost
entirely with work, and indicate that for
those for whom work is important, it is 
highly satisfying. The negative items are com- 
posed of attitudes toward medical and shop- 
ping facilities in the area, and indicate that 
for those for whom these environmental fac- 
tors are important, they are quite dissatisfying. 

3. Satisfying and dissatisfying environ- 
mental factors are of approximately equal 
importance. However, factors of extreme satis- 
faction and dissatisfaction are significantly 
more important than factors of mild satisfac- 
tion or dissatisfaction. 

These findings obviously cast doubt on the
assumption that the more dissatisfying factors 
in one’s environment are more important. For 
the most satisfying factors, which in this 
study include the work situation, the tendency 
is for the more important items to be more 
satisfying. In general, findings such as those 
by Schaffer (1953) that the stronger the 
need, the more job satisfaction will depend 
upon it, might be expanded to include the 
hypothesis that, the stronger the need, the 
more satisfaction or dissatisfaction will de- 
pend upon it. 

It is also apparent that results of studies 
investigating the importance or satisfaction 
of factors in the work situation only probably 
represent a very restricted range of the work- 
ers’ total array of motivations. The work items 
in this population are clustered almost entirely 
in the upper right portion of Figure 1, and 
do not represent a particularly broad scope 
of the individuals’ motivational structure. 

The results of this investigation would 
seem to add empirical weight to earlier studies 
suggesting a dual motivation theory. One set 
of environmental factors in this study, (a) 
contributed primarily toward dissatisfaction, 
(b) was particularly important to those who
were dissatisfied with it, and (c) became more 
important as it became more dissatisfying. 
This type of environmental factor would seem 
to fulfill Maslow’s (1955) description of
deficit needs in that the drive or need for 
these factors, “presses toward its own elimination
. . . striving toward cessation . . . toward
a state of not wanting.” The one cluster
of factors in this study which contributed the 
most toward dissatisfaction and tended to become
more important as it became more dissatisfying
was concerned with the adequacy of
medical facilities and physicians in the local 
area. This finding might have been predicted 
on the basis of Maslow’s description of deficiency
characteristics: “(a) its absence
breeds illness, (b) its presence prevents illness,
(c) its restoration cures illness, (d) . . .
it is preferred by the deprived person over
other satisfactions, (e) it is . . . inactive, at
a low ebb . . . in the healthy person.”

A second set of environmental factors in 
this study contributed primarily toward satis- 
faction and tended to become more important 
as it became more satisfying. These data would
seem to encompass Maslow’s description of
factors that may contribute toward self-actualization
in that these “impulses are . . .
enjoyable and pleasant, that the person wants
more of them rather than less, and that if
they constitute tensions, they are pleasurable
tensions. . . . gratification breeds increased
rather than decreased motivation.”

In general agreement with the theory set 
forth by Herzberg, Mausner, and Snyderman 
(1959), this second set of items was composed 
almost entirely of factors in the work situa- 
tion. The most potent satisfiers in order of 
decreasing importance were a feeling of 
achievement in the work I am doing; work 
requiring the use of my best abilities; and 
performing challenging assignments on my job. 

These results are given further operational 
significance in a study by Friedlander and 
Walton (1964), which indicates that characteristics
of the work content and process serve
primarily to elicit positive motivations in at- 
tracting the employee to remain with his or- 
ganization, while characteristics of the work 
context and of the community serve primarily 
to evoke negative motivations in causing the 
employee to be sufficiently dissatisfied to 
leave.


EFFECTS OF PRODDING TO INCREASE
MAIL-BACK RETURNS 1


BRUCE K. ECKLAND 
University of North Carolina 


The less accessible Ss in a survey sample often exhibit unusual characteristics 
which may be of particular interest to the researcher. When telephone and 
certified-mail techniques were used to obtain a 94% mail-back return from 
1259 former college students, academic achievement variables were found to be 
related to the amount of prodding which was required to obtain a response. 
No significant relationship was found, however, between the amount of 
prodding and the veracity of the respondents’ replies. 


Persons in the community sometimes tend 
to be inaccessible to the researcher who is 
interested in the study of individual deviance. 
If his investigation requires obtaining mail 
responses that express the views of persons 
who deviate from the “norm,” how does he 
get them to return a truthful questionnaire? 
Errors due to low return rates and poor veracity
may be acute when it is the hard-to-get
respondents who most clearly possess the 
special attributes which the researcher wishes 
to investigate. 

This was the nature of a problem that 
confronted the writer in a recent study on 
the college dropout. Selecting a group of 
freshmen who entered college 10 years ago, 
the central task was to locate the permanent 
dropouts, i.e., the students who left college 
and never returned to graduate at the institu- 
tion of first registration or any other institu- 
tion. Since the writer was interested in the 
problems of status frustration produced by 
the dropout situation, it was especially desir- 
able to locate former students who had seri- 
ously failed to meet academic standards and 
for whom a relatively unrestricted admis- 
sions policy in American colleges may have 
unanticipated consequences. 

Pretests of a mail questionnaire sent to a 
subsample (n = 288) indicated that it was
the very late respondent who tended to be 
the true dropout and who could be charac- 
terized as an academic failure. Thus, every 
effort was made to obtain a high mail-back 
return after the final sample had been drawn. 


1 This report is based upon research sponsored and 
financed by the Office of Instructional Research at 
the University of Illinois. 


This report is directed toward three pur- 
poses: (a) to indicate how telephone contacts 
and certified letters helped to achieve a 94% 
response from the subjects (Ss) who were 
solicited; (b) to determine whether or not
a high mail-back return was necessary for 
locating Ss who exhibited the special charac- 
teristics upon which the study depended; 
and (c) to measure the extent to which our 
persistent efforts, or prodding, may have 
contributed to less truthful replies. 


METHOD


In late spring 1962, questionnaires were mailed to 
1,332 men who had enrolled as freshmen at a large 
state university in September 1952. In a majority 
of cases, their “last known addresses” were 10 years 
old which resulted in many questionnaires, 192, 
being returned by the Post Office for lack of for- 
warding addresses. A state-wide telephone search, 
and the location of transfer students by correspond- 
ence with the registrars of 104 colleges and uni- 
versities to whom academic transcripts of the non- 
university graduates had been sent, reduced the 
number of permanently lost cases to 73, or only 5% 
of the sample. Thus, 1,259 former students pre- 
sumably received our mail. 

Three “waves” of mail resulted in a 67% mail- 
back response from the 1,259 students whose ad- 
dresses had not been designated as lost by the 
Post Office and unretrievable by us. The initial 
mailing contained a printed questionnaire, cover 
letter, and stamped-return envelope. The second 
wave was sent 20 days later and had the same
enclosures as the first, except for a new cover letter. 
The third wave, a reminder card, was mailed 5 days 
after the second. 

Forty days following the first wave, the number of
returns had declined to fewer than 10 a day. Meanwhile,
arrangements were completed to begin placing
telephone calls to the Ss who had not yet responded.2
Over 1,000 calls were placed during the next 3 weeks
and produced personal contacts with the S, or someone
knowing him, in 329 cases. When the S was
reached, he was requested to complete and return
the questionnaire.3


2 Early in June 1962, the University installed a
number of Wide-Area-Telephone Service (WATS)
lines rented from the telephone company which permitted
an unlimited number of calls to be placed to
long-distance points in the state without specific
toll charges, thus providing a unique opportunity
for employing the WATS lines in this research after
working hours when they were not in normal use.

3 There were only seven phone refusals, i.e., either
a subject who stated that he wouldn’t fill out and
return the questionnaire or a parent who refused to
give out any information over the phone. The refusals
represent less than 2% of the total number
of phone contacts.


In a number of cases, 54, no telephone contact
was made with the nonrespondent or anyone knowing
him. This usually was either because the party
did not answer (5 or more calls on 5 separate days)
or because no phone listing could be found in the
region of the state which was indicated by the last
known address. These Ss all received certified letters,
with a signed postal receipt requested. Enclosures
included a personal note and a new questionnaire.
In addition, certified letters were sent to 118 Ss who
had been contacted by telephone but still had
not returned a questionnaire after a 2-to-3 week
waiting period. The certified letter, then, was our
final effort to elicit a response. Altogether, 383
nonrespondents either were contacted by telephone,
directly or indirectly, or received a certified letter,
or both.

For the purposes of analysis, the respondents are
grouped by four successive stages based upon the
amount of prodding required to obtain a mail-back
return:

1. First wave includes the 510 respondents whose
returns were received within the first 23 days following
the initial mailing (or just prior to the upswing
in daily returns resulting from the second wave).

2. Second and third waves include the 334 respondents
who returned questionnaires on the twenty-fourth
day, or thereafter, but were not contacted
by telephone or certified mail.

3. Telephone or certified includes the 268 respondents
who had been contacted, directly or indirectly,
by phone or who had received a certified letter (but
not both).

4. Telephone and certified includes the 68 respondents
who had been contacted both by phone and
certified mail.
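
The four stages can be turned into a cumulative response rate with a few lines of arithmetic. The sketch below uses only the counts given in the text (1,259 questionnaires presumed delivered; 510, 334, 268, and 68 returns by stage); the function and variable names are hypothetical.

```python
DELIVERED = 1259  # former students who presumably received the mailing

STAGE_RETURNS = [
    ("First wave", 510),
    ("Second and third waves", 334),
    ("Telephone or certified", 268),
    ("Telephone and certified", 68),
]


def cumulative_response_rates(stage_returns, base):
    """Return (label, cumulative returns, cumulative percentage) per stage."""
    rates = []
    running = 0
    for label, count in stage_returns:
        running += count
        rates.append((label, running, round(100 * running / base, 1)))
    return rates


rates = cumulative_response_rates(STAGE_RETURNS, DELIVERED)
for label, total, pct in rates:
    print(f"{label}: {total} returns, {pct}% of {DELIVERED}")
```

The second and fourth rows correspond to the 67% and 94% figures quoted in the text.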


RESULTS 


Outcome of Telephone Contacts and Certified 
Letters 


Completed questionnaires eventually were 
returned by 83% of the 329 men with whom 
we spoke directly or whose relatives we con- 
tacted by phone. Returns were received from 
64% of these men during the 2-to-3 week wait- 
ing period following each phone contact and 
from an additional 19% after the certified 
mailing. Of the 54 men who were sent certified
letters directly because no telephone contact
was made, 74% returned questionnaires.

To summarize briefly, mail-back returns 
were received from 82% of the 383 non- 
respondents, raising the total response rate 
from 67 to 94%. It must be reemphasized
that, on the whole, these were resistant Ss. 
Prior to the telephone contacts and certified 
letters, they had not responded to three sepa- 
rate mailings. Furthermore, 6 to 12 weeks 
had elapsed between the first mail wave and 
the telephone contact or certified mailing. 


Selected Characteristics of Late Respondents 


If an early cutoff date for mail-back returns 
had been established following three mail 
waves, college dropouts would have been 


TABLE 1

PERCENTAGE OF COLLEGE GRADUATES AND DROPOUTS AMONG RESPONDENTS
TO EACH SUCCESSIVE STAGE OF PRODDING

                                   Second       Telephone    Telephone
                      First        and third    or           and          All
                      wave         waves        certified    certified    returns
Graduate status       (N = 510)    (N = 334)    (N = 268)    (N = 68)     (N = 1,180)
University graduate   63.1         50.0         35.8         23.5         50.9a
Transfer graduate     17.8         19.8         18.3         11.8         18.1a
Dropout               19.0         30.2         45.9         64.7         30.9
Total                 99.9         100.0        100.0        100.0        99.9

a A discussion of the unusually high rate of graduation among students
who matriculated at a state university is presented elsewhere (Eckland, 1964).


TABLE 2

PERCENTAGE EXHIBITING SELECTED CHARACTERISTICS AMONG COLLEGE-DROPOUT
RESPONDENTS TO EACH STAGE OF PRODDING

                                                Second       Telephone
                                   First        and third    and/or        All
                                   wave         waves        certified     dropouts
Selected characteristics           (N = 97)     (N = 101)    (N = 167)     (N = 365)
Ranked in lowest three deciles
  of high school class a           [illegible]  18.9         [illegible]   21.4
Scored in lowest two deciles
  on ACE aptitude test a           12.1         17.4         20.8          [illegible]
Did not obtain “C” grade
  average first college term       63.9         65.3         78.4          71.0
Dismissed from university for
  poor scholarship                 46.4         55.4         56.3          53.4

a Percentages are based upon somewhat lower Ns due to incomplete records
on high school rank and test scores for the American Council on Education
psychological examination. The subgroup Ns are 91, 95, and 159, respectively,
for high school rank, and 66, 69, and 96, respectively, for the ACE scores.


underrepresented among the respondents.
Returns would have been obtained from only
198 dropouts, or 23% of the cumulative
number of respondents at that stage. The
proportion of college dropouts progressively
increased with each successive stage of prodding,
with telephone and certified-mail techniques,
together, increasing their final number
to 365, or 31% of all respondents.
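
The underrepresentation described here follows directly from the stage counts. The sketch below accumulates respondents and dropouts stage by stage, using the Ns printed in Tables 2 and 3 (the two telephone/certified stages are pooled, as in those tables); the names are hypothetical.

```python
# (stage label, respondents in stage, dropouts in stage)
STAGES = [
    ("First wave", 510, 97),
    ("Second and third waves", 334, 101),
    ("Telephone and/or certified", 336, 167),
]


def cumulative_dropout_share(stages):
    """Return (label, cumulative dropouts, cumulative respondents, percent)."""
    shares = []
    respondents = dropouts = 0
    for label, n_respondents, n_dropouts in stages:
        respondents += n_respondents
        dropouts += n_dropouts
        pct = round(100 * dropouts / respondents, 1)
        shares.append((label, dropouts, respondents, pct))
    return shares


shares = cumulative_dropout_share(STAGES)
for label, dropouts, respondents, pct in shares:
    print(f"through {label}: {dropouts} of {respondents} are dropouts ({pct}%)")
```

Stopping after the three mail waves would have left 198 dropouts among 844 respondents; continuing through the telephone and certified-mail stages yields 365 among 1,180.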

When the college dropouts are separated 
from other respondents, significant differences 
are observed between the stages of prodding 
for response and four variables of academic 
achievement (see Table 2). On all indices, 
the late respondents were the least successful 
students. 


Veracity of Early Versus Late Returns 


To what extent, if any, does prodding lower 
the veracity of returns from late respondents? 
Several sources of information were available 
to investigate the veracity of the question- 
naire returns. Four types of checks were per- 
formed, as presented in Table 3 and discussed 
below. In general, a slightly higher proportion 
of respondents in the second stage of prodding 
were found to give discrepant replies com- 
pared to respondents in the first stage. 
However, replies from respondents in the 
third and fourth stages, on the whole, ap- 
peared no more discrepant than those from 
respondents in the first stage (and, to some 
extent, less discrepant than in the second 
stage). 


Father’s occupation compares questionnaire 
replies with data recorded and coded by the 
university in 1952 from the students’ appli- 
cation for admission. To code father’s occupa- 
tion, the university had used an occupational 
classification consisting of 15 categories. Our 
questionnaire item which approximates the 
1952 data is a statement that asked the 
respondent to describe “your father’s (or head 
of household) usual occupation when you 
were growing up.” United States Census 
occupational categories were used for coding 
these replies. 

Discrepancies were counted by first sepa- 
rating the cases into dichotomous occupational 
groups based upon the 1952 university data: 
professional managerial and nonprofessional 
managerial. We then determined how many 


TABLE 3

PERCENTAGE OF RESPONDENTS IN EACH STAGE OF
PRODDING WHOSE REPLIES WERE
FOUND DISCREPANT

                                      Second       Telephone
                         First        and third    and/or
Item involving           wave         waves        certified
discrepancy              (N = 510)    (N = 334)    (N = 336)
Father’s occupation      15.0         20.4         22.6
City/farm residence      8.4          8.4          6.5
Academic failure a       25.0         35.4         24.6
Earned degree b          0.0          4.0          0.6

a Percentages are based upon respondents who left the university
on scholastic probation or dismissal, 56, 48, and 57,
respectively.

b Percentages are based upon the number of dropouts in
each stage, 97, 101, and 167, respectively.



respondents described their father’s occupa- 
tion in a manner which, when classified by 
our United States Census categories, did not 
correspond to the 1952 occupational groups. 
Discrepancies were found in about 1 out of 
5 cases. 

This index cannot be considered a satis- 
factory measure of the veracity of the re- 
spondents as a whole, since it may only 
reflect the extent to which the 1952 and 
1962 classifications of occupation are not 
comparable. However, assuming that errors 
due to differences in the questions or cate- 
gories are randomly distributed, the index 
may still measure the amount of veracity of 
one group of respondents relative to that of 
another group. Comparisons in Table 3, be- 
tween respondents in each stage of prodding, 
show a somewhat higher proportion of dis- 
crepancies on father’s occupation among the 
late returns. 

City/farm residence compares, again, a 
questionnaire item with information obtained 
by the university in 1952 at the time of ad- 
mission. The 1952 coding only distinguished 
between city and farm residence of the stu- 
dent before coming to college. The question- 
naire item included a list of 10 categories on 
population size and asked the respondent to 
select the one which “best describes the com- 
munity you think of as your home town 
during high school days.” “Farm or open 
country” was among the possible choices. A 
discrepancy was counted as any case in which 
farm residence was recorded in the 1952 data 
but not in the questionnaire reply, or recorded 
in the questionnaire reply but not in 1952. 
As indicated in Table 3, the percentage of 
discrepant cases is consistently low in each 
stage of prodding for returns. Again, some 
amount of discrepancy would be expected 
due to imperfect correspondence of the cate- 
gories in the university records and in the 
questionnaire, and imperfect reliability in 
each of these sets of categories. 
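
The counting rule for city/farm discrepancies amounts to an exclusive-or on two yes/no records. The sketch below illustrates that rule only; the five example records are invented for illustration and the function names are hypothetical.

```python
def is_discrepant(farm_in_1952: bool, farm_in_reply: bool) -> bool:
    """A case is discrepant when exactly one of the two sources records 'farm'."""
    return farm_in_1952 != farm_in_reply


def discrepancy_rate(pairs):
    """Percentage of (1952 record, questionnaire reply) pairs that disagree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one case is required")
    return 100.0 * sum(is_discrepant(a, b) for a, b in pairs) / len(pairs)


# Invented example cases: (farm recorded in 1952, farm chosen in questionnaire)
records = [
    (True, True),    # farm in both sources: consistent
    (True, False),   # farm in 1952 only: discrepant
    (False, False),  # city in both sources: consistent
    (False, True),   # farm in questionnaire only: discrepant
    (False, False),
]
rate = discrepancy_rate(records)
print(f"{rate:.1f}% discrepant")
```

The same function applied separately to the cases in each stage of prodding would give one percentage per column of Table 3.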

Academic failure measures the percentage 
of respondents who were known to have left 
the university on scholastic probation or dis- 
missal, but who did not report ‘academic 
difficulties” in the questionnaire as a signifi- 
cant reason for leaving. Data on the former 
student’s last academic status was recorded
by us directly from university ledgers. The
questionnaire item provided the respondent 
with a checklist and asked him: “In order 
of relative importance, select the first, second, 
and third reasons as they affected your 
decision to discontinue college attendance.” 
About one out of four respondents who left 
college with poor academic records did not 
volunteer academic failure as one of the three 
most important reasons for leaving. Although 
the proportion of discrepant cases increased 
between the first and second stages of 
prodding, the proportions are the same when 
comparing the first and final stages. 

Earned degrees at 71 different colleges and 
universities were claimed by 195 respondents. 
The registrars of the transfer institutions later 
furnished the information needed to verify 
these reports. Five respondents, or 1.4% of 
all nongraduates (permanent dropouts), were 
found to have misrepresented an earned 
degree. These were not flagrant misrepresen- 
tations, but cases in which the respondents 
were somewhat “stretching a point.” (Four 
of the students actually had attended college 
for a period normally long enough to have 
obtained a degree, including one, according 
to the registrar’s statement, who simply “lacks 
a course.”) Again, when discrepancies were 
related to the stages of prodding, an increase 
(from none to 4 who misrepresented a 
degree) is found between the first and second 
stages. Yet, only 1 dropout of 167 in the 
final stages of prodding incorrectly reported 
an earned degree. 


DISCUSSION 


The success of our prodding was similar 
to that reported by other researchers who 
have employed telephone follow-ups to gain 
the cooperation of nonrespondents (Berdie, 
1954; Donald, 1960; Levine & Gordon, 
1958; Suchman & McCandless, 1940). A 
long-distance call from the university undoubtedly
impressed upon these Ss the
importance and urgency of their response. 

Partly because the certified letter was not 
established by the Post Office until 1955, the 
writer is unaware of any studies reporting 
its use; although registered mail has been 
found successful previously (Shuttleworth, 
1941; Slocum, Empey, & Swanson, 1956). 




Certified mail, coupled with a postal receipt, 
is a relatively inexpensive means of verifying 
delivery by requiring the recipient to affix 
his signature upon a card which is then 
returned to the sender. It serves the same 
purpose as a registered letter, except there 
is no insurance coverage on the contents of 
the letter (and, thus, about one-third the 
cost). 

In addition to the success of these prodding 
devices, we demonstrated the underrepre- 
sentation of individuals exhibiting selected 
characteristics which would have occurred if 
efforts to solicit returns had terminated after 
a third mail wave. Very similar differences 
were found in a previous study that compared 
the former students who responded and did 
not respond to a mail questionnaire on sev- 
eral academic achievement variables (Reuss, 
1943). 

Finally, the critical methodologist might 
rightly question the truthfulness of replies 
received from respondents who were severely 
prodded to return a questionnaire. Our data 
suggest, however, that telephone and certified- 
mail techniques did not lower the veracity 
of response when used to obtain an unusually
complete mail-back return. Thus, the application
of prodding devices in the problems of
mail surveys seems warranted on several
grounds.


A COMPARISON OF ERROR IN FIVE SORTING PROCEDURES
FOR ORDINAL RANKING 1


JOSEPH M. MADDEN, JOE T. HAZEL, AND ROGER D. BOURDON


6570th Personnel Research Laboratory, Aerospace Medical Division, 
Lackland Air Force Base, Texas 


The effect of sorting procedures on ranking error was investigated. Different
groups of Ss ranked a series of 50 stimulus cards using 5 different sorting
methods. Significant differences in ranking errors among the 5 methods were
observed, with a “free” procedure showing less error than “structured”
procedures.


Although the ranking method is frequently 
used in psychological research, the procedure 
by which a subject (S) arrives at his final 
rank ordering is largely a matter of the 
experimenter’s (E’s) preference. The effects
of varied instructions when the method of 
rank order is used are unknown. For instance, 
when cards are ranked on some dimension, 
would instructions to proceed according to 
some definite system be more or less ef- 
ficient than allowing the S the freedom to 
choose his own procedure? This paper pre- 
sents a comparison of the efficiency of five 
techniques for rank ordering stimuli on cards, 
where efficiency is defined in terms of the 
absolute difference between an S’s judged 
rank order and the rank order based on 
physical measurement. 
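
The error measure defined above can be sketched in a few lines. This is a minimal illustration, assuming cards are numbered by their physical rank and that an S's sort is recorded as the sequence of card numbers from first to last; the function name and the six-card example are hypothetical and are not taken from the study.

```python
def mean_absolute_rank_error(judged_order):
    """Average |judged rank - physical rank| over all cards.

    judged_order[i] is the card an S placed at judged rank i + 1.
    Cards are assumed to be numbered by their physical rank.
    """
    errors = [
        abs(judged_rank - card)
        for judged_rank, card in enumerate(judged_order, start=1)
    ]
    return sum(errors) / len(errors)


# Hypothetical S who interchanges Cards 3 and 4 in a six-card deck.
example_order = [1, 2, 4, 3, 5, 6]
example_error = mean_absolute_rank_error(example_order)
print(example_error)
```

Each of the two misplaced cards contributes an error of 1 and the remaining cards contribute 0, so the mean error for this hypothetical sort is 2/6.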


METHOD 


A set of 54 stimulus cards measuring 2% inches 
4 inches were drawn, each card containing two 
circles. One corner of each card was clipped to indi- 
cate the top of the card. The two circles on each 
card differed in area, the size of this difference in 
area being the basis for ranking the cards. The area 
of the circles ranged from a minimum of 314 square 
millimeters (20 millimeters in diameter) to a maxi- 
mum of 1,809 square millimeters (48 millimeters in 
diameter). The circles were drawn so that the dif- 
ference in area between them increased progressively 
from card to card. The difference in area between 
the two circles was the smallest on Card 1 (66 
square millimeters), and the largest on Card 54 
(1,495 square millimeters). The position (right or 
left) of the larger circle on each card was randomly 
determined.
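
The reported areas follow from the stated diameters, as the short check below suggests. It is only an arithmetic sketch; the helper name is hypothetical, and the use of truncation to whole square millimeters is an assumption made because it reproduces the printed figures.

```python
import math


def circle_area_mm2(diameter_mm):
    """Area of a circle in square millimeters, given its diameter."""
    return math.pi * (diameter_mm / 2) ** 2


smallest = circle_area_mm2(20)  # about 314.16
largest = circle_area_mm2(48)   # about 1809.56

# Truncating to whole square millimeters (an assumption) matches the text.
smallest_reported = int(smallest)
largest_reported = int(largest)
largest_possible_difference = largest_reported - smallest_reported
print(smallest_reported, largest_reported, largest_possible_difference)
```

On this reading, the difference on Card 54 (1,495 square millimeters) is the largest difference the stated range of circle sizes allows.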

Briefly, the essential steps for sorting the 54 cards 
in each of five selected procedures were as follows: 


1 The research reported in this paper was spon-
sored by the 6570th Personnel Research Laboratory,
Aerospace Medical Division (AFSC), Lackland Air
Force Base, Texas, under AFSC Project 7734(02).

A. Two pile sort: Sorting into 2 approximately
equal piles, one of cards with small differences, the 
other of cards with large differences. Next, sorting 
each of the 2 piles in the same way and then again 
sorting each of the 4 piles. Finally, ordering the 
cards in each pile, ordering the 8 piles, and then 
going through the entire deck to “be sure that the 
cards are all in the order you consider correct.” 
In this sort, the Ss formed a total of 14 piles 
(2+4+ 8). 

B. Large pile sort: Sorting into 2 piles, then sort- 
ing the larger of the 2 piles into 2 piles, then sorting 
the largest of the 3 piles into 2 piles, and so on 
until 6 piles remained. Then ordering within each 
pile and a final check (total piles formed = 10). 

C. Three pile sort: Sorting the cards into 3 ap- 
proximately equal piles; one of cards with small 
differences, one of cards with large differences, and 
one of cards containing a medium or uncertain 
difference. The same procedure was then followed 
for each of the 3 piles, the cards then ordered within 
each pile, the 9 piles ordered, and a final check 
made through all the cards as in Sort A (total piles 
formed = 12). 

D. Large-small sort: Sorting out those cards with 
very extreme small differences and very extreme large 
differences, the remaining pile including cards which 
did not contain extremely large or small differences. 
This procedure was repeated twice for the middle 
pile until 7 piles were produced, at which time 
the middle pile was split into 2 to make a total 
of 8 piles. Finally, ordering the cards in each pile 
and a final check through the entire deck (total 
piles formed = 11). 

E. Free sort: Ranking the cards using any pro-
cedure the S desired. For this sort, the total piles
formed could vary from zero upwards.


TABLE 1
MEAN ERROR FOR THE FIVE SORTING PROCEDURES

Sort    Mean error
A          6.96
B          5.80
C          6.24
D          6.65
E       [illegible]


TABLE 2
ANALYSIS OF VARIANCE OF SORT PROCEDURES
AND STIMULUS CARDS

Source                      df         MS         F
Sort procedures (S)          4       965.56     25.47*
Cards (C)                   53       263.22      6.70*
S × C                      212        34.29      0.87
Error (Mean Square)     13,770        39.30
Total                   14,039
*p < .001.

For each sort, 52 different basic airmen served 
as the Ss. Instructions were given in detail by an 
experienced test administrator and Ss were permitted 
to ask questions if they desired. The first step in 
the instructions requested the Ss to go through the 
entire deck of cards, which were arranged in random 
order, observing the size of the difference on each 
one in order to provide an adequate degree of 
familiarity for the rank ordering. 


RESULTS 


The observations used for analysis were the 
absolute difference values between the rank 
order assigned to each card by an S and the 
rank order of a card based on its physical 
difference in area. The means of these dif-
ferences (error), averaged across cards and
Ss for each of the five procedures, are shown
in Table 1. The average error for each card
across the Ss and procedures is given in
Table A² (range 4.03 to 8.20, Md = 6.45).

The results of a two-way analysis of vari- 
ance for the sorting procedures and cards are 
presented in Table 2. The Fs for sorts and 
cards were both significant (p < .001), indi- 
cating the differences in means for the two 
dimensions were not attributable to chance 
variation. 
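
The degrees of freedom and F ratios in Table 2 can be rechecked from the design (5 sorts, 54 cards, 52 Ss per sort). The sketch below simply mirrors the table by dividing each mean square by the single error mean square; that every effect was tested against this one error term is an assumption read off the table, not a statement of the authors' computing procedure.

```python
n_sorts, n_cards, n_subjects_per_sort = 5, 54, 52

df_sorts = n_sorts - 1
df_cards = n_cards - 1
df_interaction = df_sorts * df_cards
df_total = n_sorts * n_cards * n_subjects_per_sort - 1
df_error = df_total - df_sorts - df_cards - df_interaction

# Mean squares as printed in Table 2.
mean_squares = {
    "sorts": 965.56,
    "cards": 263.22,
    "interaction": 34.29,
    "error": 39.30,
}
f_ratios = {
    source: mean_squares[source] / mean_squares["error"]
    for source in ("sorts", "cards", "interaction")
}
print(df_sorts, df_cards, df_interaction, df_error, df_total)
print({source: round(f, 2) for source, f in f_ratios.items()})
```

Recomputed this way, the degrees of freedom and the ratios for cards and for the interaction agree with the printed values, while the ratio for sort procedures comes out near 24.57 rather than the printed 25.47; the source of that difference cannot be determined from the table alone.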

Since different groups of Ss were used for
the five sorting procedures, differences among
the groups’ general learning ability and per-
ceptual aptitude were of interest. An analysis
of Armed Forces Qualification Test Categories
for the five sorts resulted in a chi-square
of [value illegible] (df = 12, p > .05), indicating that
the groups did not differ in a significant man-
ner as assessed by this instrument.

2 A 2-page table giving position and area of
circles, area differences, and average error for each
of the 54 cards has been deposited with the Amer-
ican Documentation Institute. Order Document No.
8157 from ADI Auxiliary Publications Project, Pho-
toduplication Service, Library of Congress, Washing-
ton, D. C. 20540. Remit in advance $1.25 for micro-
film or $1.25 for photocopies and make checks pay-
able to: Chief, Photoduplication Service, Library of
Congress.


DISCUSSION


Not only does the “free” sorting procedure 
appear to be the most efficient of the five
procedures tested, but there seems to be a 
continuum of structure represented in Table 1. 
When the five procedures were ordered on 
the abscissa according to the total number 
of piles formed for each sort, Sort E, the un- 
structured procedure, had the smallest mean 
error. Sort B, which seemed to require the 
least restriction on the freedom of the S, or 
next smallest number of piles after Sort E, 
also had the next lowest error after E. Sort D 
appears somewhat more restrictive than B, 
and Sorts C and A might be construed as 
progressively more restrictive than D. The 
mean error for Sort C did not follow the error 
trend of the other four procedures. To the 
extent that there is a continuum of structure 
or restriction of freedom so that the greater 
the structure the greater the error rate, such 
a phenomenon may apply to experimentation 
and testing of human Ss in general. 

The average error found for each of the 
54 cards listed in Table A indicated a tend- 
ency for error to decrease as the ratio of the 
area difference to circle size increases. A 
proportion was computed for each card by 
dividing the difference in area by the total 
area of the two circles. The correlation of 
this proportion with the average error was 
-.64. Considering that the proportions con-
stitute only a very rough continuum, this
seems a meaningful relationship.
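
The proportion and its correlation with error can be sketched as follows. The three cards and their average errors are hypothetical values chosen only to show the direction of the relationship; the per-card data are in the deposited Table A and are not reproduced here. A product-moment correlation is assumed, since the text does not name the coefficient.

```python
import math


def area_proportion(area_small, area_large):
    """Difference in area divided by the total area of the two circles."""
    return (area_large - area_small) / (area_large + area_small)


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical cards: (smaller circle area, larger circle area) in mm^2.
hypothetical_cards = [(314, 380), (500, 900), (314, 1809)]
hypothetical_errors = [8.0, 6.5, 4.2]

proportions = [area_proportion(a, b) for a, b in hypothetical_cards]
r = pearson_r(proportions, hypothetical_errors)
print([round(p, 3) for p in proportions], round(r, 2))
```

A negative coefficient, as reported, means that cards whose area difference is large relative to the total circle area tend to be ranked with less error.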

From observation of the “free sort” during
testing, it appeared to the Es that each S
tended to devise his own unique sorting
procedures. Not only was it difficult to ob-
serve any communalities among such pro-
cedures, but Ss themselves had difficulty in
specifying exactly how they accomplished
their rankings. Considering this uniquely
personal nature of sorting procedures, it is 
not surprising that the free sort procedure 
was the most efficient in this study. The use 
of a procedure intended to simplify the task 
seems to have the opposite effect. 


(Received March 16, 1964)


THE ASSIGNMENT OF JOB-ATTITUDE ITEMS
TO SUBSCALES¹


Purdue University


An attitude questionnaire, for use with the personnel of a Great Lakes shipping 
concern, was constructed using rational judgments to assign items to areas of 
the questionnaire. The areas were nominally those suggested as invariant by 
Wherry. The degree of agreement between the rational sort and the areas 
named from the invariant factor characteristics was determined. Attitude 
characteristics of the surveyed group as well as mechanical and response arti- 
facts are suggested as providing partial explanations of the differences found. 
Items identified as critical through use of several item indices are compared to
evaluate index utility.


An attempt was made in this study to as- 
sign items to areas or subscales of a question- 
naire on the basis of rational judgments using 
the Wherry (1958) invariates as reference 
points and to investigate several indices to 
morale utilizing a conventional satisfaction 
scale, an importance scale, item variances, 
and combinations of scales and variances. The 
assumption was made that the identification 
of critical and sensitive items would be ac- 
complished more effectively through the use 
of certain other indices in addition to high 
(or low) item satisfaction values. 

Wherry analyzed the results of a number of 
morale surveys to obtain invariance values for 
factors identified in several investigations. By 
showing the comparability of the invariance 
values for mental abilities measures and for 
morale scores, he provided support for the 
view that most morale measures contain a 
general factor and five group factors—super- 
vision, financial rewards, working conditions, 
confidence in management, and self-develop- 
ment. Cureton (1960), Cureton and Sargent 
(1960), Dabas (1958), King (1960), and 
Twery, Schmid, and Wrigley (1958) have 
supplied additional interpretations related to 
general factors in attitude measurement. 

1 This article is a partial report of a morale assess-
ment project conducted by the Occupational Research
Center, Purdue University, under the direction of
W. A. Owens, Jr. The authors gratefully acknowledge
the courtesy of the company concerned for permis-
sion to use this material.

2 Now at the University of Tennessee.

In addition, in recent years considerable
effort has been expended in investigating
less conventional item scoring methods for
items used in attitude questionnaires. Glen- 
non, Owens, Smith, and Albright (1960) sug- 
gested use of an importance measure. Gillmer 
(1961) used two scales for each item, a satis- 
faction scale and an importance scale, and de- 
rived a composite score by multiplying the 
item-satisfaction value by the item-importance 
scale value. Youngberg, Hedberg, and Baxter 
(1962) have reported results obtained through 
the addition of item-importance values to 
item-satisfaction values. Of six “most criti- 
cal” items identified by satisfaction alone and 
six identified by satisfaction values combined 
with importance values, only one item was 
common to both lists. 
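
The two ways of combining satisfaction and importance described above can be sketched briefly. The example assumes the scales used later in this article (satisfaction from -2 to +2, importance from 1 to 5); the three items and their values are hypothetical and serve only to show how the "most critical" item can change with the index.

```python
def composite_product(satisfaction, importance):
    """Multiplicative composite (satisfaction times importance)."""
    return satisfaction * importance


def composite_sum(satisfaction, importance):
    """Additive composite (satisfaction plus importance)."""
    return satisfaction + importance


def rank_most_critical(items, score):
    """Item names ordered from lowest (most critical) to highest score."""
    return sorted(items, key=lambda name: score(*items[name]))


# Hypothetical item values: (satisfaction, importance).
hypothetical_items = {
    "item_a": (-1.2, 4.5),
    "item_b": (-1.5, 1.5),
    "item_c": (0.4, 3.0),
}

by_satisfaction = rank_most_critical(hypothetical_items, lambda s, i: s)
by_product = rank_most_critical(hypothetical_items, composite_product)
by_sum = rank_most_critical(hypothetical_items, composite_sum)
print(by_satisfaction, by_product, by_sum)
```

In this illustration item_b is most critical by satisfaction alone, but item_a becomes most critical once importance is multiplied in, which is the kind of disagreement between lists noted above.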

Herein, the decision was made that infor- 
mal interviews should be conducted in order 
to determine proper, useful item content; to 
learn idiomatic, specialized sailing terminol- 
ogy; and generally to orient the investigators 
to the feel of the environment for the task at 
hand. Accordingly, extensive interviewing was 
conducted by five investigators on half the 
ships of a Great Lakes fleet of ore boats. An 
introductory letter, indicating that no attempt 
would be made by the company to identify 
any member of any crew during the interview- 
ing or after the administration of the ques- 
tionnaire, had been posted on the bulletin 
boards of the ships for at least 1 week before 
interviewing started. 

Immediately following each interview se- 
ries aboard a ship, each investigator prepared 
a rough draft of content material from his in-
terview notes. After all interviewing had been
completed, the several investigators independ-
ently prepared items designed to cover sources
of dissatisfaction and to apply to specific satis- 
faction areas. In an attempt to make use of 
the invariants suggested by Wherry (1958) 
these areas were designated as General Morale 
(G), the Company and Management (M), 
Supervision (S), Working Conditions (W), 
and Financial Rewards (F). After a series of
consultations and evaluations, 53 items were 
selected to constitute the survey instrument. 
The attitude items were of the Likert type 
(Likert, 1932) similar to the sample items 
shown in the various tables. For each item a 
five-point scale of satisfaction (S) ranging
from high (+2) to low (-2) was followed by
a five-point scale of importance (I) varying
from a high of 5 to a low of 1 (Gillmer, 1961;
Glennon et al., 1960). The seven negatively
phrased items scattered throughout the ques-
tionnaire were simply reversed in scoring on
the satisfaction scale, and a response on the
left end of the scale was recorded as -2 for
these items. The items were assigned to the
following sections within the instrument: G,
12 items; M, 10 items; S, 12 items; W, 12
items; and F, 7 items. The first display below
illustrates the effect on the Importance
scale of an attempt by the respondent to
compensate for a negative item; the second
illustrates the effect of lower level
satisfaction on the Importance scale:


Item (reverse or negative scoring)


S-5 Some of the officers curse at you and “chew you out” in front of the rest of the crew.


Example of usual positive response set (not intended response)

    Agreement       -2     -1      0     +1     +2
                   [x]    [ ]    [ ]    [ ]    [ ]
    Importance       5      4      3      2      1
                   [x]    [ ]    [ ]    [ ]    [ ]

Composite value = -10


Example of same item response when respondent recognizes
negative item and breaks set (intended response)

    Agreement       -2     -1      0     +1     +2
    Importance       5      4      3      2      1
    [marks illegible]

Composite value = +4


Item (positive scoring)

S-3 The officers aboard this ship have been fair in their dealings with me.


Example of usual positive response set

    Agreement       +2     +1      0     -1     -2
                   [x]    [ ]    [ ]    [ ]    [ ]
    Importance       5      4      3      2      1
                   [x]    [ ]    [ ]    [ ]    [ ]

Composite value = +10


Example of effect on Importance scale when agreement value
is reduced by respondent

    Agreement       +2     +1      0     -1     -2
                   [ ]    [ ]    [x]    [ ]    [ ]
    Importance       5      4      3      2      1
    [mark illegible]

Composite value = 0


The subjects were members of the crews of
ships operated by a Great Lakes shipping con-
cern. Both officers (licensed personnel) and
nonlicensed personnel were included. The me-
dian ages for the various crews fell in the late
30s or early 40s, and the median number of
years of service for each ship was between 6
and 10 years. All of the median education lev-
els were below high school graduate. At least
partially usable booklets were obtained from
507 of a potential total of 512 (the total fig-
ure varied slightly due to quits, transfers, hos-
pitalization of ill crewmen, etc.).


FACTOR ANALYSIS


In order to investigate the underlying fac-
tors present in the items of the five ration-
ally designated areas of the questionnaire, a
53 × 53 matrix of product-moment correla-
tion coefficients was obtained,³ using impor-
tance (I) responses to maintain integrity with
the format cited earlier. This matrix was fac-
tored using a principal components solution
(Harman, 1960), and 15 orthogonal factors
were extracted. The point of inflection cri-
terion indicated that a decision to rotate four
factors would be most reasonable.

Carroll (1957) has provided a biquartimin
procedure for oblique rotation which was used
to obtain the factor loadings (Table 1) for
the 53 items. After rotation the four fac-
tors accounted for 46.4%, 28%, 13.3%, and
12.5%, respectively, of the common variance.
Although the method used provides an ob-
lique rotation, it represents a compromise be-
tween oblique and orthogonal solutions with
a tendency to keep correlations between fac-
tors low.

3 The original correlation table has been deposited
with the American Documentation Institute. Order
Document No. 8220 from ADI Auxiliary Publications
Project, Photoduplication Service, Library of Con-
gress, Washington, D. C. 20540. Remit in advance
$1.25 for microfilm or $1.25 for photocopies and
make checks payable to: Chief, Photoduplication
Service, Library of Congress.

4 The original factor loadings have been deposited
with the American Documentation Institute. Order
Document No. 8220 from ADI Auxiliary Publications
Project, Photoduplication Service, Library of Con-
gress, Washington, D. C. 20540. Remit in advance
$1.25 for microfilm or $1.25 for photocopies and
make checks payable to: Chief, Photoduplication
Service, Library of Congress.
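
One property of the printed loadings can be checked directly: for the rows of Table 1 that are fully legible, the communality equals the sum of the squared loadings on the four factors. The sketch below verifies this for three items whose values are copied from Table 1. It is offered as an observation about the printed figures, not as a description of how the authors computed them.

```python
def communality(loadings):
    """Sum of squared factor loadings for one item."""
    return sum(loading * loading for loading in loadings)


# (loadings on Factors 1-4, printed communality) for three items of Table 1.
table_1_rows = {
    2: ([0.5964, -0.0505, -0.0665, 0.0778], 0.3687),
    4: ([0.4420, 0.0397, 0.0212, 0.0171], 0.1977),
    8: ([0.1189, -0.0708, 0.3114, -0.0220], 0.1166),
}

recomputed = {
    item: round(communality(loadings), 4)
    for item, (loadings, _printed) in table_1_rows.items()
}
print(recomputed)
```

Because the biquartimin solution kept the factor correlations low, a sum of squares of this kind would be expected to approximate the communality closely.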

Items loading .35 or higher were considered 
to be primary sources of description of the 
factors, while those loading between .20 and 
.35 were considered as constituting secondary 
sources. The items loading .35 or higher 
on the first or second factors are shown in 
Tables 2 and 3 while items loading .30 or 
higher are shown for the third or fourth fac- 
tors in Tables 4 and 5. 
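
The cutoffs just described amount to a simple classification rule, sketched below. The function name is hypothetical, and applying the cutoffs to the absolute value of a loading is an assumption, since the text does not say how negative loadings were treated.

```python
def classify_loading(loading, primary_cutoff=0.35, secondary_cutoff=0.20):
    """Label a loading as a primary or secondary source of factor description.

    Uses the absolute value of the loading (an assumption).
    """
    size = abs(loading)
    if size >= primary_cutoff:
        return "primary"
    if size >= secondary_cutoff:
        return "secondary"
    return "neither"


# Loadings taken from legible entries of Table 1.
print(classify_loading(0.5964))   # Item 2 on Factor 1
print(classify_loading(0.2435))   # Item 6 on Factor 2
print(classify_loading(-0.0505))  # Item 2 on Factor 2
```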

The four factors were named after inde- 
pendent evaluation by several judges. There 
was almost complete agreement among these 
judges as to the factor names except for the 
third factor. 

The content of the items loading .35 or 
higher on Factor I does not seem to argue 
against naming this factor as a General Fac- 
tor of Favorable Attitude (Dabas, 1958; 
Wherry, 1958). One could question the gen- 
erality of its meaning, however, since the 
items loading most highly on it are those of 
the subscale designated as measuring attitude 
toward company and management. (The two 
high-loading items from the Financial Re- 
wards section also contain the word, com- 
pany.) Cureton and Sargent (1960) supplied 
evidence of a similar nature in their factor 
analysis and reanalysis of job satisfaction and 
morale data when they found a large general 
factor, one they called attitude toward or- 
ganization. There is some indication also of a 
security dimension in this factor (O’Connor & 
Kinnane, 1961) since a number of items car-
rying high loadings on the factor can certainly
be interpreted as security-oriented statements.
For example, although the fleet was operating 
on a greatly reduced basis with approximately 
one-half the total number of ships laid up 
(in storage), the item loading most highly on 
Factor I is a positive statement of a bright 
future for the company. While no factor ap- 
pears which can be specifically labeled as a 
confidence in management factor (Wherry, 
1958; Whitlock, 1960; Whitlock & Cureton, 
1960), it is probable that the items which 
were judged to be related to confidence in 
management when the instrument was con- 
structed, were, in fact, more general in na- 
ture. Management people are physically re- 
mote from the employees of the sailing fleet. 
The average sailor very seldom sees a mem- 
ber of high-level supervision or management. 
Under such circumstances the infusion of 
General Morale and satisfaction into the Com- 
pany and Management area seems reasonable, 
i.e., in the absence of knowledge, responses 
tend to be based on general attitudinal tone. 

Factor II can be described as a factor of 
Favorable Attitude Toward Supervision. Of 
11 items which load .35 or higher on this 
factor, 10 mention the word supervisor or 
officer and the eleventh item includes the su- 
pervisor as part of the “good bunch of guys” 
belonging to the crew. The Wherry (1958) 
invariate appears quite clearly here and, as
previously noted, several authors have provided
evidence suggesting that the supervision fac- 
tor is an important dimension in morale and 
satisfaction measurement. 

Factor III is a difficult factor to explain 
and would have been even more difficult with- 
out considerable familiarity with the data. 
The judges asked to evaluate the factor con- 
tent were not familiar with these data, and 
when judging the factors tended to designate 
Factor III as a negative attitude toward su-
pervision factor. It is true that the items
which load .30 or higher on this factor are 
items comprising content which could be so 
evaluated. However, the items so loading are 
also items which were scored in reverse, since 
they were those judged to be stated in a nega- 
tive manner (i.e., a response at the left end 
of the scale was considered a negative rather 
than a positive response in these cases). While


TABLE 1
FACTOR LOADINGS OF ITEMS AFTER OBLIQUE
ROTATION OF FOUR FACTORS

Item   Factor 1   Factor 2   Factor 3   Factor 4   Communalities
   1     .1441      .3051      .1059      .0746       .1304
   2     .5964     -.0505     -.0665      .0778       .3687
   3     .0467      .1298      .0320      .2160       .0667
   4     .4420      .0397      .0212      .0171       .1977
   5     .4300      .0531     -.0410    [illeg.]      .2052
   6     .1466      .2435      .1013      .1597       .1166
   7     .2582      .1161      .0126     -.0064       .0804
   8     .1189     -.0708      .3114     -.0220       .1166
   9     .5804      .0050     -.0922    [illeg.]      .3561
  10     .4326      .0782     -.0188     -.0590       .1971
  11     .0581      .3736      .0365      .1263       .1602
  12     .0717      .0492      .3542      .0564       .1362
  13     .5237     -.0294     -.0143      .0851       .2826
  14     .4552      .0367      .1101      .0584       .2241
  15     .6175     -.0818      .0498      .1035       .4012
  16     .6156     -.0290      .1887     -.0008       .4154
  17     .5451      .0708     -.0161     -.1091       .3143
  18     .4901      .0316    [illeg.]     .0229     [illeg.]
  19     .5735     -.0231      .0297     -.0672       .3348
  20     .2859      .1863      .2122     -.0161       .1617
  21     .6256     -.0664     -.0669      .0191       .4006
  22     .4862      .0572    [illeg.]     .0963     [illeg.]
  23     .1835      .4060      .2313     -.1494       .2743
  24     .1873      .4517      .2020     -.0150       .2801
  25    -.0423      .5671      .1067     -.0775       .3408
  26    -.0565      .0415      .3169     -.0687       .1101
  27    -.0813      .2314      .4487      .0632       .2655
  28     .0495      .1298      .4494      .0983       .2309
  29     .0028      .4157      .0579      .1835       .2098
  30     .1686      .4287      .1023      .0658       .2270
  31     .0081      .4941      .0317      .0577       .2485
  32   [illeg.]     .4892      .1440      .0630       .2823
  33     .0840      .5216      .0170      .1104       .2916
  34     .1243      .3587     -.0241      .0481       .1470
  35     .1783      .3100      .0300    [illeg.]      .1762
  36     .0424      .1114     -.0500    [illeg.]      .3205
  37     .1420      .2024     -.0038      .3063       .1550
  38     .1410      .0890     -.0442      .5399       .3212
  39    -.0178      .1086     -.0113      .4901       .2524
  40     .3157      .2542      .1380     -.0500       .1858
  41     .0494      .3335     -.0005    [illeg.]      .1787
  42    -.0346      .0966      .4355     -.1049     [illeg.]
  43     .1240      .3197      .0187    [illeg.]    [illeg.]
  44    -.1349      .0811      .3867      .0595       .1778
  45     .0885      .3952      .1908      .1912       .2370
  46     .2198      .2271     -.0196      .0313       .1012
  47     .4253      .0387     -.1278     -.1292       .2154
  48     .3766      .0907      .0294      .0536       .1538
  49     .2191      .1963     -.0779     -.0031       .0926
  50     .3856      .0675      .0812      .0218       .1603
  51     .1079      .2398      .0491    [illeg.]      .0845
  52     .3110      .1266     -.1681     -.0603       .1446
  53     .2933      .1513     -.0341      .0134     [illeg.]
Total                                               11.5126


TABLE 2
ITEMS LOADING .35 OR HIGHER ON FACTOR I

          Factor loadings
    I        II       III       IV      Item
   .62      -.06     -.06      .01      M 9   The ore hauling business and (deleted) have a bright future.
   .61      -.08      .04      .10      M 3   Management does a good job of letting us know their plans for the future.
   .61      -.02      .18     -.00      M 4   Top management understands our problems and is sympathetic to them.
   .59      -.05     -.06      .07      G 2   Sailing with this company offers a good future for a young man.
   .58       .00     -.09      .10      G 9   Sailing an ore boat offers security and a good future.
   .57      -.02      .02     -.06      M 7   I have confidence in the business judgment of top management.
   .54       .07     -.01     -.10      M 5   This company treats its employees better than any other on the Great Lakes.
   .52      -.02     -.01      .08      M 1   I have confidence in the fairness and honesty of the men in the Cleveland office.
   .49       .03      .12      .02      M 6   The company makes a real effort to keep its crews informed about sailing times and time off in port.
   .48       .05      .13      .09      M 10  Management makes us feel that our jobs are important to the company’s success.
   .45       .03      .11      .05      M 2   Management is always willing to listen to my complaints or suggestions.
   .44       .03      .02      .01      G 4   If you do your best, you know the company will try to find a job for you next year.
   .43       .05     -.04    [illeg.]   G 5   If you do good work, you can get ahead in this company.
   .43       .07     -.01     -.05      G 10  I would rather work for our company than for any other operating on the lakes.
   .42       .03     -.12     -.12      F 1   This company pays as well as any company on the lakes.
   .38       .06      .08      .02      F 4   I am paid better sailing than I would be ashore.
   .37       .09      .02      .05      F 2   With the company pension plan we can retire comfortably.







TABLE 3
ITEMS LOADING .35 OR HIGHER ON FACTOR II

          Factor loadings
    I        II       III       IV      Item
  -.04       .56      .10     -.07      S 3   The officers aboard this ship have been fair in their dealings with me.
   .08       .52      .01      .11      S 11  My supervisor is helpful in training new men.
   .00       .49      .03      .05      S 9   My supervisor lays out my work and lets me do it without interfering.
 [illeg.]    .48      .14      .06      S 10  Our officers do a good job of handling men.
   .18       .45      .20     -.01      S 2   If I have a problem or complaint, I feel free to go to my supervisor.
   .16       .42      .10      .06      S 8   My supervisor is interested in my success and advancement.
   .00       .41      .05      .18      S 7   Our officers do their best to enforce the safety rules and regulations.
   .18       .40      .23     -.14      S 1   Our officers follow the new booklet on Personnel Policies for Nonlicensed Employees right on down the line.
   .08       .39      .19      .19      W 11  Our officers get along well with each other and with the crew.
   .05       .37      .03      .12      G 11  The members of our crew are a pretty good bunch of guys.
   .12       .35     -.02      .04      S 12  My supervisor lets me know “where I stand.”


TABLE 4
ITEMS LOADING .30 OR HIGHER ON FACTOR III

          Factor loadings
    I        II       III       IV      Item
  -.08       .23      .44      .06      S 5   Some of the officers curse at you and “chew you out” in front of the rest of the crew.
   .04       .12      .44      .09      S 6   Some of the officers “play favorites” in their handling of the men.
  -.03       .09      .43     -.10      W 8   The new booklet says no unnecessary work on Sundays or holidays but we are sometimes ordered to do such work.
  -.13       .08      .38      .05      W 10  The specific duties of nonlicensed personnel should be more clearly spelled out.
   .07       .04      .35      .05      G 12  The attitudes of a few licensed personnel are pushing men toward a union.
   .11      -.07      .31     -.02      G 8   There is poor communication between the average nonlicensed man and the company.
  -.05       .04      .31      .00      S 4   When my immediate supervisor lays out work for me another officer may change the order.













it would not be surprising to find a bipolar 
factor with some items reflecting positive atti- 
tudes toward supervision and others reflecting 
negative attitudes, it appears unlikely that 
only those items phrased and scored nega- 
tively would constitute and define such a sepa- 
rate factor, although this is not impossible. 

It should also be kept in mind that the 
negatively phrased items do not load ap- 
preciably on the general factor. While no at- 
tempt has been made to make a pattern analy- 
sis, a cursory examination of the item re- 
sponses for individuals, as spread out upon 
work sheets, suggests that a great many re- 
spondents did not perceive that they were 
marking a negative response to the items but 
were continuing a series of positive responses. 
It seems, therefore, a compulsory conclusion
that Factor III is a Response Artifact factor.

The fourth factor has only four items load- 
ing .30 or higher on it, and three items load- 
ing between .20 and .29. This Working Con- 
ditions factor appears to be another example 
of the Wherry (1958) invariates. 


AGREEMENT OF RATIONAL ITEM SORT AND
ITEM FACTORIAL CONTENT


How successful was the attempt to assign 
items to areas or subscales of the question- 
naire on the basis of rational judgments of 
content using the Wherry invariates as refer- 
ence points? If one is willing to permit the 
combination of the first two areas designated 
General Morale and Company and Manage- 
ment, the judged area and factor content clas- 
sification was similar for approximately three- 


TABLE 5
ITEMS LOADING .30 OR HIGHER ON FACTOR IV

          Factor loadings
    I        II       III       IV      Item
   .04       .11     -.05    [illeg.]   W 2   The living quarters on this ship are as good as can be expected.
   .14       .08     -.04      .53      W 4   The recreational facilities are as good as can be expected aboard ship.
  -.01       .10     -.01      .49      W 5   The number of toilets and showers per man is adequate.
   .14       .20     -.00      .30      W 3   I’ve got no gripes about the food or the way it’s served.


TABLE 6
TEN ITEMS WITH HIGHEST SATISFACTION MEANS

Rank    Item
 1.     F 7   The company’s Group Insurance Plan will take good care of me and my family if I ever need it.
        F 1   This company pays as well as any company on the lakes.
              We get paid for overtime when we have it coming.
        G 4   If you do your best, you know the company will try to find a job for you next year.
        G 11  The members of our crew are a pretty good bunch of guys.
              I like the new booklet on Personnel Policies for Nonlicensed Employees.
              The amount of work we are expected to do is fair and reasonable.
 8.5    M 1   I have confidence in the fairness and honesty of the men in the (deleted) office.
 8.5    S 3   The officers aboard this ship have been fair in their dealings with me.
10.     S 7   Our officers do their best to enforce the safety rules and regulations.










TABLE 7
TEN ITEMS WITH LOWEST SATISFACTION MEANS

Rank    Item
 1.     M 6   The company makes a real effort to keep its crews informed about sailing times and time off in port.
        G 9   Sailing an ore boat offers security and a good future.
        M 9   The ore hauling business and (deleted) have a bright future.
        M 3   Management does a good job of letting us know their plans for the future.
        W 8   The new booklet says no unnecessary work on Sundays or holidays but we are sometimes ordered to do such work.
        G 8   There is poor communication between the average nonlicensed man and the company.
        S 4   When my immediate supervisor lays out work for me another officer may change the order.
        S 6   Some of the officers “play favorites” in their handling of the men.
        G 12  The attitudes of a few licensed personnel are pushing men toward a union.
10.     W 10  The specific duties of nonlicensed personnel should be more clearly spelled out.


quarters of the items in these areas. Nine of
the 12 items judged related to satisfaction 
with supervision had highest loadings on the 
Supervision factor, Factor II. Again, com- 
mon classifications result for approximately 
three-quarters of the items. 

The unfortunate and unforeseen distortion 
discussed earlier precludes any meaningful in- 
terpretation of Factor III as it related to the 
rational versus factorial content. 

The judged relationships do not hold up as 
well for the items adjudged related to the 
Working Conditions area. Only 4 of the 12 
items have highest loadings on the Working 
Conditions factor, Factor IV. In addition, the 
agreement of the Financial Rewards items 


TABLE 8
RHO VALUES COMPARING ITEM RANKS AS A
FUNCTION OF INDEX EMPLOYED

Index                     $\bar{X}_{S \times I}$    $s^2_S$    $s^2_{S \times I}$    $s_S \bar{X}_S$
$\bar{X}_S$                        .998             -.906          -.711                .893
$\bar{X}_{S \times I}$                              -.895          -.716                .897
$s^2_S$                                                             .863               -.675
$s^2_{S \times I}$                                                                     -.509

a Based upon N = 91. All others based upon complete
sample.


with expectation is at a minimum. None of the 
financial items loads in such a manner as to 
define a factor. It is interesting to note that 
the generalization effect attributed to General 
Morale and Attitude Toward Company and 
Management apparently exists in regard to
the Financial Rewards attitudes. Six of the 7 
F items have highest loadings on the General 
Favorable Response factor, Factor I. 


EVALUATION OF INDICES 


The satisfaction scale (S) was used to obtain 
item rankings based upon conventional 
responses of agreement. Such rankings served 
as reference levels with which to compare 
additional indices. A composite score (S × I) 
was obtained for each item by multiplying 
the Importance-scale value (I) by the 
Satisfaction-scale value (S) for each respondent, 
and items were ranked on the basis of the 
composite values. Items were also ranked by 
size of satisfaction variance (s²_S) and by 
composite variance (s²_S×I). A fifth index 
(s_S × X̄_S) was obtained by multiplying each 
item standard deviation (s_S) by the mean 
satisfaction score (X̄_S) for that item. The 10 
items with the high X̄_S values (and the 
10 with the low X̄_S values) are shown in 
Tables 6 and 7. Rho values comparing item 
ranks as a function of index employed are 
shown in Table 8. For comparative purposes, 
Table 9 provides the item-index values (and 
item standard deviations where appropriate) 
item by item for the various indices. 
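
The five indices and the rho comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the response matrices are invented, the function names are hypothetical, and sample (n - 1) variances are assumed because the text does not say which variance formula was used.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        end = start
        while end + 1 < len(values) and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        # Positions start..end are tied; each gets the mean of ranks start+1..end+1.
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho computed as the Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def item_indices(S, I):
    """S and I are respondents-by-items arrays of scale values (assumed layout)."""
    S = np.asarray(S, dtype=float)
    I = np.asarray(I, dtype=float)
    composite = S * I                      # S x I, formed respondent by respondent
    mean_s = S.mean(axis=0)
    return {
        "mean_S": mean_s,
        "mean_SxI": composite.mean(axis=0),
        "var_S": S.var(axis=0, ddof=1),    # assumption: sample variance
        "var_SxI": composite.var(axis=0, ddof=1),
        "sd_S_x_mean_S": S.std(axis=0, ddof=1) * mean_s,
    }


def rho_table(indices):
    """Rho for every pair of indices, comparing the item orderings they produce."""
    names = list(indices)
    return {
        (a, b): spearman_rho(indices[a], indices[b])
        for pos, a in enumerate(names)
        for b in names[pos + 1:]
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.integers(1, 6, size=(30, 8))   # hypothetical five-point responses
    I = rng.integers(1, 6, size=(30, 8))
    for pair, rho in rho_table(item_indices(S, I)).items():
        print(pair, round(rho, 3))
```

A table laid out like Table 8 is simply the upper triangle of the dictionary returned by `rho_table`.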

Items ranked as the high 10 and the low 10 
using the various indices are identified by area 
and number and compared in a scalar repre- 
sentation in Table 10. Inspection of the rank- 
ings shown indicates that of the 20 items 
identified by mean satisfaction scores, 4 are 
identified by all other indices, 11 items are 
common to all except one index, and 5 items 
are common to three of the rankings. None of 
the 20 items is exclusive to the high or low 


TABLE 10 


SCALAR REPRESENTATION OF 10 ITEMS RANKED HIGH 
AND LOW BY THE FIVE INDICES 

10 items identified by satisfaction (X̄_S) alone. 
The single item unique to one ranking is item 
W-2, which appears in the rankings based 
upon composite variance (s²_S×I). 

Of the 7 items judged to be negative and 
scored in reverse (G-8, G-12, S-4, S-6, W-8, 
and W-10), 6 are ranked in the critical 20 by 
a majority of all indices. The seventh, S-5, 
ranks highest on satisfaction variance and 
seventh high on composite variance. It is 
apparent that the item mean decreases 
proportionately as the magnitude of the variance 
increases in almost all cases, with a rank-order 
correlation of -.906 between the rankings 
based upon mean satisfaction value and those 
based upon satisfaction variance. 

The results of the comparison of item ranks 
based upon the various indices suggest con- 
siderable communality in item rankings. The 
item rankings based upon composite score 
variance show the least correlation with rank- 
ings based upon other indices. An explanation 
can be advanced that the respondents who 
recognized the negative items reacted by mov- 
ing their response checks to the right in order 
to produce a favorable response to the satis- 
faction scale for that item (Figure 1). When 
so doing, these respondents tended to mark a 
somewhat lower level of importance by re- 
sponding to the importance scale at or near 
the point where they responded to the satis- 
faction scale, i.e., near the right hand end of 
the scale. Since importance scale values did 
not reverse, the effect was to lower the scale 
value of the importance responses. A similar 
effect appeared to occur when the individual 
respondent indicated less satisfaction with 
items scored in the normal direction. That is, 
there appeared to be a response set leading to 
reduction in the importance response-scale 
value (Figure 2). Whatever the reasons, the 
results indicate that items with the lowest 
mean satisfaction scores tend to have the 
highest variances probably because higher 
variance is primarily a function of departure 
from the norm of high-satisfaction responses. 


DISCUSSION 


It appears that the more specific an item is 
regarding the content of an invariate area, 
and the more identifiable the underlying 
dimension becomes for the respondent (and, 
parenthetically, for the judges), the more 
accurately judges can assign items to the 
subscales of a questionnaire. For a less 
homogeneous group, and a group less favorably 
inclined toward the organization, even more 
concordant results would probably have been 
obtained in the comparison of factor content 
with judged item content. 

The inclusion of an importance scale in con- 
junction with a satisfaction scale, item by 
item, fails to make the expected contribution, 
at least in this study. This may well be be- 
cause of a general “response halo” which 
causes the subject to link his satisfaction and 
importance responses so closely that they pro- 
vide little differential information. A remedy 
for this could be to print the item twice in 
different locations within the booklet, once at- 
taching an importance scale and once a satis- 
faction scale. With such separation, quantifi- 
cation of the importance dimension might be- 
come meaningful. Rosen and Rosen (1955) 
have suggested similar procedures. 

One definite indication from this study is 
that caution must be used in interjecting nega- 
tive items into questionnaires—at least for 
people at lower educational levels who do not 
have much occasion to read. Since it seems 
true that some items are difficult to phrase 
positively, it might be advantageous to cluster 
negatively worded items together under a 
separate instruction which would encourage 
the respondent to shift “sets.” 

The results of item-index comparisons in- 
vestigated here can best be summarized by 
stating that little evidence was found to sup- 
port the inclusion or calculation of additional 
item indices of the type employed in this 
study. 


To clarify the concept of task versus person orientation in nursing, a factor 
analysis of 24 personality- and nursing-attitude variables was performed on 
160 nurses from long- and short-term treatment settings in psychiatry and 
general medicine. 3 factors, Leadership Skills, Hostile-Self-Seeking, and 
Dependent-Exploited, sampled behaviors in the interpersonal sphere and 
particularly traits related to leadership capability. The 4th factor, Impersonal- 
Orderly, contained many characteristics of the “Authoritarian Personality.” 
An emphasis on the skilled technical aspects of nursing was one of the elements 
in this factor. In general, the derived factor scores effectively differentiated 
nurses from the treatment settings in this study. 


As a result of the continuing emphasis on 
specialization in the field of medicine, the 
nursing profession today offers its members 
wide latitude in the selection of a setting in 
which to practice their profession. These 
settings may be categorized in terms of 
the demands made on skills related to patient 
treatment and care and on skills in the sphere 
of interpersonal relationships. Operating room 
nurses, for example, are highly skilled tech- 
nicians but have little personal contact with 
patients. On the other hand, psychiatric 
nurses in state mental hospitals often have 
intensive and sustained contact with patients 
but make little use of their training in the 
treatment and care of physical illness. Conse- 
quently, from a personality standpoint the 
nursing profession should be able to accept 
into its fold both the impersonal, orderly 
individual with a primary interest in the 
skilled technical aspects of nursing and the 
sociable, outgoing individual who tends to be 
more person oriented than task oriented. 

A comparative study of neuropsychiatric 
(NP) and general medical surgical (GMS) 
nurses by Navran and Stauffacher (1958) 
supports the premise that distinctive 
personality attributes characterize nurses working in 
different treatment settings. They found GMS 
nurses are significantly more orderly, deferent, 
and self-abasing, and less introspective and 
aggressive, than NP nurses. The interpretation 
given these results was that the GMS nurses 
are more work oriented, impersonal and less 
able to direct others than their psychiatric 
colleagues, who placed greater emphasis on 
interpersonal relationships. In a separate 
study, Navran and Stauffacher (1957) report 
that “best” psychiatric nurses were rated by their 
supervisors as relatively less timid, more 
warm and stable, and more capable of leadership 
than less highly rated nurses. Further, Gynther and 
Gertz (1962) found that student nurses rated 
as “good” by their instructors had a 
significantly greater need for dominance and 
significantly less interest in orderliness than 
“poor” student nurses. Finally, in a related 
study by Meyer (1958), senior nursing 
students with a primary interest in the 
skilled-technical aspects of nursing expressed less 
satisfaction with nursing as a career than 
senior students who believed the nurse-patient 
relationship was the single most important 
aspect of nursing. 

1 This study was supported by funds made 
available by Research Grant MHO 4663-03 of the 
Psychopharmacology Service Center, National 
Institutes of Health, United States Public Health Service. 

These findings indicate that the concept 
of task versus person orientation has 
significance beyond its descriptive value in 
categorizing the demands on a nurse’s technical 
and interpersonal skills in various treatment 
settings. First, distinctive personality and 
attitudinal variables seem to characterize nurses 
from task- and person-oriented treatment 
settings. Secondly, there seems to be a general 
belief among nurses that person orientation, 
or more specifically person orientation in 
combination with leadership capability, char- 
acterizes the good nurse, whereas a belief in 
order and control is more characteristic of 
the poor nurse. Thirdly, there is some evi- 
dence the concept is related to career satis- 
faction. In the present study an effort was 
made to identify the dimensions of this 
concept. Specifically, a factor analysis of 
personality- and nursing-attitude variables 
related to task and person orientation was 
performed on 160 nurses from long- and 
short-term treatment settings in psychiatry 
and general medicine, respectively. Variables 
related to career and job satisfaction were 
also included in the analysis. As a partial test 
of the meaningfulness and utility of the factor 
analysis results, nurses from the five nursing 
specialities in this study were compared on 
certain background variables and on the 
factors that emerged from the analysis. 


METHOD 
Subjects 


The subjects (Ss) were 160 registered nurses. 
Psychiatric units in receiving hospitals² and state 
mental hospitals³ represented the short-term and 
long-term settings in psychiatry, respectively. The 
operating room (OR)⁴ represented short-term GMS 
settings. As medical and surgical treatment of 
tuberculosis (TB)⁵ and orthopedic patients⁶ involves 
prolonged care and long contact, the original plan 
was to combine nurses from these specialties to 
represent the long-term GMS setting. However, tests 
revealed TB nurses were significantly older and 
received a higher average annual income than nurses 
from orthopedic settings. Consequently, the 
orthopedic and TB groups were not combined in testing 
for significant differences among the various nurse 
specialties. 


2 Crownsville State Hospital, Crownsville, 
Maryland; Malcolm-Bliss Mental Health Center, St. Louis, 
Missouri; Spring Grove Hospital, Crownsville, 
Maryland; Springfield Hospital, Springfield, 
Maryland. 

3 Crownsville State Hospital, Crownsville, 
Maryland; Eastern Shore State Hospital, Cambridge, 
Maryland; Kentucky State Hospital, Danville, 
Kentucky; Saint Elizabeth’s Hospital, Washington, D. C. 

4 George Washington Hospital, Washington, D. C.; 
Georgetown Hospital, Washington, D. C.; Providence 
Hospital, Washington, D. C.; Washington Hospital 
Center, Washington, D. C. 

5 Glenn Dale Hospital, Glenn Dale, Maryland. 

6 National Orthopedic and Rehabilitation Hospital, 
Arlington, Virginia; North Carolina Orthopedic 
Hospital, Gastonia, North Carolina; Suburban Hospital, 
Arlington, Virginia. 


Measures 


Background Variables. (a) age, (b) education: 
five-point scale of level of schooling completed; 
(c) earnings: five-point scale of present annual 
salary; (d) intent to remain: five-point scale of 
months intend to remain in present position; 
(e) length of employment: months in present po- 
sition; (f) contentment with present position: five- 
point scale of degree of contentment; (g) present 
position: staff nurses received a rating of one, 
supervisory nurses a rating of two. 

Personality- and Nursing-Attitude Variables. (a) 
Sociability: 15 true-false items from the 
Sociability Scale on the Guilford-Zimmerman 
Temperament Survey. (b) Democratic Attitude: 25 items 
adapted from the Adorno-Levinson F Scale (Adorno, 
Frenkel-Brunswik, Levinson & Sanford, 1950). 
(c) Subservience to MD: 4 items focusing on the 
relative authority of doctors and nurses in running 
a ward and in making decisions concerning patient 
care. (d) Favoring the Elimination of Status 
Distinctions in the Hospital: 5 items concerned with 
class and status distinctions among doctors, nursing 
personnel, and patients. (e) Belief in Maintaining 
Social Distance from Patient: 6 items which 
emphasize the pitfalls of getting friendly with patients. 
(f) Emphasis on Technical Skills in Nursing: 9 items 
emphasizing either the importance of general medical 
and surgical skills or the nurse’s personality in the 
care of physical and mental illness. (g) Belief in 
Efficacy of Ward Personnel: 4 items which suggest 
the patient’s contacts with personnel on the ward 
are often of therapeutic value, particularly in the 
treatment of mental illness. (h) Interest in Patient 
as a Person: 5 items which reflect staff interest in 
talking to patients and getting to know them as 
individuals. (i) Belief in Order and Control: 4 items 
which emphasize the need for proscriptions, 
limitations, and controls in running a ward. Items 
comprising Variables b through i were randomly 
interspersed as Part II of the questionnaire booklet. 
These variables were scored on four-point, bipolar 
scales with a score of one indicating strong 
agreement and a score of four strong disagreement. 
Variables c through i were adapted from a 
nursing-attitude survey form distributed by the 
Psychopharmacology Service Center of the National 
Institute of Mental Health.⁷ (j) Autocratic: 7 items, e.g., 


7 The items comprising the nursing attitude vari- 
ables and the items included in the positive and 
negative personality variables adapted from Leary’s 
Interpersonal Checklist have been deposited as 
Tables A, B, and C with the American Documenta- 


TABLE 1 

VARIABLE LOADINGS ON THE 
MAJOR FACTORS 

Variable                           A       B       C       D 
Managerial                       -.77    -.09     .05    -.12 
Responsible                      -.69     .05     [?]     .19 
Self-Reliant                     -.66    -.13    -.17     [?] 
Forthright                       -.63     .06    -.27    -.00 
Self-Ideal Nurse Agreement       -.63     .01     [?]    -.28 
Cooperative                      -.56     .02     .36     .25 
Sociability                      -.50     .06     .09     .05 
Belief in Order and Control       .04    -.81     .12     .08 
Emphasis on Technical Skills     -.03    -.69     .00     .01 
Belief in Social Distance 
  from Patients                   .00    -.55    -.14    -.01 
Democratic Attitude               .09     .42     .16    -.32 
Aggressive-Blunt                  .10     .05    -.73    -.03 
Autocratic                       -.10    -.04    -.69     .05 
Distrustful                       [?]    -.00    -.64     .08 
Exploitive                        .02    -.08    -.52    -.08 
Overconventional                  .16     .02     .02     [?] 
Overgenerous                     -.06    -.06    -.05     .74 
Docile-Dependent                  .23    -.08    -.05     [?] 
Accepting                        -.21    -.12     [?]     .45 

“Bossy,” “Always giving advice,” and “Expects 
everyone to admire him.” (k) Distrustful: 12 items, 
e.g., “Resents being bossed,” “Touchy, easily hurt,” 
and “Distrusts everyone.” (l) Overconventional: 9 
items, e.g., “Wants everyone to like him,” “Agrees 
with everyone,” and “Fond of everyone.” (m) 
Exploitive: 8 items, e.g., “Thinks only of himself,” 
“Cold and unfeeling,” and “Egotistical and 
conceited.” (n) Self-Effacing: 13 items, e.g., “Usually 
gives in,” “Meek,” and “Apologetic.” (o) 
Aggressive-Blunt: 10 items, e.g., “Critical of others,” 
“Impatient of other’s mistakes,” and “Cruel and unkind.” 
(p) Docile-Dependent: 12 items, e.g., “Very 
respectful to authority,” “Clinging Vine,” and “Will believe 
anyone.” (q) Overgenerous: 7 items, e.g., 
“Overprotective of others,” “Forgives everything,” and 
“Tries to comfort everyone.” (r) Managerial: 8 
items, e.g., “Able to give orders,” “Well thought of,” 
and “Respected by others.” (s) Cooperative: 7 items, 
e.g., “Cooperative,” “Friendly,” and “Warm.” 
(t) Self-Reliant: 7 items, e.g., “Able to take care 
of self,” “Independent,” and “Self-reliant and 
assertive.” (u) Forthright: 6 items, e.g., “Can be frank 
and honest,” “Firm but just,” and “Stern but fair.” 
(v) Accepting: 4 items, e.g., “Grateful,” and “Accepts 
advice readily.” (w) Responsible: 7 items, e.g., 
“Helpful,” and “Gives freely of self.” The true-false 
items comprising Variables j through w were adapted 
from the Leary Interpersonal Checklist (LaForge & 
Suczek, 1955). (x) Self-Ideal Nurse Agreement: the 
128-item Leary Interpersonal Checklist was rated 
for the “Self” and again for the “Ideal Nurse.” Phi 
coefficients were computed as a measure of agreement 
between these ratings. 
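
As an illustration of the agreement measure just described, the phi coefficient between two sets of true-false ratings can be computed from the fourfold table of joint responses. The sketch below is a minimal one: the function name is hypothetical and the short rating vectors are invented, not data from the study.

```python
import math


def phi_coefficient(self_ratings, ideal_ratings):
    """Phi between two equal-length sequences of true/false (1/0) ratings."""
    if len(self_ratings) != len(ideal_ratings):
        raise ValueError("both ratings must cover the same items")
    a = b = c = d = 0
    for s, i in zip(self_ratings, ideal_ratings):
        if s and i:
            a += 1          # checked for both Self and Ideal Nurse
        elif s and not i:
            b += 1          # checked for Self only
        elif not s and i:
            c += 1          # checked for Ideal Nurse only
        else:
            d += 1          # checked for neither
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when either rating is constant")
    return (a * d - b * c) / denominator


if __name__ == "__main__":
    # Hypothetical responses to eight checklist items.
    self_view = [1, 1, 0, 1, 0, 0, 1, 0]
    ideal_nurse = [1, 1, 0, 0, 0, 1, 1, 0]
    print(round(phi_coefficient(self_view, ideal_nurse), 3))
```

A value near 1 would indicate that the nurse checks for herself the same items she checks for the Ideal Nurse.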

Control Variable. Social Desirability: the 33-item 
scale developed by Crowne and Marlowe (1960) to 
measure the tendency to give socially desirable 
responses on self-administered personality inventories. 


Procedure 


The study variables were incorporated into a 
five-part, self-administered questionnaire in booklet form. 
The forms were mailed to the Director of Nursing 
of each hospital and distributed to the staff nurses 
accordingly. To assure anonymity the booklet listed 
a code number to identify the S and setting with a 
request to omit the S’s name. To further avoid 
tendencies to give socially desirable responses, the S 
was instructed to mail the forms directly to the 
authors upon completion instead of returning them 
to the Director of Nursing. 

The 24 personality- and nursing-attitude variables 
in the factor analysis were intercorrelated by the 
Pearson product-moment method. The correlation 
matrix was factor analyzed using Hotelling’s prin- 
cipal components method with unity in the diagonals. 
A normal varimax rotation, which provides an 
orthogonal solution, was then performed on all 
factors with eigenvalues of one or greater. 
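
The analysis steps in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, the correlation matrix is invented, and "normal varimax" is read as varimax with Kaiser row normalization.

```python
import numpy as np


def principal_components(corr, min_eigenvalue=1.0):
    """Component loadings from a correlation matrix with unities in the diagonal.

    Components with eigenvalues of min_eigenvalue or greater are retained.
    """
    corr = np.asarray(corr, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(corr)          # returned in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= min_eigenvalue
    return eigvecs[:, keep] * np.sqrt(eigvals[keep]), eigvals


def normal_varimax(loadings, max_iter=200, tol=1e-10):
    """Orthogonal varimax rotation applied to row-normalized loadings."""
    L = np.asarray(loadings, dtype=float)
    h = np.sqrt((L ** 2).sum(axis=1))                # square roots of communalities
    h[h == 0] = 1.0                                  # guard against empty rows
    A = L / h[:, None]
    p, k = A.shape
    R = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        B = A @ R
        target = B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(A.T @ target)
        R = u @ vt                                   # closest orthogonal rotation
        current = s.sum()
        if previous > 0 and current <= previous * (1 + tol):
            break
        previous = current
    return (A @ R) * h[:, None]                      # restore original row lengths


if __name__ == "__main__":
    # Hypothetical correlations: two clusters of three variables each.
    corr = np.eye(6)
    corr[:3, :3] = 0.6 + 0.4 * np.eye(3)
    corr[3:, 3:] = 0.5 + 0.5 * np.eye(3)
    loadings, eigvals = principal_components(corr)
    print(np.round(eigvals, 2))
    print(np.round(normal_varimax(loadings), 2))
```

Because the rotation is orthogonal, each variable's communality (the sum of its squared loadings) is the same before and after rotation.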

Analyses of variance and chi-square tests were 
computed to test for significant differences among 
the five nursing groups on the background variables. 
Covariance analyses, controlling for the set to give 
socially desirable responses, were computed to test 
for significant differences among the five treatment 
groups on the derived factor scores. Kramer’s 
(1956) extension of the multiple-range test to group 
means with unequal numbers of replications was 
employed on variables where there was a significant 
overall difference between groups to determine where 
the groups differed. 


RESULTS 
Factor Analysis 


Four factors containing at least three sig- 
nificant variables were extracted from the 
factor analysis. Variable significance was 
arbitrarily defined as a loading of .40 or 
higher on one factor and no loading of .40 
or higher on any other factor. 
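
The selection rule can be written out as a short sketch. Two assumptions are made here: "a loading of .40 or higher" is taken in absolute value, since the variables reported for Factor A in Table 1 carry negative loadings, and the function name is hypothetical. Three rows are taken as read from Table 1; the fourth row is invented to show a variable that fails the rule.

```python
def significant_variables(loadings, factor_names, cutoff=0.40):
    """Assign each variable to a factor only if it loads at or above the
    cutoff (in absolute value) on that factor and on no other factor."""
    result = {name: [] for name in factor_names}
    for variable, row in loadings.items():
        hits = [f for f, value in zip(factor_names, row) if abs(value) >= cutoff]
        if len(hits) == 1:
            result[hits[0]].append(variable)
    return result


if __name__ == "__main__":
    factors = ["A", "B", "C", "D"]
    example = {
        "Forthright": (-0.63, 0.06, -0.27, -0.00),
        "Emphasis on Technical Skills": (-0.03, -0.69, 0.00, 0.01),
        "Autocratic": (-0.10, -0.04, -0.69, 0.05),
        "Hypothetical cross-loader": (0.45, 0.02, -0.48, 0.10),  # invented row
    }
    for factor, names in significant_variables(example, factors).items():
        print(factor, names)
```

The further requirement that a factor contain at least three such variables could be applied by dropping any factor whose list is shorter than three.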

Factor A (see Table 1) contains the largest 
number of significant variables and also 
accounts for the greatest percentage of total 
variance. This factor, labeled Leadership 
Skills, includes variables indicating facility 
in establishing and maintaining good 
interpersonal relationships (Sociability and 
Cooperative), but the major emphasis is on 
variables related to leadership qualities, i.e., 
Managerial, Self-Reliant, Forthright and 
Responsible. A sampling of items in these latter 
categories includes: able to give orders, likes 
responsibility, firm but just, self-confident, 
self-reliant, and assertive. It is particularly 
noteworthy that Self-Ideal Nurse Agreement 
also has a significant loading on this factor. 
Hence, nurses who perceive in themselves 
many of the qualities attributed to the Ideal 
Nurse also rate themselves high on the 
leadership variables. 

Factor B, Impersonal-Orderly, contains 
variables that closely resemble Adorno et al.’s 
(1950) description of the authoritarian 
personality. Included in this factor are items 
indicating a tendency to be on the lookout for 
and to condemn, reject and punish people 
who violate conventional values, an emphasis 
on the importance of rules and regulations in 
running a ward, a tendency to stress the 
importance of technical skills in nursing and 
the belief you have to keep your distance 
from patients or they will forget you are a 
staff member. 

Factor C, Hostile-Self-Seeking, character- 
izes the individual who uses people for her 
own ends, is bossy, autocratic, egotistical, crit- 
ical of others, impatient of other’s mistakes, 
and generally rebellious. 

In opposition to Factor C, Factor D, 
Dependent-Exploited, describes the individual 
who wants everyone to like her, is easily 
fooled, lets others make decisions, spoils 
people with kindness, and is generally trusting 
and eager to please. 


TABLE 2 

MEAN SCORES, STANDARD DEVIATIONS AND F RATIOS OF DIFFERENCES AMONG 
THE FIVE GROUPS ON THE STUDY VARIABLES 

                                             General medical surgical    Neuropsychiatry 
Variable                                      Ortho.     TB      OR      State   Receiving      F 

Number (N)                                      14       26      40       40       40 
Supervisory nurses (%)                          50       46      20       68       [?]      18.98** 
Year of birth                          M        29       13      29       26       31        [?] 
                                       SD       [?]     8.8    11.0      9.1      8.0 
Level of education                     M       2.9      [?]     [?]      2.9      2.9        .58 
                                       SD     1.38      .95    1.09      [?]     1.00 
Annual earnings                        M       [?]      4.0     2.8      3.4      3.4      11.49** 
                                       SD      .65     1.06     .74      .86      .81 
Intent to remain in present job        M       4.2      4.8     3.9      4.5      4.3        [?] 
                                       SD      .97      .47    1.45      .91     1.20 
Time in present job (months)           Mdn.   23.0     38.8    15.5     13.0     18.0       2.67* 
Contentment with job                   M       1.9      [?]     2.0      [?]      [?]        .67 
                                       SD      [?]      .94     .81      [?]      [?] 
Leadership Skills (Factor A)           M      35.2     33.7     [?]     36.7     36.6        [?] 
                                       SD     3.10     4.91    5.67     3.83     3.21 
Impersonal-Orderly (Factor B)          M      11.4     10.0    10.6      9.6      9.7       1.35 
                                       SD     3.63     2.71    2.01     2.90     3.49 
Hostile-Self-Seeking (Factor C)        M      18.5     19.0    20.1     20.5      [?]       3.91** 
                                       SD     1.72     2.07     [?]     3.65     2.70 
Dependent-Exploited (Factor D)         M      20.2     20.3    21.0     18.9     19.8       3.28* 
                                       SD     2.68     3.25    2.48      [?]     2.61 
Subservience to MD                     M       [?]     11.2    12.6     10.9      [?]       6.00** 
                                       SD     2.46     1.75    1.46     1.58     1.42 
Favoring Elimination of Status         M      15.9     16.0    16.2      [?]      [?]       4.11** 
  Distinctions                         SD     1.73     2.52    1.96     1.92     2.50 
Belief in Efficacy of Ward Personnel   M      10.9     12.0    12.3     13.0     12.9        [?] 
                                       SD     1.70     1.79    1.48     1.53     1.69 
Interest in patient as person          M      12.6     12.9    12.9      [?]     12.8        [?] 
                                       SD     1.95     2.29    1.67     1.55     1.88 

Note. TB = tuberculosis; Ortho. = orthopedic; OR = operating room; Receiv. = receiving. 
* p < .05, two-tailed. 
** p < .01, two-tailed. 


From the preceding discussion, Factors A, 
C, and D would appear to have face validity 
as predictors of a nurse’s ability to assume 
a leadership or supervisory role. A 
supervisory nurse should be able to assume 
responsibility and direct others without 
appearing bossy or autocratic and should not feel 
that everyone has to like her or that she has 
to spoil others with kindness. As a test of 
these assumptions, t tests were computed 
between supervisory (N = 73) and staff nurses 
(N = 84) on these three factors. As predicted, 
supervisory nurses were significantly higher 
on Leadership Skills (t = 1.65, p < .05) and 
significantly lower on Hostile-Self-Seeking 
(t = 2.09, p < .05) than staff nurses. The 
difference between supervisory and staff 
nurses on Dependent-Exploited was in the 
expected direction but was not significant. 
Although supervisory nurses were significantly 
older than staff nurses (t = 2.85, p < .01), 
the correlations between age and Factors A, 
C, and D were not significant. In other words, 
the differences between supervisory and staff 
nurses on Leadership Skills and 
Hostile-Self-Seeking are not merely a reflection of age 
difference. 


Characteristics of Nurses in the Five Treat- 
ment Settings 


There were 10 significant differences among 
the various nursing groups as compared to 
3 expected by chance for the number of tests 
made (see Table 2). State hospital and 
receiving hospital psychiatric nurses spon- 
sored greater equalitarianism in their rela- 
tionships with other hospital personnel than 
OR nurses and nurses from TB and ortho- 
pedic settings. Both NP nursing groups also 
indicated a stronger belief in the efficacy of 
ward personnel than orthopedic and TB 
nurses. Finally, NP nurses rated themselves 
higher on Leadership Skills than nurses from 
OR and TB settings. 

Operating room nurses have proportionately 
fewer members in supervisory roles, indicate 
least intention to remain in their present 
position and, with the exception of orthopedic 
nurses, make less money than the other 
nursing groups. They do not differ significantly 
from the other groups in their ratings of 
Contentment with Present Position, and their 
median length of stay in their present position 
is 16 months. In addition, OR nurses rated 
themselves as more self-effacing and 
dependent than psychiatric nurses in state mental 
hospitals and as favoring greater subservience 
to the MD than state hospital, receiving 
hospital and TB nurses. In sum, OR nurses 
present a striking picture of dependency. 
They feel inadequate in a leadership role, 
hold to the traditional belief that nurses 
should be subservient to the MD, perceive 
themselves as assuming a dependent role in 
their interpersonal relationships and have 
fewer members in supervisory positions as 
compared to nurses from the other treatment 
settings. By contrast, psychiatric nurses in 
state mental hospitals present themselves as 
the least dependent of the five groups. They 
are lowest on Factor D, are least in favor 
of subservience to the MD and are highest 
on Leadership Skills as compared to the 
other nursing groups in the study. In addi- 
tion they have the highest relative number of 
supervisory nurses. 

Nurses in TB and Orthopedic settings dif- 
fered from the two NP groups on Factor C. 
They rated themselves as less self-seeking, 
exploitative and aggressive than the NP 
nurses. Finally, TB nurses were older than 
nurses from the other groups including the 
orthopedic nurses. 


DISCUSSION 


To the nurses in the present study, the 
concept of person orientation connotes more 
than a desire to be with and a liking for 
people. Although these qualities are 
important, the major emphasis is on leadership 
skills, e.g., the ability to assume 
responsibility, act independently and give orders. 
Further, nurses who feel inadequate in the role 
of a nurse, i.e., had low Self-Ideal Nurse 
Agreement scores, also rated themselves low 
on variables related to Leadership Skills. This 
latter finding is consistent with Raskin’s 
(1962) results with psychiatric outpatients 
where self-admired other agreement scores 
were significantly related to measures of 
self-satisfaction, sociability, and assertiveness. 
However, Contentment with Present Position 
is not related to Leadership Skills. 
Presumably other factors such as general working 
conditions and convenient location are more 
closely associated with satisfaction in a 
particular job than one’s feeling of adequacy 
as a nurse. 

Task orientation or the tendency to empha- 
size the skilled-technical aspects of nursing 
appears to be an expression of a general in- 
ability to get close to people and a pervasive 
need for proscriptions, limitations, and con- 
trols in one’s social and personal relationships. 
Beliefs in the need to maintain status 
distinctions in the hospital and in the doctor-nurse 
relationship are not related to Factor B. 
Rather it is the nurse who rates herself as 
dependent and lacking the capacity for 
leadership who favors a subservient role in her 
relationship to the MD and who believes in 
maintaining status distinctions in the hospital 
hierarchy. 

With one major exception, the factor scores 
did an effective job of differentiating nurses 
from the various treatment settings. Although 
NP nurses scored significantly higher than 
two of the three GMS nursing groups on 
Leadership Skills as predicted by the findings 
of Navran and Stauffacher (1958), there 
were also differences between OR nurses and 
the other GMS groups and between TB and 
orthopedic nurses. In other words, it is 
necessary to go beyond the NP-GMS distinction 
and probe for idiosyncratic characteristics 
within a particular nursing setting. 

Factor B did not differentiate the nursing 
specialties in this study. However, this factor 
may prove useful in other contexts. For ex- 
ample, in the field of psychiatric nursing, 
particularly in work with drug addicts and 
alcoholics, a tolerant attitude toward these 
forms of socially deviant behavior is very 
important. Consequently, this factor may be 
useful in selecting nurses for work with these 
patients. 

From a practical standpoint, these factors 
may prove useful as predictors of success or 
achievement in certain nursing specialties. 
The differences between supervisory and staff 
nurses in the present study on Factors A 
and C lend some support to this premise. 
However, these findings do not preclude the 
need for a thorough empirical investigation 
of the relationships between the factor scores 
and success or achievement within a particular 
nursing specialty. 


STRONG VOCATIONAL INTEREST BLANK SCORES OF 
HIGH SCHOOL SENIORS AND THEIR LATER 
OCCUPATIONAL ENTRY II 


RALPH F. BERDIE 


Student Counseling Bureau, University of Minnesota 


In order to study the predictive validity of the SVIB, 130 university graduates 
were identified who had received degrees in dentistry, mechanical engineering, 
architecture, or journalism, and who had taken the SVIB while seniors in 
high school. Interest scores and patterns of the 4 groups were compared and 
comparisons made between each of these 4 groups and 3 groups studied 
earlier. Each of the groups of graduates tended to obtain as high school seniors 
SVIB scores related to their later occupation and the relationships were both 
statistically and practically significant. 


Strong Vocational Interest Blank (SVIB) 
scores differentiate among men in various oc- 
cupations (Strong, 1943) and predict the 
careers of students tested in college and 
followed up during the subsequent 18 years 
(Strong, 1955). Increasingly the SVIB is used 
for counseling high school seniors, but com- 
pared to the available data concerning adults 
and college students, the information about 
the predictive validity of the Blank for high 
school seniors is meager. This author in 1960 
reported the distributions of Strong scores 
for 123 persons tested as high school seniors 
who later obtained university degrees in medi- 
cine, law, or accounting. Distributions of 
selected scales and of patterns on 11 interest 
groups showed the three occupational groups 
differed significantly from one another on the 
basis of scores obtained as high school seniors. 
For these three small samples the SVIB 
possessed demonstrated predictive validity. 


The scores of the three groups were significantly 
different from one another, and pattern analysis of 
each student’s interest profile revealed that the 
three groups had different profile patternings as well 
as different scores on the individual scales. These 
differences suggest that careful use of the SVIB is 
justified with high school seniors [Berdie, 1960, p. 
165]. 


The present study adds to the initial one 
information for an additional 130 students 
tested as high school seniors and who later 
graduated from one of four distinct university 
curricula: journalism, dentistry, mechanical 
engineering, and architecture. 
The university graduation lists of 1957, 1958, and 
1959 were examined to identify the names of students 
obtaining these degrees. Then the records of 
the Minnesota State-Wide Testing Program were 
examined and from the lists of university graduates 
those persons were selected who had completed the 
SVIB when they were in the twelfth grade. This 
provided groups of 28 journalists, 40 dentists, 48 
mechanical engineers, and 14 architects, which sup- 
plemented the earlier groups of 39 physicians, 52 
lawyers, and 32 accountants. 

The question can be raised concerning the rela- 
tionship between the degree received by these uni- 
versity graduates and the actual occupation. A study 
by Vera Schletzer in 1963 followed up most of 
the persons included in this series of studies and 
information is available concerning the percentage 
of persons in six of the occupational groups who 
graduated from the curriculum and who 3 to 5 
years later were in the corresponding occupation. 
Of the 22 accounting graduates who responded, 91% 
reported they were working as accountants. Of the 
31 dental graduates, 97% reported they were prac- 
ticing dentists; one reported he was a medical stu- 
dent. All of the 38 graduates from mechanical engi- 
neering reported they were working as engineers. 
Of the 16 journalists, 76% reported they were 
working as journalists. Of the 38 law graduates, all 
reported they were working as lawyers, and of the 
29 medical graduates, all reported they were practic- 
ing physicians. 

Classification into occupation on the basis of cur- 
riculum from which the person graduated appears 
appropriate as an index of the occupation in which 
he will be found a few years later, at least for the 
curricula studied here. 

The profiles of each of these four new groups 
were coded (Darley, 1941) and the distributions 
of profile patterns and of scores on the selected 
scales were prepared. Comparisons were made between 
the seven groups of graduates and tested 
with the chi-square test to observe statistical significance 
of differences. 


RESULTS 


Table 1 presents the percentage of journal- 
ists, dentists, mechanical engineers, and archi- 
tects obtaining each of the various letter 
grades on each of 13 selected SVIB scales. 
Seven percent of the journalists and none of 
the persons in the other three groups ob- 
tained As on the Lawyer scale. Twenty-one 
percent of the journalists, 3% of the dentists, 
2% of the mechanical engineers, and none 
of the architects obtained B+s on the Lawyer 
scale. Fourteen percent of the journalists ob- 
tained As on the Author-Journalist scale and 
25% of the journalists obtained either As 
or B+s on this scale, as compared to 6% 
of the dentists, none of the mechanical engi- 
neers, and 14% of the architects. The table 
reveals that more of the journalists than of 
persons in the other three groups obtained As 
on the Author-Journalist scale, more of the 
dentists obtained As on the Dentist scale 
than did persons in the other three groups, 
many more of the engineers than of persons 
in the other groups obtained As on the Engi- 
neer scale, and many more of the architects 
obtained As on the Architect scale than did 
persons in the other three groups. 

Table 2 presents the percentage in each 
group obtaining primary, secondary, tertiary, 
or no pattern in each of the 11 occupational 
groups. For example, none of the journalists 
had a primary pattern in Group I, contain- 
ing occupations related to the biological 
sciences and art, 13% of the dentists had 
primary patterns here, 6% of the mechanical 
engineers, and 21% of the architects. The 
occupations of dentistry and architecture are 
included in Group I and these are the two 
samples here obtaining most primary pat- 
terns. None of the journalists had primary 
patterns in either of the scientific groups and 
only 7% had secondary patterns in Group I 
and 4% had secondary patterns in Group II. 
Of the mechanical engineers, 6% had primary 
patterns in Group I, 17% in Group II, 21% 
had primary or secondary patterns in Group 
I, and 34% had primary or secondary pat- 
terns in Group II. More journalists than 
persons in any of the other three groups had 
primary patterns in the verbal language group 
(Group X). In general, the journalists are 
characterized by patterns in the business, 
musician, and verbal language groups, the 
dentists by patterns in the technology and 
production manager groups, the engineers by 
patterns in the same two groups, and the 
architects by patterns in the production man- 
ager, technology, musician, president of 
manufacturing concern and the two science 
groups. 

Although the four groups studied here are 
differentiated from one another, the distributions 
of scores, and presumably of patterns, 
are not identical with the distributions Strong 
found for the criterion groups. In each of 
Strong’s occupational criterion groups 70% 
had scores of A on the appropriate scale, 12% 
scores of B+, 10% scores of B, 4% scores 
of B—, 3% scores of C+, and 1% scores 
of C. Thus, of Strong’s 241 architects in- 


TABLE 2 


PERCENTAGE OF 28 JOURNALISTS, 40 DENTISTS, 48 MECHANICAL ENGINEERS, AND 14 ARCHITECTS 
TESTED IN GRADE 12 WHO RECEIVED VARIOUS TYPES OF INTEREST PATTERNS ON THE 
STRONG VOCATIONAL INTEREST BLANK 


Type of pattern:       Primary          Secondary        Tertiary         None 
Occupational group:    J. D. M.E. A.    J. D. M.E. A.    J. D. M.E. A.    J. D. M.E. A. 
I Biological science 0 13 6 21 720 Seed Sie 1S eso Le 75 SO 46 43 
II Physical science OSLO. a7 24 4 0 Fr 14 AD 25 | 23 29 93 68 44 36 
III Production manager (97332 50NsO 4 30 23 43 255020-.ol Sie LA 64 18 6 7 
IV Technological 25 38 54 29 eed See sae 2d ig S08) 5 oO. 57 18 6 14 
V Social science 145 10.92 7 29:5 8 0 i 40; 40) 29 50 85 79 64 
VI Musician $0) 915 8 43 1423 B20 eed A 20! 10) aes 25. 43) sG0e e241 
VII Certified public accountant i 2 7 y aes 2 0 21s 6 29 68 85 90 64 
VIII Business detail 11 20 8 if 113 2S Oe 14: SZ OS mecieensO 39: 25.46, 9945 
IX Business contact 36: 15 6 0 11 10 6 14 21S LSet 32, 53. omom 
X Verbal language 32 ae) 0 0 Svs Gr Zi 14 20 15 29 36. 73, 79a 
XI President of manufacturing concern i220 get aeee2n 13220 915) 29 18: 20 (29° 29: 57 40 44 21 





Note.—J. = Journalist; D. = Dentist; M.E. = Mechanical Engineer; A. = Architect. 
TABLE 3 


PERCENTAGE IN EACH OF 7 OCCUPATIONAL GROUPS TESTED IN GRADE 12 AND OF 6,481 HIGH 
SCHOOL SENIORS WHO HAD A OR B+ ON 13 SELECTED STRONG VOCATIONAL 
INTEREST BLANK SCALES 


SVIB Scale   Medicine   Law   Accounting   Journalism   Dentistry   Mechanical Engineering   Architecture   High School Seniors 
Lawyer 18 50 18 28 3 2 0 5 
Physician 49 2 6 11 30 Da 35 12 
Accountant 5 19 43 7 18 6 14 9 
Farmer 48 8 32 32 70 84 57 78 
Aviator 26 16 29 28 55 85 of 65 
Engineer 33 0 15 4 30 73 65 18 
Osteopath 59 6 9 8 36 23 21 21 
Personnel Director 20 21 9 22 3 8 7 3 
Public Administrator 26 21 28 32 3 16 7 6 
Real Estate Salesman 21 75 60 61 30 15 36 47 
Dentist 41 0 9 4 43 28 7 9 
Author-Journalist 5 12 0 25 5 0. 14 10 
Architect 4 2 3 7 13 19 Si if 
N 39 52 32 28 40 48 14 6,481 





cluded in the norm group, 70% had As when 
they were tested as adult and practicing archi- 
tects as compared to only 36% of the graduate 
architects included here who were tested 
as high school seniors. 

The scale most consistent with Strong’s 
results is the Engineering scale which pro- 
vided scores of A for 67% of the mechanical 
engineers tested in the twelfth grade. The 
results suggest that the journalists, when they 
were tested as high school seniors, more 
resembled in terms of their interests adult 
real estate salesmen, personnel directors, and 
farmers than they did practicing journalists. 
Similarly, the prospective dentists more re- 


sembled farmers than they did dentists; the 
prospective engineers more resembled avia- 
tors; and the prospective architects more 
resembled aviators. 

In part, these results are related to the 
distribution of scores found for high school 
seniors in general. A group of 6,481 male 
high school seniors in Minnesota tested in 
1955-1956 had a concentration of high scores 
on the Farmer and Aviator scales with 62% 
obtaining As on the Farmer scale and 48% 
obtaining As on the Aviator scale (Berdie, 
Layton, Swanson, Hagenah, & Merwin, 
1962). Of all of these high school seniors who 
were tested, 1% had As on the Lawyer scale, 


TABLE 4 


PERCENTAGE IN EACH OF 7 OCCUPATIONAL GROUPS TESTED IN GRADE 12 WHO HAD PRIMARY 
OR SECONDARY INTEREST PATTERNS ON THE STRONG VOCATIONAL INTEREST BLANK 


Interest group   Medicine   Law   Accounting   Journalism   Dentistry   Mechanical Engineering   Architecture 
I Biological science 42 2 0 7 33 21 42 
II Physical science 33 2 9 4 10 34 35 
III Production manager 33 16 38 11 63 79 79 
IV Technological 23 14 35 36 53 69 50 
V Social science 18 29 19 43 ay 10 7 
VI Musician 36 40 25 64 38 29 64 
VII Certified public accountant 25 33 47 11 8 4 7 
VIII Business detail 15 58 78 29 43 27 21 
IX Business contact 20 89 59 47 25 12 14 
X Verbal-language 20 52 22 50 8 6 21 
XI President of manufacturing concern 39 34 35 25 40 28 50 
N 39 52 32 28 40 48 14 







3% on the Author-Journalist scale, 8% on 
the Architect scale, 5% on the Physician 
scale, 2% on the Dentist scale, 7% on the 
Engineer scale, and 4% on the Accountant 
scale. Thus the students in the four groups 
studied obtained proportionately many more 
high scores on the relevant scales than did 
high school seniors in general included in the 
comprehensive sample. 

Tables 3 and 4 assemble data for the seven 
occupational groups for which we have infor- 
mation, the four reported here and the three 
reported in the earlier publication. Table 3 
shows the proportion in each of the seven 
groups and in the total group of high school 
seniors obtaining scores of A or B+ on 13 
selected SVIB scales. Table 4 shows the 
proportion in each of the seven occupational 
groups obtaining either primary or secondary 
interest patterns on each of the 11 SVIB 
groups. 

To summarize these tables, the physicians 
had more As or B+s on the Physician scale 
than did persons in any of the other groups, 
the lawyers had more As or B+s on the 
Lawyer scale than did any other group, al- 
though they had even more As or B+s on 
the Real Estate Salesman scale, the account- 
ants had more As or B+s on the Accountant 
scale than did any of the other groups, the 
author-journalists had more As or B+s on 
their scale than did any of the other groups, 
although the physicians approached them, 
the engineers had more As or B+s on their 
scale than did any of the other groups, and 
the architects had more As or B+s on their 
scale than did any of the other groups. 

The same kind of statements can be made 
concerning the data in Table 4. The phy- 
sicians, dentists, and architects had more 
primary patterns in Group I than did the 
groups whose occupations were not included 
in Group I. The engineers had more primary 
or secondary patterns in Group II than did 
the lawyers, accountants, or journalists, al- 
though they were about the same as the 
physicians and architects on this group. The 
accountants had more primary or secondary 
patterns than did the other groups on both 
the Certified Public Accountant scale and 
Group VIII, the business detail group, and 
the lawyers and journalists had more primary 


or secondary patterns on Group X, the verbal 
language group, than did the other groups 
whose occupations were not included in 
Group X. 

For each group of graduates the number 
of persons obtaining A or B+ and B or less 
on the scales for lawyer, physician, account- 
ant, dentist, architect, engineer, and author- 
journalist, was compared with similar num- 
bers in each of the other groups. Comparisons 
also were made between the groups for each 
of the 11 interest patterns using the number 
of persons obtaining primary or secondary 
patterns and the number obtaining tertiary 
or no patterns. For each pair of groups each 
comparison was tested for statistical signifi- 
cance using a chi-square test. Below are listed 
the scales for each of the compared groups 
where the differences were significant at a 
probability level of .05 or less. 


Groups compared              Scales on which they differed 

Physician and Engineer       Lawyer, Physician, Engineer, Author-Journalist 
Physician and Dentist        Lawyer 
Physician and Architect      Dentist, Architect 
Physician and Journalist     Physician, Dentist, Engineer 
Lawyer and Engineer          Lawyer, Physician, Dentist, Architect, Engineer 
Lawyer and Dentist           Lawyer, Physician, Dentist, Architect, Engineer 
Lawyer and Architect         Lawyer, Physician, Architect, Engineer 
Lawyer and Journalist        Author-Journalist 
Accountant and Engineer      Lawyer, Accountant, Engineer 
Accountant and Dentist       Physician, Accountant, Dentist 
Accountant and Architect     Physician, Engineer, Architect 
Accountant and Journalist    Accountant, Author-Journalist 
Engineer and Dentist         Engineer 
Engineer and Architect       Architect 
Engineer and Journalist      Lawyer, Dentist, Engineer, Author-Journalist 
Dentist and Architect        Dentist, Architect 
Dentist and Journalist       Lawyer, Dentist, Engineer, Author-Journalist 
Architect and Journalist     Architect, Engineer 




The significance of the differences found 
when the interest patterns were compared 
was similar. 
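
A single pairwise comparison of the kind summarized above can be sketched as follows. The counts are reconstructed from the rounded percentages in Table 3 (about 50% of the 52 lawyers and 2% of the 48 engineers with A or B+ on the Lawyer scale), so they are approximations. The function name is illustrative, and because the text does not say whether a continuity correction was applied, the uncorrected Pearson statistic is assumed.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; the statistic has 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Approximate counts derived from rounded percentages (assumption):
# rows are groups, columns are "A or B+" versus "B or less" on the Lawyer scale.
lawyers_high, lawyers_low = 26, 26
engineers_high, engineers_low = 1, 47

chi2 = chi_square_2x2(lawyers_high, lawyers_low, engineers_high, engineers_low)
CRITICAL_05_DF1 = 3.841  # chi-square critical value for df = 1 at the .05 level

print(f"chi-square = {chi2:.2f}")
print("significant at .05:", chi2 > CRITICAL_05_DF1)
```

With these approximate counts the statistic is far above the .05 critical value, which agrees with the Lawyer scale being listed for the lawyer and engineer comparison.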

These summaries of the statistical signifi- 
cance of comparisons indicated that fre- 
quently when two occupational groups were 
compared on their two corresponding scales, 
they differed significantly. When the two 
occupations being compared belonged to dif- 
ferent SVIB factor groupings, the differences 
usually were significant, 28 of 34 such com- 
parisons being significant. Thus, the phy- 
sicians and engineers were different on both 
the Physician and Engineer scale; the phy- 
sicians and journalists were different on the 
Physician scale but not on the Journalist 
scale. Eight such comparisons were made of 
scores on their own scales between groups 
belonging to the same SVIB factor grouping 
and three of these were statistically signifi- 
cantly different. In some cases, two occupa- 
tions falling within the same factor group on 
the SVIB did not differ on either of their 
scales but differed on another scale. For ex- 
ample, the physicians and dentists, both 
falling within Group I, did not differ sig- 
nificantly on either the Physician or Dentist 
scale but did differ on the Lawyer scale. In 
the comparison of the various groups on the 
basis of patterns, a significant difference was 
found on at least one pattern for every pair 
of groups except one. The dentists and the 
architects did not vary significantly in the 
proportion of persons having primary or 
secondary patterns in any of the 11 SVIB 
groups. On the other hand, when the lawyers 
and engineers were compared, they varied 
significantly on 9 of the 11 groups, the excep- 
tions being 2 single scale groups, musician 
and president of manufacturing concern. 
Using frequency of patterns, 231 comparisons 
were made and of these, 44 were comparisons 
between groups of graduates falling within 
the same SVIB groupings. Of the 44 comparisons, 
8, or 18%, were statistically sig- 
nificant with a probability level of at least 
.05. Of the 187 comparisons of groups not 
within the same SVIB group, 85, or 45%, 
were significant. 
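
The comparison counts in this section fit together if every pair of the seven groups was compared, which is an assumption of the sketch below rather than something the text states outright. It checks that 21 pairs give 42 own-scale comparisons (34 plus 8) and 231 pattern comparisons (44 plus 187), and recomputes the quoted percentages.

```python
from math import comb

n_groups = 7             # occupational groups of graduates
n_interest_groups = 11   # SVIB interest groups

pairs = comb(n_groups, 2)                         # pairs of groups
own_scale_comparisons = pairs * 2                 # each pair on its two own scales
pattern_comparisons = pairs * n_interest_groups   # each pair on every interest group

# Totals reported in the text: different-grouping plus same-grouping comparisons.
assert own_scale_comparisons == 34 + 8
assert pattern_comparisons == 187 + 44 == 231


def percent(k, n):
    """Share of k out of n as a whole-number percentage."""
    return round(100 * k / n)


print("pairs of groups:", pairs)
print("patterns, same grouping significant:", percent(8, 44), "%")
print("patterns, different grouping significant:", percent(85, 187), "%")
print("own scales, different grouping significant:", percent(28, 34), "%")
```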

The results of the analysis of the four 
groups of graduates reported here agree with 
those reported in 1960 for three other groups. 
High school senior boys have SVIB scores 
related to their entry occupation after college 
graduation. Although the relationship between 
twelfth grade scores and postcollege occupa- 
tion is statistically and practically significant, 
some men enter and presumably remain in 
occupations in which their SVIB scores were 
low. Most men, however, were in occupations 
corresponding to those SVIB scales on which 
scores of above B— had been earned. The 
results again justify the careful use of the 
SVIB with high school seniors. 
PRELIMINARY EXPERIMENTS ON KEYBOARD DESIGN 
FOR SEMIAUTOMATIC MAIL SORTING* 


HUGH M. BOWEN AND G. VICTOR GUINNESS 


Dunlap and Associates, Incorporated, Darien, Connecticut 


Mail sorting by keyboards will require operatives either to remember a digita- 
tion for each address (“memory encoding”) or key certain selected characters 
from the address (“extraction encoding”). Keyboards may be chord (multiple 
depressions per stroke) or sequential (single keys in succession). 3 preliminary 
experiments indicated (a) in sequential keying a key stroke takes approximately 
0.3 secs., hence keyboards with many keys and requiring few key strokes per 
encoding are preferable, (b) for memory encoding a chord keyboard with many 
keys (24) is superior to a smaller chord keyboard and a sequential keyboard, 
and (c) for extraction encoding a typewriter and a 24-key chord keyboard are 
equal in performance. Training and practice requirements associated with dif- 
ferent keyboards are suggested for future research. 


Some proportion of future mail sorting will 
be conducted by operators seated at consoles. 
Mail will be fed in front of the operators who 
will encode each piece by suitable digitations 
into a keyboard. One of the human factors 
issues is the optimal design of the keyboards, 
and this paper presents some preliminary 
experiments toward their definition. 

Operators may be required to encode mail 
by one of two basic methods. The “memory 
encoding” method requires the operator to 
learn a list of paired associates; for each 
address (actually a category of addresses) he 
must learn an arbitrarily assigned digitation 
pattern. The “extraction encoding” method 
requires the operator to learn rules by which 
certain of the alphanumeric characters in each 
address are extracted to form the symbol 
series which is digitated into the keyboard. In 
practice, no single extraction rule can apply 
to all addresses, and extraction codes have ex- 
ceptions which the operator must memorize. 
Figure 1 characterizes the two types of 
encoding. 

It may be noticed that numeric addresses 
(i.e., ZIP Codes) are, as far as the keyboard- 
design problem is concerned, a variation of 
extraction encoding using only numeric 
characters. 


1 Work conducted for the United States Post 
Office Department under Contract NC 40-60. The 
Post Office Department does not necessarily endorse 
any of the contents of this article; all responsibility 
for the contents is the authors’. 


Keying may be categorized as either se- 
quential or chord. On a sequential keyboard 
an encoding may be digitated into a keyboard 
by successive single-key strokes; the conven- 
tional typewriter is an example. On a chord 
keyboard an encoding may be digitated into a 
keyboard by one or more strokes where each 
stroke is composed of a number of key depres- 
sions; the stenotype keyboard is an example. 

A family of variations exists with respect 
to the meaning that is given to each key and 
to each stroke of n keys (where n is 1, 2, or 
more, depending upon how many fingers are 
used per stroke and the configuration of the 
keyboard). For example, for memory encod- 
ing and when 100 separations are required, a 
chord keyboard of 10 keys would require a 
maximum of 2 keys per stroke per encoding, 
while a keyboard of 5 keys would require a 
maximum of 3 keys per stroke per encoding. 

Our purpose was to determine the generic 
properties of keyboards which rendered them 
most compatible with manual motor ability, 
taking into consideration the amount of in- 
formation to be transmitted per encoding and 
the type of encoding used. We conceived an 
interface to exist between the proximal 
stimulus existing immediately prior to keying 
and the actual act of keying. We considered 
the proximal stimulus for a memory encoding 
to be the memorized number or, and more 
accurately in the case of a practiced operator, 
the memorized digitation for some address. 
We considered the proximal stimulus for an 
extraction encoding as the sequence of extracted 
characters or, and more accurately in 
the case of a practiced operator, the memorized 
manual correlates of the characters. 

[Figure 1: The address STAMFORD is encoded by the operator in one of two ways. 
“Memory”: e.g., from a New York Post Office, Stamford is Separation #128. 
“Extraction”: e.g., on an extraction rule of 1st, 3rd, 5th, and last character, 
STAMFORD is keyed S, A, F, D.] 

Fig. 1. Two types of encoding. 

The principal endeavor of the study was to 
attempt to find the keyboard designs which 
would optimize the flow of information and 
energy across the Stimulus-Response inter- 
face (i.e., the man-keyboard interface). For 
this purpose we studied various possible mix- 
tures of task demand and keyboard design. 

The preliminary experiments herein de- 
scribed consider only relatively simple key- 
board designs. We chose uncomplicated rep- 
resentatives of the two basic keyboard cate- 
gories, chord and sequential, and tested them 
for simplified versions of the two basic en- 
coding methods. The primary experimental 
issue was the compatibility between type of 
keyboard and type of encoding. 

A second issue was the manner of manip- 
ulation on a keyboard as this is affected by 
the number of keys on the keyboard. A chord 
keyboard with few keys demands, on the 
average, many keys per stroke; while a chord 
keyboard with many keys demands few keys 
per stroke. For sequential keyboards there is 
a similar distinction. For both types of key- 
boards there is a trade-off between the total 
number of keys on the keyboard and the 
number of keys required to be struck in order 
to input any given amount of information. 

Another distinction between “large” and 
“small” keyboards is that the smaller keyboards 
require fewer lateral motions of the 
hands and fingers than the larger board; we 
have termed this aspect of keyboard activa- 
tion “manual ranging.” However, small key- 
boards, especially chord keyboards, have a 
disadvantage in that they entail striking keys 
with difficult finger combinations. Recent 
studies (Seibel, 1962, 1963; Ratz & Ritchie, 
1961) have indicated that finger combinations 
can be ranked in terms of speed of response; 
single fingers tend to be fastest, combinations 
of nonadjacent fingers tend to be slowest. 


EXPERIMENT I. SEQUENTIAL KEYBOARDS 


The typewriter keyboard is an obvious first 
choice for practical situations because of the 
widespread familiarity with its use. The ad- 
vantage would be emphasized in the case of 
extraction encoding which requires the digita- 
tion of a series of alphanumeric characters. 
It can also be used for memory encoding by 
assigning, according to some system, signifi- 
cances to sequences of key depressions. 

The first experiment was designed to study 
whether a likely variation of the sequential 
keying method would produce any better per- 
formance than keying a typewriter in the 
conventional fashion. 

Table 1 illustrates the experiment and the 
TABLE 1 


EXPERIMENT I. SEQUENTIAL KEYBOARDS 


COMPARISON BETWEEN A 29-KEY AND AN 8-KEY KEYBOARD: EXPERIMENTAL CONDITIONS AND RESULTS 














Number Number of Average Average 
Number of of “bits” encodings time to errors 
Type of keys struck per en- per sub- key one (percent- 
Keyboard per encoding coding  Subjects a ject encoding age) 
29 key 2 9.75 3 180 Oe 39 
(Alphabetic) Successive practice Seconds 
8 key 3 9.0 3 180 87 18 
Successive test Seconds 





a Same three experienced typists. 


results. Three experienced typists typed lists 
of alphanumeric characters under two con- 
ditions. In the first condition, the keyboard 
was composed of 29 keys (26 letter keys and 
3 punctuation keys), any one of which could 
be struck; each encoding consisted of 2 keys 
struck in sequence. In the second condition, 
only 8 keys were used (the “home” row on 
the typewriter), and 3 successive keys had to 
be struck for each encoding. The amount of 
information transmitted per encoding for the 
two keying conditions was nearly identical, 
being slightly less for the 8-key condition. 
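
The information figures for the two conditions can be approximated with a short calculation. This is a sketch that assumes every key is equally likely on every stroke, so one encoding carries strokes times log2(keys) bits; the function name is illustrative. Under that assumption the 29-key condition comes to roughly 9.7 bits, close to the value listed in Table 1, and the 8-key condition to exactly 9 bits.

```python
import math


def bits_per_encoding(n_keys, n_strokes):
    """Bits carried by one encoding when each of n_strokes successive strokes
    is an equally likely choice among n_keys keys (assumption)."""
    if n_keys < 1 or n_strokes < 1:
        raise ValueError("n_keys and n_strokes must be positive")
    return n_strokes * math.log2(n_keys)


large = bits_per_encoding(29, 2)   # 29-key condition, 2 keys per encoding
small = bits_per_encoding(8, 3)    # 8-key condition, 3 keys per encoding

print(f"29 keys x 2 strokes: {large:.2f} bits ({29 ** 2} distinct encodings)")
print(f" 8 keys x 3 strokes: {small:.2f} bits ({8 ** 3} distinct encodings)")
```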

The results (Table 1) indicate that a larger 
keyboard allows operators to encode more 
quickly (t = 25.4, p < .01) than a smaller 
board when about nine bits of information 
are transmitted per encoding (a capability 
for separating to approximately 500 destina- 
tions). Although the error scores favor the 
larger board, the difference was insufficient 
to meet statistical significance. 


[Figure 2: Key layouts of the typewriter keyboard, the small chord keyboard, and the large chord keyboard.] 


Fig. 2. The three experimental keyboards. 


The inference can be drawn from the data 
that, over the range considered, it is the num- 
ber of keys that is struck which determines 
performance. For the larger keyboard, aver- 
age time to strike a key was .275 seconds; 
for the smaller keyboard, .29 seconds. 

Hence it appears that, in sequential keying, 
speed is closely related to the number of keys 
struck. In order to decrease the number of 
keys struck, it is necessary to increase the 
number of keys. Hence, keyboard designs util- 
izing many keys, such as the conventional 
typewriter, are to be preferred. 
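
The trade-off can be illustrated with a rough model: count the successive strokes a keyboard needs to distinguish a given number of destinations, then multiply by a fixed time per stroke. The 0.3-second figure is the approximate value reported for this experiment; treating it as constant across keyboards, and the function name, are assumptions of this sketch.

```python
def strokes_needed(n_keys, n_destinations):
    """Smallest number of successive single-key strokes that can distinguish
    n_destinations on a keyboard of n_keys keys."""
    if n_keys < 2 or n_destinations < 1:
        raise ValueError("need at least 2 keys and 1 destination")
    strokes, capacity = 1, n_keys
    while capacity < n_destinations:
        capacity *= n_keys
        strokes += 1
    return strokes


SECONDS_PER_STROKE = 0.3  # approximate; assumed constant for every keyboard

for n_keys in (8, 29):
    strokes = strokes_needed(n_keys, 500)
    seconds = strokes * SECONDS_PER_STROKE
    print(f"{n_keys:2d} keys: {strokes} strokes, about {seconds:.1f} sec per encoding")
```

For about 500 destinations the 8-key board needs three strokes and the 29-key board two, which is the pattern used in the two experimental conditions.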

While this experiment was on a small scale, 
the results appear to be quite convincing 
and the typewriter keyboard was carried for- 
ward to the succeeding experiments as the 
exemplar of sequential keying. 


EXPERIMENT II. “MEMORY ENCODING” 


This experiment studied keyboard per- 
formance with respect to a memory task. 
Three keyboards were used (Figure 2). The 
stimulus material consisted of readouts show- 
ing a three-digit number. For each keyboard 
the subjects were required to learn 11 digita- 
tion patterns conforming to 11 display num- 
bers. Each encoding was as follows: for the 
typewriter, two keys depressed sequentially; 
for the small chord keyboard, 1, 2, 3, or 4 
keys depressed simultaneously; for the large 
chord keyboard, 1, 2, or 3 keys depressed 
simultaneously. The digitation patterns were 
selected so as to represent the fingering diffi- 
culty and frequency that would be used for a 
500-separation mail sorting scheme. The 
keying tasks went ahead on a self-paced basis. 
TABLE 2 
EXPERIMENT II. COMPARISON OF 3 KEYBOARDS FOR MEMORY ENCODING 
EXPERIMENTAL CONDITIONS AND RESULTS 
Number Number Number 
“bits” Number of correctly incorrectly 
Type of per encodings sorted per sorted per 
keyboard Digitation encoding Subjects per subject minute minute 
Sequential 2 keys 9 6 40.4 4.2 
(typewriter) successively (Gi—=36,1)) (ca=—="3a7}) 
Chord 1-4 keys 9 6 20 hours 55as 7.8 
(small) simultane- training (¢ = 9.7) (o = 4.6) 
ously 
800 exp. 
Chord 1-3 keys 9 6 encodings 49.0 So 
(large) simultane- (co = 9.4) (Gi—2e5) 
ously 
matched 
groups 








Table 2 summarizes the experimental plan 
and the results. A matched-group design was 
used; the matching was done by a battery 
of tests, including sample-task tests. It is seen 
that the chord boards have an advantage in 
speed terms over the sequential board. By 
analysis of variance, the F ratio for type of 
keyboard was calculated to be 3.69, which 
for df’s of 2 and 15 gives p = .05; t tests 
for the individual keyboards gave chord 
(small) > chord (large) (t = 2.56, p < .05) 
> sequential (t = 3.47, p < .01). In terms of 
accuracy, the large chord board produced the 
best performance; by analysis of variance, the 
F ratio for type of keyboard was calculated 
to be 21.51, which for df’s of 2 and 15 gives 
p < .001; t tests for the individual keyboards 
gave chord (large) > sequential (t = 
2.81, p < .02) > chord (small) (t = 9.70, 
p < .001). 
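
The two probabilities quoted for the analyses of variance can be checked without tables. When the numerator has 2 degrees of freedom, the upper-tail probability of F reduces to the closed form (1 + 2F/df2) raised to the power -df2/2. The sketch below applies that identity to the reported F ratios; the function name is illustrative and the code is not part of the original analysis.

```python
def f_upper_tail_df1_2(f_value, df2):
    """Upper-tail probability P(F > f_value) for an F distribution with
    2 numerator and df2 denominator degrees of freedom."""
    if f_value < 0 or df2 <= 0:
        raise ValueError("f_value must be >= 0 and df2 must be > 0")
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


p_speed = f_upper_tail_df1_2(3.69, 15)      # F ratio reported for speed
p_accuracy = f_upper_tail_df1_2(21.51, 15)  # F ratio reported for accuracy

print(f"speed:    p = {p_speed:.4f}")
print(f"accuracy: p = {p_accuracy:.6f}")
```

The first value comes out just under .05 and the second well below .001, in line with the reported levels.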

It is further evident that the large chord 
keyboard is preferable, for, while it is slower 
than the small chord keyboard by 6.3 encod- 
ings per minute, it is more accurate by 4.5 
encodings per minute. The extra speed of the 
small chord keyboard is bought at a heavy 
cost in errors. As discussed above, large chord 
keyboards do not require such difficult finger 
patterns as the small chord keyboards, which 
may account for the difference in errors. 


TABLE 3 


EXPERIMENT III. 


COMPARISON OF 2 KEYBOARDS FOR EXTRACTION ENCODING 


EXPERIMENTAL CONDITIONS AND RESULTS 











Digitation Number of Number Number 
for each of “bits” correctly incorrectly 
Type of two extracted per en- Encodings sorted per sorted per 
keyboard characters coding Subjects per subject minute minute 
Sequential 25 hours eel. 1.5 
(typewriter) 1 key 10+ 6 training (o = 4.5) (o = .8) 
Chord 1 or 2 keys 10+ 6 32 minutes 30.3 1.6 
(large) exp. (o = 3.6) (eo 1) 
testing 
matched 


groups 
EXPERIMENT III. “EXTRACTION ENCODING” 


This experiment studied keyboard per- 
formance with respect to an extraction task. 
The typewriter keyboard and the large chord 
keyboard were used in this experiment. A 
simple form of extraction was used in which 
the subjects looked at the first and last digits 
of a three-digit number and keyed each of the 
digits according to certain digitation patterns 
they had previously learned. For the type- 
writer keyboard, two keys were struck in 
succession; for the chord keyboard, two 
chords consisting of either one or two keys 
were struck in succession. The digitation pat- 
terns represented the range of digitation which 
would be required for the implementation of 
a 36-character alphanumeric extraction cap- 
ability. The keying tasks went ahead on a 
self-paced basis. Table 3 summarizes the 
experimental plan and the results. It is evi- 
dent that there were no differences with 
respect to the speed and accuracy of sorting 
between the keyboards; the evident lack of 
difference was confirmed by statistical analysis 
which yielded t values of less than unity for 
the differences of the means for both the speed 
and accuracy scores. 


DISCUSSION 


These small-scale and preliminary experi- 
ments have indicated that for memory encod- 
ing, a chord keyboard is preferable, and that 
for extraction encoding there is apparently no 
difference between chord and sequential key- 
boards. 

The experimenters were not surprised to
find that chord keyboards should prove
superior for memory codes. Personal experience
indicated that when using a memory
code, and after sufficient practice, one looks
at an address and pairs it immediately with a
response. The response is a unit, not spread
out in time. It seems, therefore, most compatible
to engineer the digitation also as a
unit not spread out in time.

Perhaps the most striking result of key- 
board studies such as those reported here and 
those, for instance, of Deininger (1960) is 
that performance differentials tend to be small 
between different keyboards once some skill 
in performance is achieved and provided that 
the keyboards are designed rationally and in 
keeping with good human engineering prac- 
tice. It is now thought that concentration on 
differentials accompanying proficient perform- 
ance provides too narrow a data base and that 
more significant differentials may lie in the 
training and practice requirements area. This 
hypothesis is important to investigate for ap- 
plications where the personnel turnover rate is 
high and where training is a major system 
requirement. 
A NOTE ON THE EVALUATION OF A NEW
ANSWER FORM 


IRWIN MILLER 


International Business Machines Corporation, Endicott, New York 


An answer form with a new item format proved comparable to a standard 
IBM form in an answer-marking task. A new red answer form with similar 
new item format was then used with actual tests. Ss were 4th graders, 8th 
graders, and 12th graders in 4 cities; each took a test form appropriate for 
his grade level with a new and old answer form. The new answer form was 
considered acceptable for use in the 8th grade and higher grades. Color change 
represented by the new red answer form had no significant effect on test 
performance of persons in the 8th grade or higher. Statistical analyses of the
4th-grade data proved inconclusive.


In an earlier study that compared students’ 
answer-marking performance on eight poten- 
tial answer-form designs (Miller & Minor, 
1963), it was found that total right answers 
on an experimental form having answer items 
arranged as shown in Figure 1A did not differ 
significantly from total right on an answer 
form standardly utilized with the IBM 805 
test scoring machine, which had items like 
the one shown in Figure 1B. 

A second research study, reported here, had 
two purposes. To validate the initial findings 
it was essential that an actual test be adminis- 
tered with answer forms of both the conven- 
tional and the experimental type, since sub- 
stantive test content had been intentionally 
avoided in the first study. Another purpose of 
the second study was to examine the influence 
of the color in which the answer forms were 
printed. In the first study all of the answer 
forms were printed in red ink, rather than the 
conventional blue. For the current study it 
was planned to look into the effect of color 
as well as design by utilizing both a red and 
a blue version of the conventional IBM an- 
swer form. 

The Lorge-Thorndike Intelligence Tests, 
Verbal Battery, were selected as a vehicle for 
studying the answer forms. This test series 
satisfied requirements for availability at levels 
suitable for fourth, eighth, and twelfth grade 
pupils (Levels 3, 4, 5) in two equivalent forms 
at each level having high alternate-form reli- 
ability coefficients (Forms A and B). 

The answer forms usually administered with 
the Lorge-Thorndike, referred to here as the
LT Blue answer forms, contain items like
those in Figure 1B, but with alphabetic letters 
ranging from A through V (over four succes- 
sive items) rather than the digits 1, 2, 3, 4, 5 
repeated in each item (see Figure 1C). A pre- 
cise isomorphism exists between the specific 
LT Blue answer form intended for each level 
and the test-form booklet for that level, not 
only in the alphabetic lettering of the answer 
options, but also in the number of test columns 
(one for each subtest in the booklet), exam- 
ple items, and answer items within test col- 
umns. A special printing was made in red ink 
from the regular Lorge-Thorndike answer- 
form plates to secure LT Red answer forms 
for Levels 3, 4, 5 as well as the regular blue 
forms. 

A single New Red answer form intended for 
use with any of the three test levels was de- 
signed, containing answer items similar to 
those of the original experimental form, Figure
1A, but with the options designated with let- 
ters (see Figure 1D). On this answer form five 
horizontal bands of answer items were pre- 
sented, separated by heavy lines and labeled 
“Test 1,” “Test 2,” and so on through “Test 5.” These five
bands were essential to accommodate the five 
subtests contained in Level 4 and in Level 5, 
but were more than needed for Level 3, which 
consists of four subtests. Each band of an- 
swer items on the New Red answer form was 
designed to contain at least as many answer 
items and example items as the maximum en- 
countered at any of the three test levels in the 
correspondingly designed subtest. Hence, in 
view of this inclusive generality, the New Red
form can be described as polymorphic rather
than isomorphic.

FIG. 1. Answer items. (D) Sample of New Red Answer Form.


METHOD


School systems in four different cities took part in 
the study. A total of 226 fourth graders, 255 eighth 
graders, and 198 twelfth graders were tested. Each 
subject (S) took one test form appropriate for his 
grade with the polymorphic New Red answer form. 
In addition, half of the Ss took the alternate test 
form for their level with the regular LT Blue answer 
form while the remaining Ss took the alternate test 
form with the LT Red answer form. 


RESULTS AND DISCUSSION 


Analyses of variance were made for each 
grade sample, based on the difference in raw 
score achieved by each S on his alternate test 
forms. The difference score analysis tested the
effects of answer form presentation order, test
form presentation order, sex, city, and all pos- 
sible interactions among these variables. 

In the fourth-grade data, significant differences
were found among answer-form orders,
and also among cities. However, no conclusion
regarding the answer forms could be drawn
because of a significant first-order interaction
(Answer Form Order × Test Form Order)
and a significant second-order interaction
(Answer Form Order × Test Form Order ×
City). In the eighth-grade data the answer-form
orders did not differ significantly and
no interactions involving answer forms proved
significant. In the twelfth-grade data, neither
the answer-form orders nor any other statistical
test in the analysis of variance proved
significant.


INTELLIGENCE TEST PERFORMANCE 


On the basis of the statistical analyses, the
New Red answer form could be regarded as
equivalent to the conventional LT Blue answer
forms in the eighth-grade and twelfth-grade
samples. Furthermore, since the answer-form
orders containing LT Blue forms
produced results no different from the orders
containing LT Red forms, the color change
evidently had no effect on test performance in
the upper grades.

In the fourth-grade sample, a subsidiary 
analysis based on the Scheffé (1953) pro- 
cedure for making individual tests of means 
showed that two cities produced no significant 
differences among answer-form orders while 
the remaining two cities did show differences. 
Factors contributing to these inconsistent re- 
sults in the fourth grade might include large 
differences in ability among the classes, as indicated
by the significant differences among
cities as a main effect, and/or differences in 
the amount of previous exposure to machine- 
scored tests calling for manipulation of a sepa- 
rate answer form. 

A more detailed description of this evalua- 
tion is found in Miller (1964). 
TRAINING INDUSTRIAL EXECUTIVES IN READING:
A METHODOLOGY STUDY 


DAN H. JONES 
A controlled-method study of gains achieved and retained by industrial
executives as a result of reading improvement training. 4 groups totaling
56 executives were equated on reading-ability score and related criteria. 1 group
served as a control, while 3 experimental groups were trained with different
methods. Each experimental group received 16 hours of training. Progress and
permanence were evaluated by equated forms of a reading test. Results were
analyzed by t tests between and within groups. No significant differences were
found between methods. Very significant progress and retention were found within
all experimental groups. Industrial executives can be trained to read more
efficiently, and do retain that efficiency. Mechanical aids are not required for
reading training of executives.


This study was concerned with the evalua- 
tion of methods of training industrial super- 
visors in reading improvement. Its major pur- 
pose was to determine the relative effective- 
ness of certain techniques used in training 
programs in reading. Specifically, the study 
attempted to answer the following questions: 

1. Can industrial executives be trained to 
read more effectively? 

2. If reading efficiency is improved, to what 
extent is improvement retained? 

3. Which of three commonly used methods 
is most beneficial? 

All criteria scores used in this study are de- 
rived from the Diagnostic Reading Tests, Sur- 
vey Section (Triggs, 1952). Form A was used 
prior to training, Form B immediately after 
training, and Form D was used in follow-up 
testing. In this study, reading rate is Score 1a,
reading comprehension is Score 1b expressed 
in percentage, and reading index is derived 
from the product of the two scores. Additional 
measurement in the program included the 
Thurstone Mental Alertness Test. 
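
As an illustration only, the reading index can be sketched as the product of the rate and comprehension scores. The division by 100 (treating the comprehension percentage as a proportion) is an assumption suggested by the size of the reported means, not a detail stated in the text, and the function name is hypothetical.

```python
def reading_index(rate_wpm, comprehension_pct):
    """Product of reading rate and comprehension.

    Assumption: comprehension, given as a percentage, is treated as a
    proportion, so the product is divided by 100.
    """
    return rate_wpm * comprehension_pct / 100.0


# Initial sample means reported for the course candidates:
# 264 words per minute and 68.6% comprehension.
print(round(reading_index(264, 68.6), 1))  # 181.1
```

The product of the two sample means (about 181) is close to, but not the same as, the reported mean index of 184; the gap may simply reflect computing the index for each man before averaging.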


METHOD
Sample 


The candidates for the course were all classified 
as management personnel in the corporation. They
were all male and their age range was 29 to 63 years 
with a mean age of 41.8 and a standard deviation of 
7.2. Their initial mean reading rate was 264 words 
per minute, the mean comprehension level was 68.6%, 
and the mean reading index was 184. The length of 
service with this company ranged from 5 to 33 years. 
The educational level ranged from completion of
grade nine through the PhD degree. Participation in
the course was voluntary, and no pressure was ap- 
plied from any source. 

A reading-interest survey administered prior to 
training indicated that on the average these men read 
12.5 hours per week on their job, and that the ma- 
jority of this reading consisted of letters, memos, 
technical magazine and journal articles, graphs and 
statistical data, and technical reports. Their off-the- 
job reading was estimated to be 11 hours a week in 
duration, and consisted primarily of newspaper read- 
ing and short fiction and nonfiction stories. The four 
magazines most frequently read by these people were 
listed as Reader's Digest, The Saturday Evening Post,
Time, and Life. 

The 56 volunteers were organized into four groups. 
These groups were equated on the following criteria: 
reading rate, reading comprehension, reading index, 
mental alertness linguistic score, mental alertness to- 
tal score, age, and vocabulary level. 


Procedure 


The general objective of the training program was 
to improve the reading efficiency of industrial execu- 
tives, so that they might be able to take advantage 
of the newly gained skills both in their position with 
the corporation and in their leisure-time reading. 

Basically, the philosophy underlying this program 
placed the emphasis on reading as a dynamic, active 
thought process. This process occurs primarily as the
reader fuses words into ideas. From the mechanical 
point of view, cognizance was taken of the role of 
eye movements in reading. However, more emphasis 
was placed on the results of more efficient eye move- 
ments than on the eye movements themselves. The 
value of establishing purpose in reading was stressed 
throughout the program. This was accomplished by 
discussions of critical reading, and defining the exact 
purpose for which specific materials are read, in ad- 
vance of actual reading. Suggestions for the improve- 
ment of comprehension centered around the approach 
of the reader to the material, analysis of style and 
organization of writing, as well as recognition of key
points, main ideas, details, and differentiation between
opinion and fact. Along with purposeful reading,
flexibility of reading speed to fit the specific situation
was stressed as a necessary adjunct.

The pattern of training, without reference to individual
method, was as follows:

Phase 1—Testing. This phase of the program was 
incorporated into the design in order to establish a 
basis from which to begin training. A complete as- 
sessment of various reading factors and mental alert- 
ness was undertaken. In addition, a visual screening 
test was included. 

Phase 2—Interviewing. This phase was designed as 
a means of privately reporting and interpreting read- 
ing-ability test scores, examining specific reading 
problems, and giving information regarding the pro- 
gram to each participant. Reading interests here were 
crystallized, motivation level was assessed, and an in- 
formal, personal relationship was established between 
the instructor and the participant. 

Phase 3—Training. The training phase was de- 
signed to consist of eight 2-hour sessions held once 
weekly. Although it was felt that sessions might be 
held more often with profit, the operating schedules 
of the participants prohibited meeting more often 
during the week. The sessions consisted of informa- 
tion regarding principles of efficient reading, and su- 
pervised practice with reading manuals composed of 
specifically designed, industrially oriented materials. 
Supervised practice became a major portion of the 
program in the belief that one of the main keys to 
success in this type of training was the discipline 
provided by the group process of achievement. Out- 
side practice consisted of the application of the newly 
learned principles and skills to everyday reading. 
Provision was made for the use of personal reading 
material during the training sessions, and also for 
individual consultation as an additional aid to meet- 
ing the needs of the individual. 

Phase 4—Evaluation and follow-up. This phase in- 
cluded the measurements of reading ability upon the 
completion of training, and again 8 months after the 
completion of training. Measurement was effected by 
means of equated forms of the reading test which 
had been administered prior to training. 

This basic four-phase pattern was followed in each 
of the three experimental groups. Variation in method 
between the three groups consisted primarily of use
or lack of use of certain mechanical aids. 

Group A was trained with the aid of available 
commercial equipment. This included the Harvard 
Reading Films (Perry & Whitlock, 1949), the group 
tachistoscope, the Science Research Associates Read- 
ing Accelerator (Simpson, 1950), and the Renshaw 
individual tachistoscopic trainer (Renshaw, 1952). 

Group B was trained with the aid of only the 
group type commercial equipment. This included the 
group tachistoscope and the Harvard Films. 

Group C was trained without the aid of any com- 
mercial equipment. No mechanical reading devices 
were used at any time in the training of this group, 
and the training was centered around lectures, discussions,
practice reading, and paper and pencil exercises.

Group D received no training in reading. They 
were exposed only to the pretesting and the follow-up 
testing stages of the program. It was found impos- 
sible because of business pressures to test this group 
at a time immediately following the training of the 
other groups. Group D was referred to as the con- 
trol group. 


RESULTS AND CONCLUSION 


Measurements included in this study were: 
(a) gain accomplished by the different groups 
during the training period in rate, comprehension,
and index; and (b) the retention of that
gain. Table 1 presents reading score data for 
all methods at each period of training. 

The data were analyzed by means of the 
small sample t test of significance. The t tests
between groups indicated that there were no 
significant differences between the three ex- 
perimental groups at any of the three periods 
of testing in rate, comprehension, and index. 
The differences between each of the experimental
groups and the control group were significant
beyond the 1% level of confidence on each factor
for the second and third testing periods.

Table 2 presents the within-group t ratios for each
of the groups at the three periods of testing. 


TABLE 1

MEAN READING TEST SCORES FOR EACH METHOD PRIOR TO TRAINING (1), IMMEDIATELY AFTER
TRAINING (2), AND EIGHT MONTHS AFTER TRAINING (3)

                   Method A         Method B         Method C        Control
Criterion          1    2    3      1    2    3      1    2    3      1    3
Rate (WPM)        263  458  437    255  457  439    268  486  488    268  257
Comprehension      69   82   81     70   84   82     69   81   80     67   69
Reading index     186  381  358    180  389  362    186  398  393    185  183
Vocabulary        38 43 42 44 40 42 37 34 (eight entries as printed; column alignment not recoverable)

TABLE 2

t RATIO TESTS OF SIGNIFICANCE OF DIFFERENCES IN
READING RATE, COMPREHENSION, AND INDEX WITHIN
EACH TRAINING METHOD AS A RESULT OF TRAINING

                   Pre and     Pre and        Post and
Criterion          post        follow-up      follow-up
Rate
  Group A          11.19       10.15          1.08
  Group B           7.96        8.47           .93
  Group C           7.95        7.02           .09
  Control            -         [illegible]      -
Comprehension
  Group A           4.33        4.36           .10
  Group B           2.82        2.56          [illegible]
  Group C           3.88        3.42          1.14
  Control            -         [illegible]      -
Index
  Group A          12.01       11.70          1.22
  Group B          11.76       [illegible]    1.75
  Group C           9.32        8.27           .28
  Control            -           .19            -


Table 2 shows that all three groups pro- 
gressed significantly in reading rate, reading 
comprehension, and reading index, and that 
the gains achieved were retained at least 8 
months after the completion of training. 
There was no significant change in the con- 
trol group. 

The results of this study have substantiated 
previous research in showing that a significant 
amount of improvement and retention in reading
ability may be obtained in a short training
period by any one of several methods of
training (Acker, 1955 unpublished; Causey, 
1952; Colby & Tiffin, 1950; Henry & Lauer, 
1939; Miller, 1950; Mullins & Mowry, 1954). 
The research demonstrated that reading im- 
provement is a worthwhile endeavor in the 
area of management development, and that a 
significant expenditure on mechanical aids to 
reading for such a program is unnecessary. 
PERSONALITY ITEM DIFFICULTY AND ACQUIESCENCE


CHARLES HANLEY 
Michigan State University 


This study deals with relations between acquiescence and 3 measures of person- 
ality item difficulty: controversiality, response latency, and confidence in 
accuracy of answer. Median ratings of confidence in answers to 110 MMPI 
items correlated −.62 with a measure of item controversiality, confirming the
hypothesis that controversial items tend to be difficult to answer. Low- 
confidence items elicited acquiescence. In a 2nd sample of Ss, items low in 
confidence took longer to answer than contrasting high-confidence items. The 
low-confidence long-latency items were affected by acquiescence; the others were 
not. Results show that acquiescence occurs with difficult rather than easy 
inventory material. Response latency and subjective confidence seem logically
superior to controversiality as measures of item difficulty.


Without much in the way of direct evi- 
dence, psychologists assume that the relation 
between item difficulty and acquiescence 
shown in true-false achievement tests (Cron- 
bach, 1946) also holds for personality in- 
ventories. By convention, difficult person- 
ality items are those receiving nearly equal 
proportions of endorsement and rejection.
The response set, acquiescence, presumably 
affects items of this type, labeled “controversial”
by Fricke (1957).

Item difficulty influences the latency of a 
subject’s (S’s) response and his confidence in 
its correctness in the same way that difficult 
psychophysical comparisons lengthen judg- 
ment time and lower a judge’s confidence in 
his accuracy (Hanley, 1962). Difficulty in- 
creases in the psychophysical experiment as 
the stimuli to be compared approach physical 
identity—the judge cannot see, feel, hear, or 
otherwise sense the differences between them. 
The parallel case for inventory items involves 
“item ambiguity” (Goldberg, 1963), where 
the S is not clear as to what is asked. 

Items may be difficult, however, for two 
reasons absent in the psychophysical case. 
First, the S can understand an item well 
enough but still not know the answer, a 
case of “response ambiguity.” He may be 
clear, for example, about the meaning of the 
item “I am more sensitive than most other 
people,” yet not be sure he is or is not. 
Second, an item may be hard to answer because
of “response conflict”: the S vacillates
between giving the accurate answer and concealing
it.


While all three sources of item difficulty 
act on response latency, response conflict 
may not lower apparent confidence or shift 
the proportions of endorsement and rejection 
toward equality; a defensive, malingering, or 
plus-getting S will slant his confidence ratings 
as he does his answers. Item and response ambiguity,
on the other hand, lessen an S’s
confidence in his answer. Because they allow
a weak but systematic set to agree or disagree 
with statements in general to play a decisive 
role in determining responses, these two kinds 
of ambiguity ought to bring about the ap- 
pearance of acquiescence; items with long 
response latencies and low-confidence ratings 
should be especially liable to the effects of 
the response set. Using these two item char- 
acteristics, the present study classifies per- 
sonality items in order to test the standard 
hypothesis that acquiescence is more likely 
when items are difficult rather than easy to 
answer. 


SUBJECTIVE CONFIDENCE AND ACQUIESCENCE 


Procedure 


Some 117 female and 46 male students in an under- 
graduate psychology course at Michigan State Uni- 
versity in the winter of 1962 answered a mimeo- 
graphed inventory composed of 110 MMPI items. 
The inventory contains the first 55 even-numbered 
booklet items keyed “true” and the first 55 even- 
numbered items keyed “false” on the Hs, D, Hy, 
Pd, Pa, Pt, Sc, and Ma scales, skipping 4 for which 
both answers are keyed. The omissions leave an 
item pool reasonably representative of MMPI diagnostic
scale content with normal and pathological
answers evenly distributed over the true and false
categories. 

The Ss recorded their responses on a five-alternative
IBM multiple-choice answer sheet. Each S
placed a plus sign over the number of an item to 
register true or circled it to indicate false. Then, 
before answering the next item, he rated his con- 
fidence in the accuracy of his answer by marking 
one of the five alternatives on the IBM form, 
using a scale ranging from one (It’s just a guess) 
to five (I’m positive).


Results 


Median ratings by these students range 
from 2.62 to 4.93. The difficulty of an item 
is shown by its median confidence rating. 
Item controversiality is given by 1 − p, where
p is the proportion of Ss giving the more 
common response to the item. Measured in 
this manner, controversiality can range from 
a maximum of .50 to a minimum of zero. 
For the 110 items, median confidence correlated
−.62 with controversiality, confidence
decreasing as controversiality increased. (A 
test of significance for this coefficient would 
be misleadingly conservative, because the 
sample contains a sizable proportion of the 
finite MMPI population of diagnostic scale
items.)
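
The controversiality index defined above can be sketched in a few lines. This is a minimal illustration, not the author's scoring procedure: the function name and the response data are hypothetical, and the only element taken from the text is the definition 1 − p, with p the proportion of Ss giving the more common response.

```python
def controversiality(responses):
    """Return 1 - p for one item.

    responses: list of booleans, True meaning the S answered "true".
    p is the proportion of Ss giving the more common of the two answers,
    so the result runs from 0 (unanimous) to .50 (evenly split).
    """
    n = len(responses)
    p_true = sum(responses) / n
    p = max(p_true, 1 - p_true)
    return 1 - p


# Hypothetical item answered "true" by 77 of 100 Ss.
item = [True] * 77 + [False] * 23
print(round(controversiality(item), 2))  # 0.23
```
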
Controversial items tend to be neutral in 
social desirability; noncontroversial items
may be either high or low in desirability. For
this reason, the social desirability of these 
items, as measured by Messick and Jackson 
(1961), was taken as an absolute difference 
from a neutral value of 5.00. This transformed 
measure correlated −.54 with controversiality
and .47 with median confidence, neutral 
items tending to be controversial and low in 
confidence rating. The partial correlation be- 
tween confidence and controversiality, social 
desirability held constant, is −.49, demonstrating
that confidence and controversiality
are related when the linear effect of social 
desirability is removed. 
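
The partial correlation reported here can be checked from the three zero-order coefficients with the standard first-order partial correlation formula. The sketch below is offered only as a worked check; the function names are hypothetical, and the desirability transform follows the description in the text (absolute difference from the neutral value of 5.00).

```python
import math


def desirability_extremity(sd_value, neutral=5.00):
    """Absolute difference of a social desirability value from neutral."""
    return abs(sd_value - neutral)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = median confidence, y = controversiality, z = transformed desirability.
r = partial_corr(r_xy=-0.62, r_xz=0.47, r_yz=-0.54)
print(round(r, 2))  # -0.49
```

The computed value agrees with the coefficient given in the text.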

An earlier study (Hanley, 1962) found 
signs of a tie between item length and dif- 
ficulty. The correlation between item length 
and confidence rating in the study was only 
−.20, showing at best a slight tendency for
longer items to have lower confidence ratings. 

The relation between confidence and acquiescence
appears in the contrast between responses
to a low-confidence scale consisting of
32 items receiving median ratings less than 
4.03 and a high-confidence scale of 32 items 
rated higher than 4.58. Each scale has equal 
numbers of items scored true and false on 
the MMPI keys. When the two scales were 
scored for acquiescence by keying all items 
true, the low-confidence scale had a KR-20
coefficient of .57, while the high-confidence 
coefficient was .10. Acquiescence, defined as 
a systematic individual tendency to use one 
of the two possible answers more often than 
the other, affects items of low rather than 
high confidence. 
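
The scoring just described, keying every item true and then computing KR-20, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the small response matrix are hypothetical, and the variance of total scores is taken as the population variance, a convention the text does not specify.

```python
def acquiescence_scores(answers):
    """Key every item true: score 1 for each "true" answer, 0 otherwise.

    answers: one list per S of booleans, True meaning the S answered "true".
    """
    return [[1 if a else 0 for a in row] for row in answers]


def kr20(scores):
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores.

    scores: one list per S, one 0/1 entry per item.
    Assumption: population variance (divide by n) for the total scores.
    """
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical answers of three Ss to three items.
answers = [
    [True, True, True],
    [True, True, False],
    [False, False, False],
]
print(round(kr20(acquiescence_scores(answers)), 3))  # 0.857
```

A coefficient near zero under this keying, as found for the high-confidence scale, indicates that Ss did not differ consistently in their tendency to answer true.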

When responses were scored according to 
the MMPI keys, the high-confidence KR-20 
coefficient rose to .51, while the low-confi- 
dence coefficient dropped to .27. There is 
nothing about the items sampled, therefore, 
that precludes their having internal consist- 
ency when they are scored properly. 


RESPONSE LATENCY AND ACQUIESCENCE 


Procedure 


Personality items differ in length, and Ss read them 
at different rates. The measurement of response 
latency for such items is complicated by the fact 
that it is not possible to subtract reading time from 
total time taken to read and respond to an item. 
Reading time, however, can be controlled by 
matching items for the number of words they 
contain. In the present case, response latencies were 
collected for two such matched lists of high- and 
low-confidence items. Each item on a list was 
matched with one containing exactly the same number
of words on the other list. Items came from the 
110 rated in the preceding section of the study. 
The 14 high-confidence items have median ratings 
greater than 4.61; half are keyed true and half 
false on the MMPI scales. The 14 low-confidence 
items have medians less than 4.03; again, half are 
true and half false on the MMPI. Matching of 
length and balancing of keying make for short lists. 

Items on the two lists were randomly ordered with 
other MMPI material in a 94-item inventory shown 
individually, one item at a time, to 40 male and 
18 female Ss from an introductory psychology 
course at Michigan State University in the spring 
of 1963. Items were displayed in a viewing box 
described as a semiautomatic mock-up of a personality
testing device planned to resemble a teaching
machine. The experimenter (E) indicated that
answers of normal persons to material presented in
the apparatus were needed for comparison with 
responses obtained by standard paper-and-pencil 
technique. 


The S sat before the 2 × 2 foot face of the viewing
box, the remainder of which was screened from his
view, and read the item inside the box through a
2 × 34 inch aperture. Each item appeared shortly
after the S pressed a “Ready” button that illumi- 
nated a light at the rear of the apparatus. On this 
signal, the E threw a switch which started a silent 
timer and simultaneously lit the interior of the 
box where the item became visible. When the S 
pressed either the “True” or the “False” button 
on the face of the box to register his answer, a 
corresponding answer light visible only to the E 
came on, the timer stopped, and the light within 
the box went off. The E then recorded the answer 
and its latency (to .01 second), turned a continu- 
ous paper tape to bring a new item into viewing 
position, reset the timer, and awaited the next 
ready signal. Usually the S was ready for an- 
other item a few seconds before the E was able 
to show it; otherwise the rate at which items 
appeared was under the viewer’s control. The Ss
did not report awareness of the timing stage of 
the study. 


Results 


Differences in response latency for the two 
lists show in two ways. Within Ss, low- 
confidence items had longer average latency; 
of the 58 Ss, 45 took longer with the low- 
confidence list. Such a result, of course, could 
come about if just one low-confidence item 
was unusually difficult compared to its match. 
The second comparison, between the pairs of 
matched items, eliminates this possibility. For 
all 14 pairs, average latency was greater for 
the low-confidence items. Both results are 
highly significant. Items that yield low- 
confidence ratings take longer to answer. 
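
The text does not name the test behind "highly significant." One simple possibility, assumed here purely for illustration, is an exact two-sided sign test of the counts against an even split. The function name is hypothetical, and ties are ignored.

```python
from math import comb


def sign_test_p(successes, n):
    """Exact two-sided sign test against a 50:50 split.

    successes: number of cases in one direction (for example, Ss who took
    longer with the low-confidence list); n: number of cases in all.
    """
    k = max(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# 45 of 58 Ss took longer with the low-confidence list.
print(sign_test_p(45, 58))

# All 14 matched pairs favored the low-confidence item.
print(sign_test_p(14, 14))  # 2 / 2**14, about 0.00012
```

Under this assumed test both counts fall well beyond the .001 level, which is consistent with the description in the text.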

The tie between difficulty and acquiescence 
appears in these data. The KR-20 coefficient
for acquiescence to the high-confidence short- 
latency list was .06 compared to .53 for the 
low-confidence long-latency list. Controversi- 
ality is associated with confidence and latency 
for these items: the proportion of Ss giving 
the commoner answer to the average high- 
confidence item was .77 compared to .64 for 
the average low-confidence item, while in 12 
of the 14 pairs the low-confidence item was 
closer to .50 than its high-confidence match. 
DISCUSSION


Acquiescence in the personality inventory 
clearly relates to item difficulty as it is 
measured by item controversiality, response 
latency, and subjective confidence, but the 
latter three variables are hard to separate. 
Latency and confidence, for instance, should 
be bound to each other in the inventory just 
as they are in the psychophysical experiment, 
so that there will be few if any items for 
which confidence is high but latency long or 
confidence low and latency short. The con- 
nection between controversiality and these 
two measures, on the other hand, appears 
only when about as many Ss agree as dis- 
agree with difficult items. As Wiggins (1962) 
points out, “Items which are controversial for 
one group may not be so for another [p. 
226].” If acquiescence arises in a sample containing
individuals who are mostly set to
agree with difficult items, these items will be 
relatively low in controversiality. The same 
reduction in controversiality will hold in a 
sample mostly set to disagree. 

Inspection of the scattergram for the −.62
correlation between controversiality and con- 
fidence indicates that the greatest departures 
from the general trend come from controver- 
sial items receiving high-confidence ratings. 
It is simple, of course, to think of items that 
can have high controversiality in certain 
samples, yet still be easy to answer. The 
item, “I smoke cigarettes regularly,” for ex- 
ample, is not hard to understand, but it will 
show high controversiality in any sample 
where half the Ss have the habit. Despite 
its controversiality, it should elicit high- 
confidence ratings, short response latencies 
and, presumably, little acquiescence. 

In the present study, the correlation be- 
tween acquiescence and subjective confidence 
is direct, but the connection between latency 
and acquiescence is not, because the items 
with long latency and high acquiescence were 
initially chosen on the basis of low-confidence 
ratings. Showing a direct relation between 
latency and acquiescence requires timing a 
pool of items of identical length, so that 
latency rather than confidence can be used 
to separate items into difficult and easy lists 
for which acquiescence then can be scored, 
A LINEAR MODEL OF JOB SATISFACTION * 


CHARLES L. HULIN AND PATRICIA CAIN SMITH 


University of Illinois Cornell University 


Data relevant to 5 separate areas of a worker’s job satisfaction (satisfaction 
with: work, pay, promotion opportunities, co-workers, and supervision) and 
6 independent variables (age, tenure on the job, tenure with the company, 
job level, salary, and salary desired minus salary received) were gathered 
from a sample of 185 male workers and 75 female workers employed in 2 plants 
of an electronics manufacturing firm in New England. Multiple-regression 
analyses were done on these data to determine the validity of two hypotheses 
of Herzberg that age and tenure bear U-shaped relationships to job satisfaction. 
No support was found for these hypotheses. For the male workers a linear 
model of job satisfaction predicted work and pay satisfaction. None of the 
other dependent variables for the male or female workers could be predicted 
significantly and consistently. An explanation based on discrepancies between 
expectations and environmental return is offered. 


In an extensive review of the literature on 
job satisfaction, Herzberg, Mausner, Peterson, 
and Capwell (1957) discussed the relation- 
ship of age, tenure, salary, and job level to 
a worker’s job satisfaction (or morale). They 
concluded that the available studies revealed 
definite and consistent relationships between 
these four variables and level of job satis- 
faction among many different samples of 
American workers. Their conclusions con- 
cerning these four factors are summarized 
briefly below: 


Age 


The results of 17 out of 23 studies on the job 
satisfaction of workers at various age levels present 
the following picture: morale is high when people 
start their first jobs; it goes down during the next 
few years, and remains at relatively low level; when 
workers are in their late twenties or early thirties 
morale begins to rise. This rise continues through the 


1 This study is based on a portion of a doctoral 
dissertation by the senior author submitted to the 
graduate school of Cornell University in partial ful- 
fillment of the requirements for the PhD. The 
dissertation was directed by the co-author. Both 
authors participated in collection of the data. 

2 This study is part of the Cornell University 
Studies of Retirement Policies, financed by a grant 
from the Ford Foundation. The authors wish to 
express their gratitude to the cooperating companies 
who made their records available and contributed 
the time of their personnel to make these studies 
possible and to the interviewers who contributed 
their time to the validation studies. The authors 
would also like to thank Nancy Wiggins for her 
assistance in data analysis and for writing the 
necessary programs. 


remainder of the working career in most cases [page number illegible]. 


Tenure 


workers begin with high morale which drops dur- 
ing the first year of service and remains low for a 
number of years. As service increases, morale tends 
to go up [p. 13]. 


Job Level 


One unequivocal fact emerges from the studies of 
job satisfaction; the higher the level of occupation, 
the higher the morale [p. 20]. 


Salary 


High income has been found associated with high 
job satisfaction in follow-up studies of college gradu- 
ates [p. 23]. 


Thus Herzberg, et al. (1957) seem to be 
concluding there is a U-shaped function be- 
tween age or tenure and satisfaction, and a 
positive monotonic function for job level and 
satisfaction. Their conclusions regarding 
salary and satisfaction were more tentative 
but, at least for college graduates, they con- 
cluded there would be a positive monotonic 
relationship between salary and satisfaction. 

We feel, however, there are many unre- 
solved problems connected with these four 
conclusions which deserve to be stated and 
discussed. For example, what is the strength 
of the association between age or tenure and 
job satisfaction? Many of the studies re- 
viewed by Herzberg, et al. presented the 
data in the form of array means of job 
satisfaction as a function of age or tenure. 
There is an obvious absence of statistical 
analyses of these data. Herzberg, et al. seem 
to have been willing to conclude the existence 
of a complex function of the form $y = a + b(x_i - \bar{x})^2$ 
on the basis of inspecting a collection of graphs. 
We have no way of knowing 
if a relationship truly exists or if the U-shaped 
relationships cited by Herzberg, et al. are 
merely the result of random fluctuation. 
Further, even if an association does indeed 
exist, we have no way of estimating its 
strength. 

There is also the problem of correlated 
predictor variables. With the rather close 
relationship which exists between age and 
tenure, it is perhaps fallacious to conclude 
that age per se is related to job satisfaction 
when age and tenure are both operating 
simultaneously on satisfaction level. Before 
we conclude that age bears a certain relation- 
ship to job satisfaction it would seem that we 
would first have to remove the effects of 
tenure. The same argument holds, mutatis 
mutandis, for the other correlated predictor 
variables such as salary and job level. 

An additional problem arises in connection 
with the nature of the job satisfaction meas- 
ures used. Most of the studies reviewed by 
Herzberg et al. (1957) employed a global or 
general morale measure. Later studies have 
indicated that job satisfaction is not a 
unidimensional variable but should be con- 
sidered as being made up of a number of 
factors or areas of satisfaction. The exact 
number and specification of the various com- 
ponents or factors of job satisfaction is un- 
important for the present discussion. What is 
important is that job satisfaction can no 
longer be considered as a unidimensional 
variable. Hence, one might hypothesize that 
the relationships cited by Herzberg, et al. 
would be true for some of these factors of 
job satisfaction but would not hold true for 
others. 

Finally, Herzberg, et al. stated these find- 
ings as general conclusions. No allowance was 
made for the influence of situational variables 
or sex differences in job satisfaction (Hulin 
& Smith, 1964). These findings may have 
been somewhat overgeneralized, and while 
they hold true for some segments of the 
population they would not be true for other 
segments. 
The present investigation is intended to 
clarify many of the problems discussed above. 
It also may provide an answer to some of the 
questions which have been raised. 


VARIABLES STUDIED 
Independent Variables 


Age (A). The age of each subject (S) 
was obtained from the personnel records. For 
the purposes of the statistical manipulations 
in this study, A was measured as deviations 
from the mean age of the group, $(x_i - \bar{x})$. 

Age squared (A²). In order to test for the 
curvilinear (U-shaped) age components in job 
satisfaction, A², measured as $(x_i - \bar{x})^2$, was 
used as one of the independent variables. If 
Herzberg is correct and age and satisfaction 
do indeed have a U-shaped relationship to 
each other, then A², measured in this manner, 
should have an approximately linear relation- 
ship to satisfaction. It should also be noted 
that by measuring A as $(x_i - \bar{x})$ and A² as 
$(x_i - \bar{x})^2$, we will reduce the linear correla- 
tion between these two predictor variables 
approximately from .95 to +.30. Thus we 
should be able to estimate independently both 
the linear and the quadratic age components 
of job satisfaction. 
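
The effect of measuring age as a deviation from the group mean can be checked numerically. The following is a minimal sketch in Python; the ages are invented for illustration and are not the data of this study, and `pearson_r` is a helper defined here only for the example.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical ages (assumption: illustrative values only).
ages = [19, 22, 25, 28, 31, 35, 40, 46, 52, 58, 63]

# Raw age against raw age squared: the two are almost collinear.
raw_r = pearson_r(ages, [a ** 2 for a in ages])

# Age as a deviation from the mean, and the square of that deviation.
age_mean = mean(ages)
centered = [a - age_mean for a in ages]
centered_r = pearson_r(centered, [c ** 2 for c in centered])

print(f"r(A, A^2) with raw scores:       {raw_r:.2f}")
print(f"r(A, A^2) with deviation scores: {centered_r:.2f}")
```

With deviation scores the remaining correlation depends only on the skewness of the age distribution, which is why the linear and quadratic components can be estimated with relative independence.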

Tenure on the job (JT). Measures of this 
variable were obtained by the workers’ re- 
sponses to the question “How long have you 
been working on your present job?” 

Tenure with the company (CT). It is 
unclear whether Herzberg, et al. (1957) were 
discussing a worker’s tenure with his present 
company or his tenure on his present job. 
Therefore, measures of both variables were 
obtained. Again to facilitate later statistical 
analyses this variable was measured as 
a deviation from the group mean. 

Tenure with company squared (CT²). To 
test for the curvilinear (U-shaped) tenure 
components of job satisfaction, CT² was used 
as an additional predictor variable. The 
same manipulations used for A and A² were 
used for this variable to insure relative in- 
dependence of the linear and the quadratic 
components. 

Salary (S). Measures of the annual salary 
earned by all of the workers in this sample 
were available from the company files. Thus, 
none of the usual random or systematic errors 
connected with the self-reporting of salary 
should be present. 

Job level (JL). The job levels of each 
worker were also available from the company 
files in the form of job classification num- 
bers. The job levels available from these 
classification numbers ranged from 01 (un- 
skilled workers) to 22 (high-level executives) 
and formed what amounted to a continuous 
scale. 

Desired salary minus present salary (SD). 
There is a strong possibility that it is not a 
worker’s salary per se that affects his satisfac- 
tion, but rather the discrepancy between what 
he is earning (his present salary) and his 
salary aspirations (desired salary). A foreman 
earning $8,000 per year may not be more 
satisfied than an unskilled worker earning 
$4,800 per year since he could have a much 
higher aspiration level. To test this pos- 
sibility, the difference between what a worker 
said he would like to earn annually and what 
he actually did earn was obtained. It was felt 
that this measure might show a stronger 
relationship to satisfaction level than would 
actual salary earned. 


Dependent Variables 


Five separate aspects of job satisfaction are 
being measured in this study. No claim is 
made that these five aspects or areas are ex- 
haustive or that they are orthogonal. How- 
ever, they were chosen to be consistent with 
the previous factor analytic studies of job 
satisfaction (Ash, 1954; Astin, 1958; Baehr, 
1954; Wherry, 1954; etc.) and they have 
been shown to be discriminably different 
from each other by several samples of work- 
ers (Kendall, 1961; Kendall, Smith, Hulin, 
& Locke, 1963). In all probability they will 
not specify completely the general variate 
“job satisfaction” in spite of the fact that 
they have been found so consistently. 

These five areas of job satisfaction (satis- 
faction with actual work, with pay, with 
promotional opportunities, with supervision, 
and with co-workers) were measured by means 
of the Job Descriptive Index. This method 
of measuring job satisfaction has been shown 
(Kendall, 1961; Kendall, et al., 1963) to 
have adequate convergent and discriminant 
validity according to the Campbell-Fiske 
(1959) model. For a complete description of 
the Job Descriptive Index the reader is re- 
ferred to Hulin, Smith, Kendall, and Locke 
(1963). 

In this study each of these aspects of job 
satisfaction will be considered as a separate 
dependent variable. For each group of work- 
ers we will have 10 different analyses. Each 
analysis will be concerned with predicting 
a different aspect of job satisfaction once from 
the total set of independent variables, and 
once from the linear subset of independent 
variables. 


SAMPLE 


The data presented in this study were 
obtained from 185 male workers and 75 
female workers employed in two different 
plants (A and B) of the same large electronic 
corporation located in a large metropolitan 
area in New England. The parent corporation, 
of which these two plants are a part, employs 
several thousand employees in units located 
throughout the United States. The firm has 
been expanding rapidly during the past few 
years. In general the company controls a siz- 
able share of the electronic equipment market. 
As in many other electronic firms, a fair 
proportion of its work is done under govern- 
ment contract. The workers were drawn from 
the company rolls in a systematic sample by 
choosing every nth name. The only restric- 
tion placed on the sample was that approxi- 
mately equal numbers of workers were to 
be drawn from each age group. Neither of the 
age distributions differed significantly from 
an equal probability distribution (Plant A, 
values illegible; Plant B, $\chi^2 = 5.86$, $p < .70$). 
These four groups of workers (Plant A 
men, Plant B men, Plant A women and 
Plant B women) will be analyzed as separate 
groups throughout the remainder of this 
study. It has been demonstrated that there 
are differences between the sexes in the level 
of job satisfaction (Hulin & Smith, 1964) 
and it is likely that there will be differences 
in the relationship of job satisfaction to the 
independent variables being investigated in 
this study. Therefore, any combining of 
groups of workers of different sex or from 
different plants would only obscure the rela- 
tionships which exist. 
METHOD 


Multiple-regression analyses were used in order to re- 
late the independent variables being considered in this 
study to the dependent variables of the five aspects of 
job satisfaction. This type of analysis has the advant- 
age of providing an estimate of the relationship of one 
of the independent variables to job satisfaction while 
holding the other independent variables constant. If 
we are to conclude that age or tenure or salary is re- 
lated to job satisfaction, then it seems that we must 
control for the effects of the other correlated predictor 
variables. Further, we will be able to obtain an esti- 
mate of the extent to which the various aspects of job 
satisfaction are predictable and the extent to which 
they are differentially related to the independent vari- 
ables being considered in this investigation. 

In general the adequacy of the following two models 
of job satisfaction will be tested: 


Linear Model 

$$y_i = \beta_0 + \beta_1 A + \beta_2 \mathrm{JT} + \beta_3 \mathrm{CT} + \beta_4 \mathrm{JL} + \beta_5 S + \beta_6 \mathrm{SD}$$

Complex Model 

$$y_i = \beta_0 + \beta_1 A + \beta_2 A^2 + \beta_3 \mathrm{JT} + \beta_4 \mathrm{CT} + \beta_5 \mathrm{CT}^2 + \beta_6 \mathrm{JL} + \beta_7 S + \beta_8 \mathrm{SD}$$

where 

$y_i$ = a worker’s satisfaction with his work, pay, promotions, supervision, or people on the job. 
A = age. 
JT = a worker’s tenure on his job. 
CT = a worker’s tenure with his company. 
JL = a worker’s job level. 
S = a worker’s salary. 
SD = the difference between a worker’s present salary and what he feels he needs to be satisfied. 


If Herzberg is correct and age and tenure show strong 
curvilinear relationships to the various aspects of job 
satisfaction then the complex model described above 
should explain significantly more of the variance in the 
dependent variables than the linear model does. On the 
other hand if age and tenure are linearly related to job 
satisfaction then the addition of the A² and the CT² 
factors should result in control of no additional variance. 
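
The comparison of the two models can be sketched in code. The example below is only an illustration under stated assumptions: the data are simulated with a built-in U-shaped age effect, only age and company tenure are used as predictors, and the helpers `solve`, `multiple_r`, and `center` are written for this example rather than taken from the original analysis programs.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def multiple_r(predictors, y):
    """Least-squares fit with intercept; returns (multiple R, weights)."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return max(0.0, 1 - ss_res / ss_tot) ** 0.5, beta

def center(values):
    """Express scores as deviations from their mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Simulated workers (assumption: satisfaction is U-shaped in age).
rng = random.Random(0)
age = [rng.uniform(20, 60) for _ in range(80)]
tenure = [max(0.5, a - 20 - rng.uniform(0, 15)) for a in age]
satisfaction = [30 + 0.02 * (a - 40) ** 2 + 0.1 * t + rng.gauss(0, 2)
                for a, t in zip(age, tenure)]

A, CT = center(age), center(tenure)
linear_rows = [[a, t] for a, t in zip(A, CT)]
complex_rows = [[a, a * a, t, t * t] for a, t in zip(A, CT)]

r_linear, _ = multiple_r(linear_rows, satisfaction)
r_complex, beta = multiple_r(complex_rows, satisfaction)
beta_a2 = beta[2]  # order: intercept, A, A squared, CT, CT squared

print(f"R (linear model)  = {r_linear:.3f}")
print(f"R (complex model) = {r_complex:.3f}")
print(f"weight for A squared = {beta_a2:.4f}")
```

Because the linear model is nested in the complex model, the second R can never be smaller than the first; the question of interest is whether the gain exceeds chance and whether the quadratic weights are positive.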

It should also be noted that a significant difference 
between the two multiple correlation coefficients is a 
necessary but not a sufficient condition for concluding 
the existence of U-shaped relationships between age or 
tenure and job satisfaction. We must obtain positive 
beta weights for these two variables as well as a signifi- 
cant difference between the two Rs before this conclu- 
sion can be said to be tenable. 

To test for the significance of the additional variance 
controlled by A² and CT² it is sufficient to test for the 
significance of the difference between the two coefficients 
of multiple correlation derived from the two different 
models. In order to make this test, a procedure given 
by Guilford (1956, p. 400) will be used. He states that 
the significance of the difference in the amount of vari- 
ance explained by the two sets of variables can be tested 
by means of an F ratio. The formula for this F ratio 
reads 


$$F = \frac{(R_1^2 - R_2^2)(N - m_1 - 1)}{(1 - R_1^2)(m_1 - m_2)}$$

df = $(m_1 - m_2)$ and $(N - m_1 - 1)$ 

where 

$R_1$ = multiple R with larger number of independent variables. 
$R_2$ = multiple R with the two curvilinear variables omitted. 
$m_1$ = larger number of independent variables. 
$m_2$ = smaller number of independent variables. 
N = sample size. 


In this study we will be testing the hypothesis that $R_8 = R_6$. 
In this case the significance levels required will 
have to be adjusted since we will be testing the hy- 
pothesis 20 times (once for each of the five areas of job 
satisfaction across the four different groups of workers) 
and the problem of multiple comparisons (Ryan, 1959) 
is obviously relevant. If we desire an experimentwise 
error rate of .05 we will test the differences obtained at 
the .0025 level of probability (.05/20). Even this pro- 
cedure will not correct completely for the multiple 
comparison since the dependent variables are not in- 
dependent of each other. 
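
As a worked illustration of the F ratio and the adjusted significance level, the short Python sketch below applies the formula to the multiple correlations reported for pay satisfaction among Plant A men (R = .469 with eight variables, R = .383 with six, n = 99). The function name `f_ratio` is chosen for this example only.

```python
def f_ratio(r_full, r_reduced, n, m_full, m_reduced):
    """F ratio for the gain in explained variance between nested models.

    Returns the F value and its degrees of freedom (numerator, denominator).
    """
    numerator = (r_full ** 2 - r_reduced ** 2) * (n - m_full - 1)
    denominator = (1 - r_full ** 2) * (m_full - m_reduced)
    return numerator / denominator, (m_full - m_reduced, n - m_full - 1)

# Pay satisfaction, Plant A men: eight-variable versus six-variable model.
f_value, df = f_ratio(r_full=0.469, r_reduced=0.383, n=99,
                      m_full=8, m_reduced=6)

# Twenty tests at an experimentwise error rate of .05.
alpha_per_test = 0.05 / 20

print(f"F = {f_value:.2f} with df = {df}")
print(f"required significance level per test = {alpha_per_test}")
```

The result, F = 4.23 with 2 and 90 degrees of freedom, exceeds the ordinary .05 criterion but falls short of the .0025 level required once the 20 comparisons are taken into account.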


RESULTS 


The results of the 20 multiple-regression 
analyses are given in Tables 1 through 8. 
Twenty tests of the hypothesis that $R_8 = R_6$ 
(the multiple correlation based on eight 


TABLE 1 

STANDARD PARTIAL REGRESSION COEFFICIENTS (ALL VARIABLES) PLANT A MEN 

              A           JT          CT          JL          S           SD          A²          CT²         R           p
Work          .166        .020        .112        [illegible] .294        -.080       .068        -.185       .490        <.01
Pay           -.193       .116        .261        .190        .218        .052        .269        -.223       .469        <.01
Promotions    -.092       -.196       .252        .030        .268        .087        .279        -.146       [illegible] <.01
Co-workers    -.071       -.080       .318        [illegible] .041        .084        .284        -.321       .349        ns
Supervision   .075        -.081       .195        -.104       .183        .015        .063        -.291       .208        ns

Note.—n = 99. 
TABLE 2 

STANDARD PARTIAL REGRESSION COEFFICIENTS (LINEAR VARIABLES) PLANT A MEN 

              A           JT          CT          JL          S           SD          R           p
Work          .207        .034        -.042       .092        .307        -.087       .472        <.01
Pay           -.093       .124        .057        .210        .168        -.011       .383        <.01
Promotions    .001        -.196       .109        .007        .199        .016        .282        ns
Co-workers    .046        -.064       .035        .161        .006        .024        .184        ns
Supervision   .128        -.057       -.044       -.145       .219        .016        .179        ns

Note.—n = 99. 


TABLE 3 

STANDARD PARTIAL REGRESSION COEFFICIENTS (ALL VARIABLES) PLANT B MEN 

              A           JT          CT          JL          S           SD          A²          CT²         R           p
Work          .188        .082        -.031       .403        .061        -.061       .031        .011        .507        <.01
Pay           .370        [illegible] -.185       .267        .275        -.059       .194        -.032       .545        <.01
Promotions    -.113       -.184       .051        .526        -.190       -.053       .168        .080        .496        <.01
Co-workers    .097        .051        .054        [illegible] -.151       -.046       .289        .122        .325        ns
Supervision   .303        -.218       -.156       .367        -.272       -.049       -.021       [illegible] .339        ns

Note.—n = 86. 


independent variables is equal to the multiple 
correlation based on only the six linear vari- 
ables) were made. The results of these 20 
tests are given in Table 9 in the form of 
F ratios. Three of the F ratios reached sig- 
nificance at the .05 level. None reached the 


.0025 level of significance required for an 
hypothesiswise error rate of .05. Thus we are 
not able to reject the hypothesis that $R_8 = R_6$. 
In addition, it should be pointed out that 16 
of the 40 standard partial regression coeffi- 
cients associated with the curvilinear terms 


TABLE 4 

STANDARD PARTIAL REGRESSION COEFFICIENTS (LINEAR VARIABLES) PLANT B MEN 

              A           JT          CT          JL          S           SD          R           p
Work          .181        .082        -.031       .406        .058        -.062       .504        <.01
Pay           .347        .046        -.222       [illegible] .239        -.061       .518        <.01
Promotion     -.152       -.186       .062        .542        -.205       -.065       .446        <.01
Co-workers    .074        .046        -.029       .354        -.214       -.043       .204        ns
Supervision   .284        -.217       -.101       .382        -.250       -.060       [illegible] ns

Note.—n = 86. 


TABLE 5 

STANDARD PARTIAL REGRESSION COEFFICIENTS (ALL VARIABLES) PLANT A WOMEN 

              A           JT          CT          JL          S           SD          A²          CT²         R           p
Work          .077        -.437       .043        .364        -.223       .326        .219        .060        [illegible] ns
Pay           .240        -.188       .046        -.077       .020        [illegible] .026        .038        .610        <.05
Promotion     .048        -.235       -.093       -.270       [illegible] .477        -.049       .288        .364        ns
Co-workers    [illegible] -.513       .764        .041        [illegible] -.097       .250        -.099       [illegible] ns
Supervision   .002        -.413       .193        -.055       [illegible] .054        .128        -.184       .396        ns

Note.—n = 35. 
TABLE 6 

STANDARD PARTIAL REGRESSION COEFFICIENTS (LINEAR VARIABLES) PLANT A WOMEN 

              A           JT          CT          JL          S           SD          R           p
Work          .049        -.439       .071        .295        -.143       [illegible] .487        ns
Pay           [illegible] -.189       .068        -.068       [illegible] -.296       .582        <.01
Promotions    .058        -.242       .078        -.091       [illegible] .230        [illegible] ns
Co-workers    .105        -.511       .698        -.123       -.112       -.198       .432        ns
Supervision   -.017       -.408       .680        -.213       -.001       -.016       .367        ns

Note.—n = 35. 


TABLE 7 

STANDARD PARTIAL REGRESSION COEFFICIENTS (ALL VARIABLES) PLANT B WOMEN 

              A           JT          CT          JL          S           SD          A²          CT²         R           p
Work          -.076       -.159       .000        .129        -.098       -.009       -.327       .270        .339        ns
Pay           .051        -.238       [illegible] .302        -.048       -.172       -.159       -.065       .410        ns
Promotions    .170        -.406       .454        .227        -.085       .279        .421        -.197       .603        <.05
Co-workers    -.239       .018        .073        [illegible] -.022       .080        -.109       .052        .282        ns
Supervision   -.249       .102        .067        .088        -.164       -.058       .187        -.159       .309        ns

Note.—n = 40. 


were negative in sign. In order to confirm 
Herzberg’s hypothesis the differences in the 
multiple correlations must be significant and 
the beta weights must be positive. These two 
requirements are clearly not met in these data. 

Testing the hypotheses that the linear 


multiple correlations were equal to zero gave 
somewhat mixed results. In both of the sam- 
ples of male workers the multiple-correlation 
coefficients associated with work and pay 
satisfaction were significant (p< .01). In 
addition, in Plant A the multiple correlation 


TABLE 8 

STANDARD PARTIAL REGRESSION COEFFICIENTS (LINEAR VARIABLES) PLANT B WOMEN 

              A           JT          CT          JL          S           SD          R           p
Work          .033        -.215       .178        -.105       .072        .102        .199        ns
Pay           .124        -.207       [illegible] .308        -.079       -.167       .404        ns
Promotions    .013        -.378       .325        .438        -.216       [illegible] .498        <.05
Co-workers    -.198       .011        .107        .058        .012        .107        .266        ns
Supervision   -.311       .136        -.037       .225        -.264       -.122       [illegible] ns

Note.—n = 40. 


TABLE 9 

F RATIOS TESTING SIGNIFICANCE OF DIFFERENCE BETWEEN MULTIPLE-CORRELATION COEFFICIENTS 

                        Men                     Women
              Plant A     Plant B     Plant A     Plant B
Work          1.02        <1          <1          1.32
Pay           4.23*       1.57        <1          <1
Promotion     3.20*       2.40        1.07        2.82
Co-worker     4.51*       [illegible] [illegible] <1
Supervision   1.93        <1          <1          <1
df            2,90        2,77        2,26        2,31

* p < .05. 
associated with promotion satisfaction is sig- 
nificant (p < .01). For the samples of female 
workers the multiple correlation for pay was 
significant in Plant A (p< .01) and the 
multiple correlation for promotions was sig- 
nificant in Plant B (p < .05). 

For those dependent variables which were 
significantly related to the linear model vari- 
ables, a double cross-validation was carried 
out. The beta weights obtained from Plant A 
were applied to the data from Plant B and 
vice versa. These operations yielded product- 
moment correlations of .431 (work satisfac- 
tion, Plant A), .440 (work satisfaction, Plant 
B), .290 (pay satisfaction, Plant A), and .400 
(pay satisfaction, Plant B). All of these cor- 
relations between an independently weighted 
sum of the predictor variables and the de- 
pendent satisfaction variables display the 
expected shrinkage but they are all significant 
(p < .01). 
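
The logic of the double cross-validation can be sketched as follows. This is an illustration only: the two samples are simulated with two hypothetical predictors, and the helpers `solve`, `fit`, `predict`, `pearson_r`, and `make_sample` are written for the example and do not reproduce the original programs or data.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(predictors, y):
    """Least-squares weights (intercept first) for y on the predictors."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, predictors):
    """Weighted sum of the predictors using previously obtained weights."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], p)) for p in predictors]

def pearson_r(u, v):
    """Product-moment correlation between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def make_sample(rng, n):
    """Simulated plant: two predictors and a satisfaction score (assumption)."""
    xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [0.5 * x1 + 0.3 * x2 + rng.gauss(0, 1) for x1, x2 in xs]
    return xs, y

rng = random.Random(1)
x_a, y_a = make_sample(rng, 99)   # stands in for Plant A
x_b, y_b = make_sample(rng, 86)   # stands in for Plant B

# Weights derived in one plant are applied to the other plant, and vice versa.
r_a_weights_on_b = pearson_r(predict(fit(x_a, y_a), x_b), y_b)
r_b_weights_on_a = pearson_r(predict(fit(x_b, y_b), x_a), y_a)

print(f"Plant A weights applied to Plant B: r = {r_a_weights_on_b:.3f}")
print(f"Plant B weights applied to Plant A: r = {r_b_weights_on_a:.3f}")
```

Because the weights are not fitted to the sample on which they are evaluated, these correlations are not inflated by capitalization on chance, which is the source of the shrinkage relative to the original multiple correlations.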


DISCUSSION 


Several of the questions which we raised at 
the beginning of this paper appear to have 
been given at least partial answers. First, the 
conclusions of Herzberg, et al. (1957) regard- 
ing the U-shaped relationship between age 
or tenure and satisfaction must be regarded 
with suspicion. Among the Ss investigated in 
this study we found no evidence to support 
this hypothesis. In the few cases where the 
quadratic components of age or tenure con- 
trolled more than a chance amount of the 
variance of the satisfaction measures, the 
standard partial regression coefficients as- 
sociated with A² and CT² had at least one 
negative sign. The lack of improved predict- 
ability with the U-shaped components does 
not preclude, however, the existence of a more 
complex curvilinear model. 

Secondly, in addition to having different 
levels of satisfaction, male and female work- 
ers also exhibit different relationships between 
our set of independent variables and the vari- 
ous areas of job satisfaction. In both samples 
of male workers, work and pay satisfaction 
were significantly related to the set of in- 
dependent variables used in the study. These 
same variables also could be predicted sig- 
nificantly by using the beta weights obtained 
from an independent set of data. In the 
sample of female workers from Plant A, pay 
satisfaction could be predicted significantly 
better than chance from the independent vari- 
ables. In the sample of Plant B females, 
satisfaction with promotions could be pre- 
dicted significantly better than chance. These 
were the only two significant relations ob- 
tained for the female workers. It seems that 
as a minimum condition investigators must 
draw distinctions between male and female 
workers when discussing functional relation- 
ships between job satisfaction and other 
variables. 

Thirdly, it appears that there are indeed 
differences between the five dimensions of 
job satisfaction in their relationships to the 
set of independent variables used in this 
study. While pay and work satisfaction could 
be predicted significantly in both samples 
of male workers, satisfaction with co-workers 
and supervision could not be predicted sig- 
nificantly better than chance in either plant. 
This would seem to be more evidence that 
these five aspects of job satisfaction are in- 
deed different from each other both in terms 
of functional relationships and predictability. 

Finally, when we examine the linear-regres- 
sion equations we find that there are certain 
substantive findings of interest. (In the fol- 
lowing sections we shall limit our comments 
to work and pay satisfaction for men since 
these appear to be the only two dependent 
variables consistently and significantly related 
to our predictor variables.) Age and CT tend 
to have positive weights indicating positive 
monotonic relationships. Job tenure tends to 
have a negative weight. This negative weight 
was interpreted as a suppressor effect. Job 
level and S both have positive weights while 
SD has a negative weight. All of these find- 
ings are in the predicted direction. We would 
regard these results as indicating that work- 
ing on a job involves a process of the workers 
adjusting their expectations to what the envi- 
ronment is likely to provide. The closer the 
agreement between these two variables (ex- 
pectation and environmental return) the more 
satisfaction should be experienced by the 
worker. We assume that the longer a worker 
has been on the job the more he knows what 
to expect from the job and the entire situa- 
tion. Concomitant with the changing level of 
the discrepancy between expectations and 
environmental return we find that in addition 
the level of the return is increasing due to 
tenure-connected raises and promotions. We 
would argue, therefore, that an explanation 
based on linear relationships between dis- 
crepancies between expectations-return and 
tenure, and linear relationships between tenure 
and return would be sufficient to explain the 
findings of this study. 

It should also be pointed out that this 
sort of an explanation will fit the negative as 
well as the positive findings. Work and pay 
satisfaction were the only two dependent 
variables which showed consistent and sig- 
nificant relationships with the predictor vari- 
ables. The work a person does and the pay 
he receives are the two variables in the 
industrial situation which should be the most 
closely related to changes in a person’s 
tenure and age. As a result, while the vari- 
ables we assumed would index changing dis- 
crepancy levels may in fact be so doing, the 
variables which are used to index changing 
levels of environmental return may do this 
only for actual work done and pay level. 

There are several methodological problems 
related to the interpretation of these data. It 
is very evident that we have oversampled 
the variables from the age-tenure domain 
and from the job level-salary domain. The 
problem of linear restraints (Cureton, 1951) 
precludes the possibility that A, CT, and JT 
will all receive positive beta weights for 
predicting work and pay satisfaction for male 
workers even though the age-tenure variate 
is positively related to satisfaction. The same 
problem is concerned with the beta weights 
assigned to job level and pay and may ac- 
count for a great deal of the apparent 
instability of these weights. 

It seems likely that a linear model with 
three variates as predictors (age-tenure, job 
level-salary, and present salary minus de- 
sired salary) would account for nearly as 
much of the variance as the six variables used 
in this study. In addition, these three variates 
would have the advantage of being relatively 
independent from each other. 

The small number of plants and Ss investi- 
gated in this study also poses somewhat of a 
limitation. It is true, however, that the results 
concerning the predictability of work and pay 
satisfaction among male workers, the lack 
of predictability for female workers, and 
the relative magnitude and sign of the beta 
weights assigned to the linear variables were 
supported in a sample of nearly 700 additional 
workers drawn from four additional com- 
panies located in the East and Midwest. 
This would appear to be evidence for the 
generality of the linear model being proposed 
in this paper. 
REMOVING HALO FROM JOB EVALUATION 
FACTOR STRUCTURE 


JAMES H. MYERS 


School of Business Administration, University of Southern California 


Intercorrelations of job-evaluation ratings were factor analyzed under 2 con- 
ditions: (a) the original matrix, produced directly from raters’ initial evalua- 
tions, and (b) a “reduced” matrix resulting from partialling out job level from 
all original intercorrelations in the hope of removing a general “halo” factor 
characteristically emerging from job-rating studies. Comparisons of factors 
from each matrix showed a definite reduction of halo in the “reduced” matrix, 
as well as more meaningful factor structures for most factors. 


In an earlier publication (Myers, 1958) the 
author presented an experimental study in- 
volving factor analyses of a specially con- 
structed “point” job-evaluation system de- 
veloped for a life insurance company. Eighty- 
two office jobs were evaluated by three 
raters on each of 17 job requirements or 
characteristics (e.g., mental requirements, 
experience requirements, physical demand). 
These job ratings were made under two sepa- 
rate conditions: (a) “unforced” or normal 
set, done without knowledge of a job’s actual 
value or level in the company hierarchy; and 
(b) “forced” ratings, in which “adjustments” 
in the unforced ratings were made where 
necessary to yield a total point score for each 
job which was commensurate with the actual 
standing of the job in the company. For 
purposes of analysis, judgments from the 
three raters were added together to produce a 
single combined evaluation for each job on 
each factor. 

The primary purpose of the above study 
was to determine the effects upon factor 
structure of “forcing” job-evaluation ratings 
to make the final point score for each job 
conform with the predetermined overall value 
of that job. Additionally, however, it was 
hoped that the original unforced ratings would 
yield a factor structure devoid of the general 
factor (halo) found in almost every published 
study of job ratings up to that time (Grant, 
1951; Howard & Schultz, 1952; Lawshe, 
1945; Lawshe & Alessi, 1946; Lawshe, 
Dudek, & Wilson, 1948; Lawshe & Maleski, 
1946; Lawshe & Satter, 1949; Lawshe & Wil- 
son, 1946; Rogers, 1946). 

Forced and unforced ratings were factor 
analyzed separately, and comparisons were 
made between the resulting factor structures 
and loadings to determine the effects of 
forcing job-evaluation ratings. Both types of 
ratings produced five factors, the principal 
one in each case being a general factor show- 
ing high loadings on nearly all job require- 
ments and correlating highly (r ≈ .95) with 
actual job level (overall value of the job in 
the organizational hierarchy). The remaining 
factors were similar under both rating con- 
ditions, although factor loadings of job re- 
quirements showed some differences between 
forced and unforced ratings, particularly on 
the less well-defined factors which emerged. 

One of the goals of the above study was 
to discover the dimensionality underlying 
the job-evaluation system for jobs in the 
company where the study was done. It was 
hoped that unforced evaluations would re- 
move most of the “halo” effects from job 
ratings and thus reveal the more basic dimen- 
sions underlying the system. However, pres- 
ence of the “general” factor with its high 
loadings on nearly all job requirements and 
extremely high loading on job level was dis- 
appointing in this regard. 

The question then arose as to the possibility 
of other means of removing, or at least 
substantially reducing, halo to see if some 
more fundamental factor structure could be 
found. One possibility would be to remove the 
general factor statistically, through partial 
correlation.¹ Since job level was loaded heavily 
on the general factor which emerged (r ≈ .95), 
it was felt that partialling out job level would 
be a rather direct approach to the removal 
of halo from the basic intercorrelation matrix. 
A refactoring of the resulting correlations 
might then produce a clearer definition of the 
underlying factor structure of the job-evalua- 
tion system. This article reports such an 
approach and compares the factor structures 
from the “original” correlation matrix (un- 
forced ratings) with those from the “reduced” 
correlation matrix (job level partialled out). 

¹ $r_{12.3} = \dfrac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}}$


TABLE 1 
INTERCORRELATIONS OF JOB RATINGS—BOTH ORIGINAL AND REDUCED MATRICES 

                                               Variable number
Number and name                   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 1. Mental Requirements          --  48  64  50  03  33  49  23 -06  03 -20  26 -09 -19 -19
 2. Frequency of Decision        87  --  59  42 -06  23  39  23  36  17 -03  34 -01 -02 -17
 3. Difficulty of Judgment       92  92  --  33  10  26  38  24  15  22 -12  20 -07 -14 -27
 4. Attention to Details         68  65  61  --  02  04  53  10  10 -03 -03  25  02  03  00
 5. Education                    36  33  40  24  -- -30 -13 -16  05  05  16 -21 -12  25  41
 6. Experience                   79  77  79  46  16  --  22  13  03 -12 -26  15 -06 -24 -20
 7. Effects of Inaccurate Work   85  83  84  70  26  73  --  18  02  21  02  32  22  03 -02
 8. Review on Work               63  64  65  41  14  56  60  --  23  37  17  34 -03 -23 -11
 9. Persuasion                   69  82  78  50  30  67  68  62  --  65  54  12  09  16  13
10. Importance of Contacts       71  76  79  43  OS  61  74  68  88  --  68  08  20  14 -04
11. Frequency of Contacts        52  59  58  35  38  43 Sih  53  79  84  --  01  18  31  23
12. Variety in Work              80  83  81  57  23  72  79  67  73  71  58  --  00 -08 -16
13. Confidential Nature of Work  32  37  35  24  07  30  46  24  39  45  41  35  -- -10  02
14. Working Conditions           02  11  06  10  28 -03  13 -10  20  19  31  07 -03  --  52
15. Physical Demand             -30 -28 -32 -13  27 -30 -20 -23 -12 -21  00 -28 -08  46  --
16. Job Level                    86  88  91  55  40  80  82  63  82  81  68  84  42  13 -23

Note.—Rounded from three places and decimals omitted. Original matrix in lower left 
(below diagonal), reduced matrix in upper right (above diagonal) portion of table. 
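
The partialling step described above can be sketched numerically. The following is a minimal illustration in Python with NumPy, not the author's original computation; the function name `partial_out` and the three-variable example are assumptions introduced here, with the three correlations taken from the rounded entries of Table 1 (Mental Requirements, Frequency of Decision, and Job Level).

```python
import numpy as np

def partial_out(R, k):
    """First-order partial correlations among all variables except k,
    holding variable k constant (hypothetical helper, for illustration)."""
    R = np.asarray(R, dtype=float)
    keep = [i for i in range(R.shape[0]) if i != k]
    r_k = R[keep, k]                                # correlations with the partialled variable
    num = R[np.ix_(keep, keep)] - np.outer(r_k, r_k)
    den = np.sqrt(np.outer(1 - r_k**2, 1 - r_k**2))
    P = num / den
    np.fill_diagonal(P, 1.0)                        # guard against rounding on the diagonal
    return P

# Rounded values from Table 1: variables 1, 2, and Job Level (16).
R = np.array([[1.00, 0.87, 0.86],
              [0.87, 1.00, 0.88],
              [0.86, 0.88, 1.00]])
reduced = partial_out(R, k=2)
print(round(reduced[0, 1], 2))
```

With these rounded inputs the partial correlation between Variables 1 and 2 comes out near .47, close to the .48 printed in the reduced matrix; the small gap presumably reflects rounding of the inputs.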


TABLE 2 


PRINCIPAL COMPONENTS FACTOR 
LOADINGS—ORIGINAL MATRIX 














METHOD 


As stated above, jobs were originally evaluated 
on each of 17 job requirements or characteristics. 
However, two of these involved the supervision of 
other personnel, which was required in only about 
one third of the 82 jobs evaluated. Therefore, these 
two variables were deleted for purposes of the 
present study, leaving a 15-variable matrix on 
which all jobs were rated. A list of these job require- 
ments can be found in Table 1. The original inter- 


TABLE 3 


PRINCIPAL COMPONENTS FACTOR 
LOADINGS—REDUCED MATRIX 

                     Factor 
Variable 
number     I   II  III   IV    V   VI   h² 
 1       -76  -18   32  -09  -15  -07   75 
 2       -75   13   15  -12   02   20   66 
 3       -75  -01   19  -32  -31   06   80 
 4       -59   04   45   29   03  -16   66 
 5        21   31   60  -35  -31  -21   76 
 6       -44  -35  -16  -07   19   62   77 
 7       -67   09   23   50  -01   00   76 
 8       -47   21  -39  -20   24  -40   67 
 9       -25   74  -21  -20   06   30   79 
10       -27   79  -34  -10  -19  -01   86 
11        06   85  -18   03  -01  -05   76 
12       -53   03  -13   16   54  -30   71 
13       -04   21  -26   71  -46   07   84 
14        26   50   50   09   34   23   74 
15        37   42   53   14   26   08   69 





Factor 

Variable 
number I II III IV V VI VII h² 
1 ile a ey le | RR Se 
2 —94) 05s — 1300 03° —07/)—05 90 
3 —94) —07,) —16 O45 — 08 1 Ov Oe 
4 —(0)/ ee Ol 39 oo 11 17 41 96 
5 —37 a 30 12) —6Saa—11 04 98 
6 — Sle 2 ie Le ol 07 -0O7 —43 90 
7 90 Ne 0S 02 4 08 OS O25 188) 
8 —74 —17 10 31 02 47 04 90 
9 —88 16 18 16 09° =—01 01 86 
10 —88 14 28 1.90 — O00 10 93 
11 —73 _- 37 41 20 00 —02 19 +92 
12 Soe OOS 02 11 04 —07 80 
13 —46 00 60, (—59- > =23)  —03' 5 07 97 
14 =—12 80 —08 —04 46 -25 —01 94 
15 26 78 —06 —15 00 44. —26 97 
Note.—Rounded from five places and decimals omitted. 


First seven factors only (highest loadings > .40). 


Note.—Rounded from five places and decimals omitted. First six factors only. 

TABLE 4                                        TABLE 5 
ROTATED FACTOR LOADINGS—ORIGINAL MATRIX        ROTATED FACTOR LOADINGS—REDUCED MATRIX 

Variable           Factorᵃ                     Variable         Factorᵃ 
number   A  B  C  D  E  G  H  h²               number   A  B  C  D  E  F  h² 
1 a=30 pOme 0508 ote 33 92 1 - = Ss ce es 
2 Ome. OGM Clee tS 1 Ae.) 0 2 ae ss % a i 10% 
3 eeeesS «00. ~09 9-25, —18. 23° 94 A tee m= 27 13 66 
4 ees. 05. =08 -—00=03! 86 96 SO ee Ob Uo 38 
5 i716 «= 10s: Ot 04 461) 08 98 : =o —12 ©) 205 921-10 275 66 
6 =3 if 0. Si) ( S07 2 o7 5 21 OO 2920062 408 76 
7 =74a0F 30 07 = 260503 41 88 6 2) Ou () (aul ee 1() 81 —02 77 
8 —50 61 3 09 08 11 20 90 7 —65 00 13 46 10 29 76 
9 —58 * 67 Gee 1S 10 Oe 12 36 8 Bed! OS ee nT 61 67 

10 51 75 {ie —2 tae ONTOS 9 =16 : _an 
11 =25 S37 OO) 07 92 10 —10 e te me i < 19 
12 Bee 43 © 0  =—08 oO Bo 11 15 a7 24 2 oe 
13 ete 06) 94 010 = ON OS) 97 16 —23 09 76 
14 Bago ris") 19111 $06) =209/0") “27 | \n0e 294 12 eee Sui SOL OL B12 679.471 
15 p05. 29), Ol. f=—18. 89 w=04, 97 13 Oi iee Wii 10 4 88h 02 —11 84 
14 04 17 83 —07 —05 —06 74 
Note.—Rounded from five places and decimals omitted. 15 10 05 1902" = 209=08 ™ 69 


ᵃ Factors are identified in text. 


correlations (unforced ratings) are shown in the 
lower half (below the diagonal) of Table 1, while 
the reduced intercorrelations are shown in the 
upper half of the same table (above the diagonal). 

Factors were extracted from both matrices using a 
principal-components method programmed for the 
IBM 7090 computer by the Division of Biostatistics, 
School of Medicine, University of California at Los 
Angeles. Rotation was accomplished by computer, 
following the “varimax” criterion (Kaiser, 1959). 
Only the first seven factors emerging from the 
original matrix and the first six from the reduced 
matrix were rotated for the following reasons: (a) 
with the relatively small number of jobs rated 
(N = 82), factors with no loadings of more than 
.30 to .40 are of questionable meaningfulness; (b) 
partialling job level out of the original matrix might 
be regarded as “extracting a factor” from this 
matrix (albeit not in the same manner as in principal 
components extraction), so that the reduced matrix 
with six factors extracted by factor analysis would 
compare with seven factors from the original matrix; 
(c) with the small number of variables (15), it 
would not seem reasonable that more than six or 
seven meaningful common factors should emerge; 
(d) within reason, the total number of factors 
rotated would not seem to be of critical importance 
in the present article, since this is primarily a 
methodological study. So long as a comparable 
number and sequence of factors are treated in both 
cases, the results should be on an equivalent basis 
for purposes of determining the effects of partialling 
out job level. 

Since both factor extractions and rotations were 
done by computer using the same programs for each 
matrix, any differences which emerged could not 
have been due to subjectivity on the part of the 
investigator, but rather would be verifiable on an 
objective basis. The present study shows more fac- 
tors of higher loadings than did the original study 
(Myers, 1958) because unity was inserted in the 
diagonals of the correlation matrices for the present 
study, while the highest correlation in each row or 
column was used for the previous study. This was 
done to better highlight changes in factor structure 
which might occur in the present study. 

Note.—Rounded from five places and decimals omitted. 
ᵃ Factors are identified in text. 
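
The extraction and rotation steps described in this section can be sketched as follows. This is only an illustrative reconstruction in Python with NumPy, not the IBM 7090 program the study used: the function names, the small demonstration matrix `R_demo`, and the SVD-based form of the varimax iteration (without Kaiser's row normalization) are all assumptions made for the example. Unities are left in the diagonal, as in the present study.

```python
import numpy as np

def principal_component_loadings(R, n_factors):
    """Loadings of the first n_factors principal components of a
    correlation matrix R with unities in the diagonal."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_factors]   # largest roots first
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (raw, no Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion <= criterion * (1 + tol):  # no further improvement
            break
        criterion = new_criterion
    return loadings @ rotation

# Hypothetical 3-variable correlation matrix, for demonstration only.
R_demo = np.array([[1.0, 0.6, 0.3],
                   [0.6, 1.0, 0.2],
                   [0.3, 0.2, 1.0]])
unrotated = principal_component_loadings(R_demo, 2)
rotated = varimax(unrotated)
communalities = (rotated**2).sum(axis=1)            # h2 for each variable
```

Because the rotation is orthogonal, each variable's communality is the same before and after rotation, which is why the h² columns of the unrotated and rotated tables agree.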


RESULTS 


Unrotated principal components factor 
loadings for both matrices are shown in Tables 
2 and 3. Rotated loadings are shown in Tables 
4 and 5. 


Interpretation of Factors 


Rather than interpreting all factors from 
both conditions separately and then making 
comparisons, original and reduced factors 
will be discussed together in the approximate 
order of importance (i.e., highest loading), in 
cases where both factors appear to be similar. 
The remaining factors not common to both 
conditions will be treated separately. Only 
job requirements with factor loadings in 
excess of .40 for either condition will normally 
be shown. 

Factor A: WORK COMPLEXITY 

                                   Factor loadings 
Requirement                       Original  Reduced 
 6. Experience                      -.93     -.26 
 1. Mental Requirements             -.82     -.83 
 3. Difficulty of Judgment          -.81     -.80 
 2. Frequency of Decisions          -.79     -.71 
 7. Effects of Inaccurate Work      -.74     -.65 
12. Variety in Work                 -.74     -.24 
 9. Persuasion                      -.58     -.10 
10. Importance of Contacts          -.51     -.10 
 8. Review on Work                  -.50     -.14 
 4. Attention to Details            -.43     -.69 


While the original factor resembled a gen- 
eral factor with high loadings on nearly all 
job requirements, partialling out job level de- 
creased loadings substantially for many of 
the requirements in the reduced matrix, so 
that this factor emerges more clearly as one 
involving the intrinsic complexity of the 
work assignment, including the mental and 
decision demands. The sharp drop in loading 
for the Experience requirement can be ex- 
plained by the fact that many staff jobs 
(medical technicians, actuaries, etc.) demand 
a higher order of mental ability rather than 
simply greater experience, so that complexity 
need not be (and apparently is not) syn- 
onymous with experience. Reduced loadings 
thus produce a more realistic and less gen- 
eral factor here. 


Factor B—INTERPERSONAL RELATIONS 
(NONSUPERVISORY) 


                                   Factor loadings 
                                  Original  Reduced 
11. Frequency of Contacts            .84      .77 
10. Importance of Contacts           .75      .89 
 9. Persuasion                       .67      .84 
 8. Review on Work                   .67      .11 
12. Variety in Work                  .43      .03 


Loadings from the reduced matrix more 
clearly define this factor as one involving 
interpersonal relations of a persuasive nature, 
such as those required of methods, systems, 
and research staff personnel (“selling” ideas to 
management), employee counselors, employ- 
ment interviewers, etc.² The relatively high 
loading on Review on Work (i.e., indicating 
a high degree of review) in the original factor 
does not seem reasonable for these types of 
jobs. 

² These are not to be interpreted as “selling” in 
the sense of a life insurance agent; all of the jobs 
rated for this study were office jobs involving no 
direct selling to the public. 


Factor C—PHYSICAL COMPONENTS 

                                   Factor loadings 
                                  Original  Reduced 
14. Working Conditions               .91      .83 
15. Physical Demands                 .29      .79 








The reduced factor brings together both 
requirements relating to the physical com- 
ponents of a job. While it is true that not 
all jobs with physical demands have poorer 
working conditions and vice versa, the 
original intercorrelation of these requirements 
was .46, higher than any correlation of either 
requirement with any other requirement. 
Typical jobs involving both extra physical 
demand and poorer working conditions in- 
clude IBM equipment operators (standing 
+ noise), duplicating equipment operators 
(standing + noise + odor), and conference 
representatives (extensive travel + standing, 
walking, late evening work while setting up 
and coordinating conferences). 


Factor D—CONFIDENTIAL ASPECTS 

                                   Factor loadings 
                                  Original  Reduced 
13. Confidential Nature of Work     -.94      .88 
 7. Effects of Inaccurate Work      -.20      .46 


This factor was similar under both con- 
ditions; however, the reduced factor brought 
in effects of inaccurate work to a greater 
extent than in the original matrix. Typical 
jobs include supervising nurse, employee 
benefits administrator, senior actuarial ste- 
nographer, etc. 


Factor E—FORMAL EDUCATION 

                                   Factor loadings 
                                  Original  Reduced 
 5. Education                       -.94     -.62 
 6. Experience                       .02      .81 


This apparently relates to the prior formal 
education required for a job. Some technical 
jobs (e.g., market and personnel researchers, 
nurses, dietitians) require formal training as 
a condition of employment, so that this factor 
is something different from the Work Com- 
plexity factor. The opposite loading on Ex- 
perience for the reduced factor reflects the 
fact that as these technical jobs are rated 
higher on formal education requirements, 
they are often rated lower on the experience 
requirement, since knowledge required is 
gained through formal training rather than 
on-the-job experience. 


Factor F—VARIETY IN WORK 

                                   Factor loadings 
                                  Original  Reduced 
12. Variety in Work                           .79 
 8. Review on Work                            .61 
 5. Education                                -.40 

This factor emerged from the reduced 
matrix only. Its meaning is not clear. Perhaps 
it reflects a tendency for the work in certain 
higher-level jobs to be reviewed in greater 
detail, although this does not seem too reason- 
able in many cases. The opposite loading on 
Education is somewhat confusing. 

The remaining factors from the original 
matrix are not clearly defined (Physical De- 
mand—Factor G, Attention to Details—Fac- 
tor H). The former factor (Physical Demand) 
became part of the Physical Components 
factor in the reduced matrix. 


DISCUSSION 


It can thus be seen that partialling out job 
level produced some interesting changes in 
the factors which emerged. In general, this 
approach “cleaned up” factor structures so 
that more reasonable and meaningful inter- 
pretations could be made for most factors. 
It also definitely reduced the halo effect. 
Since both extractions and rotations were 
done by computer, there was no possibility 
that the differences which emerged could have 
been due to subjectivity on the part of the 
investigator. 

The approach used here requires that the 
original matrix contain at least one variable 
which correlates highly with nearly all other 
variables. Partialling out such a variable 
would have the effect of removing the in- 
fluence of halo from relationships between 
the remaining variables. Matrices without 
such a variable would not be able to use this 
approach, but such matrices would probably 
be in the minority in the domain of personnel 
or job ratings. 

The approach used in this study is not 
recommended as a substitute for established 
methods of producing effective ratings (i.e., 
proper construction of the rating form, train- 
ing of raters). In situations where the more 
conventional methods fail to produce mean- 
ingful factor configurations, however, the ap- 
proach described here may be of value. 


TEAM-TRAINING EFFECTIVENESS UNDER 
VARIOUS CONDITIONS ¹ 


JAMES C. NAYLOR AND GEORGE E. BRIGGS 
Ohio State University 

August 1965 


Transfer performance of 3-man teams was measured as a function of 2 system 
variables (task complexity and organization) and 1 training variable (skill 
level of a replacement for 1 of the team members) in a simulated radar con- 
trolled aerial intercept task. Each independent variable influenced team per- 
formance. Task complexity had a consistent effect across all transfer sessions 
with superior performance on the less complex task. Task organization influ- 
enced performance only after the replacement occurred with superior per- 
formance by teams organized to permit each S to work independently of 
(rather than interact with) his counterpart. The teams receiving a more highly 
trained replacement improved in performance immediately following; teams 
with a less skilled replacement actually deteriorated slightly but then recovered 
in a subsequent work period. 


A team may be differentiated from that 
broader class of which it is really a subset, 
the small group, by defining it as a small 
group of individuals that is both structured 
and task oriented (Horrocks, Krug, & Heer- 
mann, 1960). Typically, this structure is for- 
mal and usually is a function of the task to 
be performed. In general, teams exist to per- 
form a given function. Without such an as- 
signed function there is no need for the team— 
it has no purpose or justification for existing. 

Since teams are by definition function ori- 
ented, it follows that a high degree of task 
proficiency must be attained by the team so 
the function can be performed as well as pos- 
sible. Thus, the notion of team training be- 
comes a topic of relevance. Unfortunately, the 
use of a team as the unit of interest in a train- 
ing situation has generally been ignored as 
an interesting and important combination by 
those working with either team behavior or 
with training procedures. Ample evidence of 
this fact is provided by the paucity of infor- 
mation on the topic in such sources as Glanzer 
and Glaser (1959, 1961), Bass (1960), Mc- 
Grath (1962), Kelley and Thibaut (1954), 
and Sells (1961a, 1961b, 1961c, 1961d, 1961e, 
1961f). Only a small cluster of studies may 
be found which can be considered directly 
relevant to team training, per se. Of these, 
there are two team-training research efforts of 
particular interest. 

¹ This research was supported by the United States 
Navy under Contract No. N61339-1327, sponsored by 
the United States Naval Training Device Center, Port 
Washington, New York. Permission is granted for re- 
production, translation, publication, use, and disposal 
in whole or in part for any purpose of the United 
States Government. 

Horrocks and his collaborators (Horrocks & 
Goyer, 1959; Horrocks, Heermann, & Krug, 
1961; Horrocks, Krug, & Heermann, 1960), 
in a series of studies concerned with training 
teams to perform decoding operations, found 
that (a) individual training was superior to 
team training, (b) individual competence was 
of the utmost importance, and (c) substitu- 
tion of team members did not affect team per- 
formance. These findings were somewhat con- 
tradicted by a second set of studies by Glaser, 
Klaus, and Egerman (1962) and Egerman, 
Klaus, and Glaser (1962). They required 
subjects (Ss) to estimate time intervals, and 
correct team performance hinged upon the 
independent performance (estimates) of all 
team members, i.e., both needed to be cor- 
rect for the team to be considered correct. 
Their results showed that training individuals 
separately led to severe performance decre- 
ments in a team situation—an outcome they 
attributed in part to the reduced reinforce- 
ment schedule experienced by a team mem- 
ber in the situation which required all mem- 
bers to perform successfully to accomplish the 
task objective. These data also indicated that 
adding an additional team member led to a 
decrement in team performance. 

From a practical point of view, the problem 
of team composition certainly would seem im- 
portant. Few teams remain stable in size and 
membership over time; thus, the effect of add- 
ing, subtracting, or substituting new for old 
team members is worth establishing. The fact 
that Horrocks et al. (1961) found no effect 
on team performance when team substitutions 
were made, for example, may be a function 
of the simplicity of the decoding task their 
Ss were required to perform. It does seem 
logical that the effect of changing team mem- 
bership might be directly related to the 
amount of interaction required among team 
members. Further, variables which do not ap- 
pear important in tasks of low complexity 
often become quite potent in tasks which are 
highly complex due to the additional task in- 
duced stress involved (e.g., see Bahrick & 
Shelly, 1958). 

The present study, therefore, was designed 
to evaluate changing team membership where 
(a) the substitute member possessed varying 
amounts of task experience, and (b) the cri- 
terion task was varied in terms of its com- 
plexity and its organizational characteristics. 
The task was a simulation of a radar control 
of manned interceptors against target aircraft. 


METHOD 


Subjects. A total of 128 undergraduate males from 
Ohio State University served as Ss in the experiment. 
All were paid at the rate of one dollar for each 
experimental session (approximately 40 minutes in 
duration). Ninety-six of the Ss were organized into 
three-man teams comprised of two radar controllers 
(RC) and a supervisor coordinator (SC). These 
three-man teams served as the basic experimental 
unit in the study. Assignments of Ss to teams and 
of teams to conditions were random within the gen- 


James C. NAYLor AND GerorcE E. Briccs 


eral restriction that groups be filled equally across 
the data-collection period. Assignment to team roles 
was not random; rather, the SC within each team 
was selected by the experimenter (EZ) after 3 days 
of training. Selection was based on ability to co- 
ordinate the activity of the two RCs and on the 
ability to schedule work loads for the two RCs. The 
remaining 32 Ss were designated as replacement RCs 
and experienced training identical to the other RCs 
except that an E assistant served as the SC in each 
case. These Ss served as replacement team members 
in the third transfer session (see below). 

Apparatus. The Ss performed in two task situa- 
tions, a training task and a transfer task. The basic 
training task closely resembled a game of checkers 
somewhat similar in concept to the system pretrain- 
ing game used successfully by Kinkade and Kidd 
(1962). Two such boards, constructed of 40 × 40- 
inch masonite, were employed along with 16 coded 
aircraft checkers. Each of these checkers was uniquely 
identified by a clock-code symbol painted on its 
upper surface (Schipper, Versace, Kraft, & McGuire, 
1956). There were 2,804 squares on each problem 
board. The transfer-task apparatus consisted of the 
OSU Electronic Radar Target Simulator (Hixson, 
Harter, Warren, & Cowan, 1954) with four 14-inch 
diameter cathode-ray tube (CRT) radar console dis- 
plays. This equipment is capable of generating up to 
32 simulated aircraft radar returns, with appropriate 
clock coding, on the CRT screen of each of the radar 
consoles; however, only 16 of the aircraft were used 
in the study. 

The purpose of the training-task situation was to 
provide a means for the initial acquisition of the 
cognitive, perceptual-motor, procedural, and coordi- 
native skills basic to performance in the more com- 
plex transfer task. The essentials of the transfer task 
thus were abstracted in the form of a checkerlike 
training task. Half of the checkers (those with odd 
code numbers) were designated arbitrarily as “target” 
checkers, while even-numbered checkers were desig- 
nated as “interceptors.” All interceptors were under 
the direct control of the RCs and were moved every 
20 seconds by those team members. All targets were 
moved by an experimental assistant during game 
play. The basic task of the team was to intercept as 
many target checkers with the interceptor checkers 
as possible during each 40-minute session, where an 
intercept was defined as landing an interceptor upon 
a particular square of the checker board simultane- 
ously with the arrival of a target checker. 

Training occurred in four daily sessions. In the 
first session all groups were given an introduction 
consisting of (a) an explanation of the purpose of 
the experiment, the objective of the task (aerial in- 
terceptions), and the functions required of the op- 
erators; (b) the method of aircraft identification; 
(c) procedures for communication; and (d) prac- 
tice in moving the checker “aircraft.” The latter ac- 
tivity occurred throughout the following three ses- 
sions of training. 

Transfer consisted of four 40-minute sessions on 
the higher-fidelity task. As in training there were 
eight interceptor and eight target aircraft on the 
radar displays at all times. The major difference be- 
tween training and the transfer task was in the way 
interceptions occurred: by discrete moves versus con- 
tinuous paths and by direct manipulation of checkers 
versus indirect adjustments via verbal commands to 
the “pilots” over a simulated radio channel. The 
pilots were specially trained assistants who imple- 
mented verbal commands on target generator con- 
soles. 

Experimental Design. There were three independ- 
ent variables manipulated in this study: task com- 
plexity, task organization, and relative training of a 
replacement RC. Task complexity occurred at two 
levels: 2-D in which an RC merely commanded speed 
and heading adjustments versus 3-D in which he 
commanded speed, heading, and altitude changes in 
the interceptor aircraft. Task organization also oc- 
curred at two levels: an independent condition in 
which each RC worked without coordination with 
the other RC; versus an interaction condition 
wherein the two RCs could trade off targets and in- 
terceptors thereby coordinating their operation. Re- 
placement training was such that the replacement 
had received either 1 more or 1 less day of experi- 
ence on the transfer task than the man he replaced. 
The replacement occurred in the third transfer ses- 
sion. 

Procedure. In the training task each target entered 
along one of three flight paths, from the north (top) 
of the problem board, heading 135, 180, or 215 de- 
grees with no evasive action. The RCs could move 
a checker each 20 seconds in one of five directions: 
straight ahead or to the right or left 45 or 90 de- 
grees. In order to reverse heading for an interceptor, 
the RC had to make two successive 90-degree right 
or left turns from an initial heading, thus requiring 
40 seconds of game time. A successful intercept oc- 
curred when both target and interceptor checker 
landed simultaneously in a square or when the inter- 
ceptor blocked the path of a target checker. 
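
The timing consequence of the turn rule can be worked through with a short sketch. This is a hypothetical illustration in Python, not part of the original procedure: the function name `min_turn_time` is introduced here, and the sketch assumes headings that are multiples of 45 degrees, at most one turn of up to 90 degrees per move, and one move every 20 seconds.

```python
import math

SECONDS_PER_MOVE = 20      # one checker move every 20 seconds
MAX_TURN_PER_MOVE = 90     # largest turn allowed in a single move (degrees)

def min_turn_time(current_heading, desired_heading):
    """Least game time (seconds) needed to bring an interceptor from one
    heading to another, assuming headings are multiples of 45 degrees."""
    diff = abs(desired_heading - current_heading) % 360
    diff = min(diff, 360 - diff)                 # turn the shorter way round
    moves = math.ceil(diff / MAX_TURN_PER_MOVE)  # each move turns at most 90 degrees
    return moves * SECONDS_PER_MOVE

reversal_time = min_turn_time(0, 180)            # two successive 90-degree turns
```

Under these assumptions a full reversal takes two moves, or 40 seconds of game time, in agreement with the figure given in the text.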

The SC assigned targets to each RC based on the 
availability of interceptors and the skill of the RC. 
The SC monitored the operation and called any im- 
pending problems to the attention of the RCs. Each 
RC was responsible for issuing verbal commands and 
then carrying out these actions on the problem board, 
e.g., “alpha four (an interceptor), turn to 225 de- 
grees, speed 600 knots, climb to 35,000.” 


TABLE 1 

ANALYSES OF THE EFFICIENCY DATA (FUEL CONSUMED PER SUCCESSFUL INTERCEPTION) 
OF THE TRANSFER TASK 

                                      Sessions 1 and 2      Sessions 2 and 3      Sessions 3 and 4 
Source                         df          MS        F          MS        F          MS        F 
Task organization (O)           1       7,611        -      275,134     3.64      140,764     7.66* 
Replacement training (R)        1   1,382,521     2.68      116,894     1.54        2,705        - 
Task complexity (C)             1   OSTA OOM Dai (ae        829,872    10.99**    362,400    19.74** 
O × R                           1     403,361        -        3,894        -        7,898        - 
O × C                           1      19,006        -        5,383        -        7,998        - 
R × C                           1      96,754        -        5,598        -        1,023        - 
O × R × C                       1      87,643        -          566        -        1,158        - 
Teams within groups (Ts/G)     24     514,764                75,490                18,356 
Sessions (S)                    1  11,224,069    21.59**    6259520 wool           81,637     4.38* 
S × O                           1     337,544        -       30,238        -          602        - 
S × R                           1      49,669        -      O15 02 eeOLU9® 47 1452 oS 
S × C                           1   1,688,088     3.24      125,648     2.04        2,069        - 
S × O × R                       1     223,235        -        5,575        -        2,323        - 
S × O × C                       1     419,085        -       66,670     1.08        8,531        - 
S × R × C                       1   Si 01g ame est ae 11,610) 
S × O × R × C                   1      65,620        -          258        -        1,742        - 
S × Ts/G                       24     519,763                61,295                18,611 

* p < .05 
** p < .01 

In the transfer sessions the responsibilities of the 
SC and RCs were essentially the same except that 
the interceptor aircraft was maneuvered by pilots (at 
the target generators) upon commands from the RCs. 
A successful intercept occurred whenever an inter- 
ceptor obtained a separation of 2 nautical miles or 
less (and zero altitude separation under the 3-D con- 
dition) from its assigned target. The RCs announced 
a termination for each intercept (whether successful 
or not); and following measurements of separation 
of the two blips on the CRT by an experimental 
assistant, the interceptor was orbited and the target 
aircraft was re-entered from the north (top of the 
CRT) for another run on a different flight plan. 
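
The scoring rule for a successful intercept in the transfer task can be stated compactly. The sketch below is a hypothetical Python rendering of the criterion as described above; the function and parameter names are assumptions made for illustration, not terms from the original apparatus.

```python
MAX_SEPARATION_NM = 2.0   # horizontal separation limit in nautical miles

def is_successful_intercept(separation_nm, altitude_separation_ft=0.0, three_d=False):
    """Apply the intercept criterion: 2 nautical miles or less, plus zero
    altitude separation when the 3-D condition is in force."""
    if separation_nm > MAX_SEPARATION_NM:
        return False
    if three_d and altitude_separation_ft != 0:
        return False
    return True

# Same run scored under the two complexity conditions (hypothetical values).
scored_2d = is_successful_intercept(1.5, altitude_separation_ft=5000, three_d=False)
scored_3d = is_successful_intercept(1.5, altitude_separation_ft=5000, three_d=True)
```

The example shows why the 3-D condition is the harder one: a run that meets the horizontal criterion still fails if the altitudes differ.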

The target aircraft maintained either 600 or 800 
knots at either 30,000 or 40,000 feet in the 3-D con- 
dition or at 35,000 feet in the 2-D condition, and 
they maintained one of 40 constant heading flight 
paths from the north. An RC called an interceptor 
out of orbit when a target was assigned; he selected 
a speed between 200 and 1,200 knots and any head- 
ing between 0 and 359 degrees. All interceptors 
turned right or left at 3 degrees/second. 

In the independent team organization condition 
the RC could see on the problem board (training) 
or the CRT (transfer) only the targets assigned to 
him and all four of his interceptors; a screen sepa- 
rated the two RCs. In the interaction condition the 


screen was removed and the RCs could see all air- 
craft by examining both displays. The SC could see 
all aircraft in both task organization conditions. 

Performance Measures. Two measures of system 
performance were obtained on the transfer task: 
amount of fuel consumed per session and the num- 
ber of successful interceptions per session. Fuel con- 
sumed was calculated from data on power settings 
and altitudes with reference to nonclassified charts 
relating fuel consumption to these two variables. For 
analysis purposes the amount of fuel was divided by 
the number of successful intercepts to provide an 
index of system efficiency per team per session. 
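
The efficiency index amounts to a simple ratio, which a brief sketch makes concrete. The Python below is illustrative only: the function names and the fuel and intercept figures are hypothetical, chosen for the example rather than taken from the study's data.

```python
def efficiency_index(fuel_consumed, successful_intercepts):
    """Fuel consumed per successful intercept for one team in one session.
    Lower values indicate more efficient performance."""
    if successful_intercepts <= 0:
        raise ValueError("index is undefined without a successful intercept")
    return fuel_consumed / successful_intercepts

def percent_improvement(before, after):
    """Percentage reduction in the index from one session to the next."""
    return 100.0 * (before - after) / before

# Hypothetical team: 12,000 units of fuel for 8 intercepts, then 9,000 for 10.
session_a = efficiency_index(12_000, 8)
session_b = efficiency_index(9_000, 10)
gain = percent_improvement(session_a, session_b)
```

Because the index is a cost per intercept, an improvement shows up as a decrease, which is how the percentage changes between sessions should be read.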

No data were recorded on the training task itself 
since primary interest was with regard to perform- 
ance on the transfer task. Each team received sum- 
mary feedback at the end of each transfer session on 
the number of successful intercepts. 


RESULTS 


The results, in terms of the efficiency index 
(fuel consumed per successful intercept), are 
summarized in Figure 1, and the analyses of 
these data are summarized in Table 1. 
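
The F ratios in Table 1 can be checked from the mean squares. The following Python sketch is an illustration under an assumption inferred from the table layout: between-team effects are tested against Teams within groups (Ts/G) and effects involving Sessions against S × Ts/G. The function name is hypothetical; the mean squares are the Sessions 3 and 4 values printed in the table.

```python
def f_ratio(ms_effect, ms_error):
    """F ratio as effect mean square over the appropriate error mean square."""
    return ms_effect / ms_error

# Sessions 3 and 4 mean squares from Table 1.
MS_TEAMS_WITHIN_GROUPS = 18_356      # error term assumed for between-team effects
MS_SESSIONS_BY_TEAMS = 18_611        # error term assumed for effects involving Sessions

f_organization = f_ratio(140_764, MS_TEAMS_WITHIN_GROUPS)
f_complexity = f_ratio(362_400, MS_TEAMS_WITHIN_GROUPS)
f_sessions = f_ratio(81_637, MS_SESSIONS_BY_TEAMS)
```

These ratios come out close to the tabled values of 7.66, 19.74, and 4.38, which supports the assumed pairing of effects and error terms.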

Prereplacement. It may be noted from 
Table 1 that over Sessions 1 and 2 (perform- 
ance prior to replacement of an RC) only task 
complexity and sessions produced significant 
differences (p < .05). The latter is a practice
effect and is apparent throughout the entire
transfer period. From Figure 1 it may be 
noted that the less complex or 2-D condition 
provided for more efficient performance than 
did the 3-D condition; thus, adding the re- 
quirement for altitude control (the 3-D con- 
dition) to the speed and heading control re- 
quirements of the 2-D condition resulted in a 
significantly inferior level of performance in 
the former condition. None of the other in- 
dependent variables were significant prior to 
replacement of an RC on Session 3. 

Replacement. The analysis in Table 1 for 
Sessions 2 and 3 was performed to determine 
the effects, if any, of the replacement of an 
RC during Session 3, Session 2 serving thereby 
as a comparison. As in the analysis for Ses- 
sions 1 and 2, there was a significant practice 
effect, performance under the less complex 
2-D condition was superior to that under the 
3-D condition, and neither of the other two 
independent variables (replacement training 
or task organization) appeared as a signifi- 
cant main effect. However, there was a signifi- 
cant interaction between replacement training 
and sessions. This interaction was defined by 
a marked improvement (48% from Session 2 
to 3) for those teams that received a replace- 
ment who had more training than the RC re- 
placed, while a minor (2%) decrement oc- 
curred in those teams which received an RC 
with less training than the man replaced. 

Postreplacement. The third analysis in 
Table 1 was performed on the data of Ses- 
sions 3 and 4 to examine the longer range 
effects, if any, of the independent variables 
following replacement of an RC. The results 
are comparable to those found in the prere- 
placement analysis with one important addi- 
tion: for the first time, task organization ex- 
erted a statistically significant influence on 
performance. The effect was superior per- 
formance by those teams which worked under 
the independent version of task organization 
compared to performance of the other teams 
in the interaction condition. 

The amount of replacement training did not 
differentially influence performance over the 
entire postreplacement period; thus, its effect 
was temporary, being manifested only at the 
time of replacement, Session 3 (see above).


DISCUSSION


These results indicate that task complexity, 
task organization, and amount of replacement 
training each influenced team performance at 
one time or another during the transfer ses- 
sions. First, task complexity was the most 
potent and consistently influential variable 
affecting team performance. Second, task or- 
ganization did not become effectual until the 
system was stressed by replacing one of the 
two RCs, and thereafter this system variable 
continued to affect team performance signifi- 
cantly to the end of the transfer sessions. 
Third, the training variable (amount of re- 
placement training) was effective only during 
the session in which the replacement of an RC 
was accomplished (Session 3) and no lasting 
effect was noted across both Sessions 3 and 4 
as was found for the task-organization vari- 
able. 

Task Complexity. It was not surprising to 
find inferior performance for those teams 
working in the 3-D version of the transfer 
task because the requirement to control alti- 
tude placed a double burden on the RCs: not 
only did they control three rather than two 
dimensions (as in the 2-D condition), but 
also altitude information was not present on 
a visual display as was information for head- 
ing and speed control; instead, the RCs either 
had to remember altitude information or re- 
quest it from the interceptor pilots. In prac- 
tice, the RCs repeatedly requested altitude in- 
formation which clearly showed that it was 
more difficult to retrieve and/or use this in- 
formation than to obtain and use heading and 
speed information. 

Had altitude information been available on 
a visual display, it is conceivable that 3-D 
performance would not have been so inferior 
to that in the 2-D condition. However, as 
noted by Schipper, Kidd, Shelly, and Smode 
(1957), a continuous display of altitude in- 
formation did not improve performance on a 
simulated air traffic control task (using the 
same basic equipment as in the present study) 
over that with no altitude display. 

Task Organization. This variable influenced 
performance only after a replacement of one 
of the two RCs on Session 3. Once it became 
effective, however, this variable continued to
influence team performance throughout the
remainder of the transfer sessions. It was 
found that superior performance occurred 
with an independently organized task, i.e., 
the two RCs were given no opportunity to 
observe one another or to communicate dur- 
ing each session. This finding agrees with the 
earlier results of Horrocks and his associates 
(Horrocks & Goyer, 1959; Horrocks et al., 
1960, 1961), but it is contrary to the results 
from two studies by Glaser et al. (1962) and 
Egerman et al. (1962). 

A possible explanation for the difference
among these several results lies in the way
feedback was provided:
Glaser et al. (1962) required that all team 
members be correct in their responses prior 
to feedback, while in the present study and 
in the Horrocks research no such contingency 
was applied in the feedback loop. Thus, the 
present Ss could observe the effects of previ- 
ous responses simply by viewing their CRT 
displays, and each man’s feedback was in no 
way dependent upon the actions of others in 
the team. 

Actually, it is not surprising that the inter- 
action condition (the RCs being encouraged 
to observe one another’s displays and pro-
vide assistance where possible) was inferior
to the independently organized task. Kidd 
(1961) found that adding radar operators to 
an air traffic control team did not result in 
equal increments of productivity; thus, he 
concluded that the extra capacity to deal 
with the problem was offset by the need to 
interact among team members: interaction 
(verbal communications between operators) 
does not exist for single operators working 
alone, but “this requirement is superimposed 
on the normal demands of the task itself and 
leads to a proportionate reduction of exclu- 
sively task-directed behavior” for two- and 
three-man teams (p. 199). 

Replacement Training. This variable did 
not affect performance as markedly as antici- 
pated. The more highly trained replacement 
RCs did enable their teams to improve sig- 
nificantly from Session 2 to 3, while teams re- 
ceiving a replacement with less training than 
the RC he replaced deteriorated slightly over 
the same interval; however, the effect did not
last, and by Session 4 it was no longer
significant.

This finding suggests that those teams re- 
ceiving a less well trained RC adapted rather 
quickly to the loss of a team member, and so 
it may be concluded that extra training of a 
replacement operator is not a necessary con- 
dition in the long run and that it is important 
only over the short term immediately follow- 
ing replacement of a team member. It would 
seem, then, that one could question the de- 
sirability of “extra” training for a replace- 
ment operator when “on the job” training can 
so quickly bring total team performance up 
to par following the stress of a replacement 
of one of the team members. 

This relatively weak effect of a replacement 
means that the current results in this regard 
fall intermediate to those of Glaser et al. 
(1962) and Egerman et al. (1962) who found 
a pronounced effect and of Horrocks et al. 
(1959, 1960, 1961) who found no effect of a 
replacement on team performance. The cur- 
rent data offer little or no explanation of the 
disparity in the previous studies; rather, they 
indicate that whatever effect there may be of 
replacing a team member, that effect is greater 
with less well trained replacement operators.


The detection task employed a 9¼-in. plan position indicator (PPI) and simu-
lated targets. 30 Army trainees served as Ss. Each S performed the 9 combina-
tions of viewing distance, (a) 6 ins., (b) 12 ins., (c) 18 ins., and, search area,
(a) whole scope, (b) ¼ scope, and (c) 1¼-in.-diameter circle within the whole scope.
A Treatments × Treatments × Subjects analysis of variance indicated signifi-
cant main and interaction effects: as viewing distance increases, detection per-
formance is degraded; as search area increases, detection performance is de-
graded; optimum viewing distance when searching the whole scope is approxi-
mately 12 ins., while optimum viewing distance for a small area (1¼-in. diam-
eter) within a larger area is 6 ins. or less.


Search area and viewing distance have been 
shown to be related to radar target detection 
proficiency. Bartlett and Williams (1947) re- 
ported that target detectability at 6 inches 
was superior to detectability at 24 inches on 
both “dark” and “moderately bright” plan 
position indicator (PPI)-type radar scopes in 
a visibility task where target location is 
known. Craik and Macpherson (1945) found 
18 inches to be the optimum viewing distance 
on a 9-inch PPI scope in a search task where 
target location is not known. The effect of 
search area upon target detection has been 
surance” by Baker (1962) who states, 

“we can infer that the larger the area, i.e.,
the more area for angular separation of nn 
and point of fixation, the poorer the detect- 
ability threshold [p. 69].” 

Current Army Air Defense radar systems 
typically use a 10-inch PPI scope for initial 
target detection. The radar operator will nor- 
mally have three general search areas of con- 
cern: (a) the operator may know that the 
target is near a symbol placed on his scope by 
a fire distribution system, (b) the operator 
may be responsible for a certain sector of the 
defended area, or (c) the operator may be 
responsible for targets appearing anywhere on 
his PPI scope. 

The purpose of this study was to determine 
the effect upon target detectability of search 
areas and viewing distances commonly en-
countered and/or within the practical limita-
tions of current Army Air Defense systems.


METHOD


Equipment


The radar target detection task involved the use 
of a 10-inch P-7 PPI display with a simulated 
radar presentation. The radar display, though 
nominally 10 inches, presented a 9.25-inch display 
diameter. The simulation equipment was designed 
to simulate small cross-section targets as seen on 
current Army Air Defense radar systems and is 
described in greater detail by Wright, Frederickson, 
and Claflin (1964). The radar display was set in a 
background of light-surround which was illuminated 
at a level of 0.1 footlambert by an overhead, ex- 
tended, light source. 

The CRT display was adjusted as described by 
Baldwin and Wright (1963). This method of CRT 
adjustment provided an integrated display illumina- 
tion level of approximately 0.1 footlambert. 


Observers 


Thirty Army trainees served as observers. All had 
normal vision and had no prior experience as radar 
operators. 


Experimental Design 


Two observers were tested simultaneously in two 
identical observation booths. Ambient illumination 
was maintained at 0.1 footlambert. Each observer 
performed in each of the nine experimental condi- 
tions. The experimental conditions consisted of three 
viewing distances (6 inches, 12 inches, and 18 inches) 
and three search areas (whole scope, quarter scope, 
and symbol area), resulting in nine combinations of 
viewing distance and search area. The whole scope 
condition consisted of searching the entire 9.25-inch 
diameter viewing area. The quarter scope condition 
consisted of searching a specified quadrant of the 
whole scope. The symbol area consisted of searching 
a specified 1¼-inch diameter circle within the whole
scope. 


RADAR TARGET DETECTION 


A nonmoving target was presented in random 
locations within the specified search area. The target 
amplitude was increased by 2-volt steps, each 
antenna rotation, until the observer detected the 
target. An antenna rotation rate of eight revolutions 
per minute was simulated. 

An examiner in each observation booth specified 
the search area, maintained the observer’s viewing 
distance with a rod of the specified length, and 
determined the correctness of the observer’s response. 

Fifteen targets were viewed at each of the three 
distances, five in each search area. The order of 
presentation for the three distances and the order 
of search area presentations within the distances 
were randomized for the 15 observer pairs. 


Observer’s Task 


The observer’s task was to detect the target as 
soon as possible. The observers were told when to 
begin the search, what area to search, and the 
viewing distance desired. As soon as the observer 
detected the target he pushed a handheld pushbutton 
which caused the target voltage to be printed on a 
paper tape. 

Observers were given detection practice prior to 
testing. Observers were given rest periods between 
the three sets of 15 target trials and a 10-minute dark 
adaptation period before each test session. 


Performance Measure 


The observer’s score for a given trial was the 
target voltage at the time the observer made a 
correct detection response. The observer continued on 
a given trial until a correct detection was attained. 
The observer’s score for a given experimental condi- 
tion was the average (mean) of the five trial scores 
within that condition. 
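
The scoring rule above amounts to averaging five detection voltages per condition. A minimal sketch follows; the function name and the voltages are hypothetical and do not come from the study.

```python
from statistics import mean


def condition_score(trial_voltages):
    """Mean target voltage at correct detection over the five trials of a condition."""
    if len(trial_voltages) != 5:
        # Each distance-by-area condition comprised five target trials.
        raise ValueError("expected exactly five trial scores")
    return mean(trial_voltages)


# Hypothetical detection voltages for one observer in one condition.
score = condition_score([5.0, 5.2, 4.8, 5.4, 5.6])
print(round(score, 2))
```

Because the target amplitude rose on every antenna rotation until detection, a lower mean voltage corresponds to earlier, and therefore better, detection.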


RESULTS 


Mean target detection scores for the nine 
treatment conditions are presented in Table
1.

A summary of the Treatments × Treat-
ments × Subjects analysis of variance is pre-
sented in Table 2. Both main effects and


TABLE 1


MEAN SIGNAL VOLTAGE AT TARGET DETECTION AS A
FUNCTION OF SEARCH AREA AND VIEWING DISTANCE


                                  Search area
Viewing distance     Symbol    One-fourth scope    Whole scope
6 inches              4.98          5.46           [illegible]
12 inches             5.03          5.52              5.74
18 inches             5.16          5.56              5.80


TABLE 2


ANALYSIS OF VARIANCE SUMMARY TABLE


Source                   df       MS          F        p
Viewing distance (A)      2      .255       8.793    <.01
Search area (B)           2    11.761     420.036    <.01
Subjects (C)             29      .126
AB                        4      .051       3.923    <.01
AC (error A)             58      .029
BC (error B)             58      .028
ABC (error AB)          116      .013
Total                   269





the interaction effect were significant at or 
beyond the .01 level. 
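
The F ratios in Table 2 can be checked by dividing each effect mean square by its own error term, as is usual for a Treatments × Treatments × Subjects design. The sketch below assumes the mean squares read .255, 11.761, and .051 for A, B, and AB, with error terms .029, .028, and .013; the variable names are illustrative.

```python
# Mean squares read from Table 2 (decimal points restored from the scan).
effect_ms = {"A": 0.255, "B": 11.761, "AB": 0.051}

# In this design each effect is tested against its interaction with subjects.
error_ms = {"A": 0.029, "B": 0.028, "AB": 0.013}

f_ratios = {name: effect_ms[name] / error_ms[name] for name in effect_ms}

for name, f_value in f_ratios.items():
    print(name, round(f_value, 3))
```

Under these readings the quotients round to the tabled values of 8.793, 420.036, and 3.923, which supports the placement of the decimal points.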

The significant main effects indicate that 
as viewing distance increases, detection per- 
formance is degraded, and as search area in- 
creases, detection performance is degraded. 

The significant interaction effect indicates 
that as search area increases, the optimum 
viewing distance increases. 


DISCUSSION


The significant interaction between search 
area and viewing distance obtained in this 
study offers an explanation concerning the 
differences in optimum PPI viewing distances 
obtained in earlier research efforts. The opti- 
mum 6-inch viewing distance reported by 
Bartlett and Williams (1947) was obtained 
on a visibility task which was similar to the 
symbol search area reported in this study. The 
optimum 18-inch viewing distance reported 
by Craik and Macpherson (1945) was ob- 
tained on a search task which was similar to 
the whole scope search area reported in this 
study. 

The present data indicated an optimum 
viewing distance of 12 inches or less in a 
search task on a P-7 PPI scope 9¼ inches in
diameter. This is a shorter optimum viewing 
distance than obtained by Craik and Mac- 
pherson with a similar task. Bartlett and Wil- 
liams, however, also reported that noise tended 
to reduce the advantages of the shorter view- 
ing distances. It should also be noted that the 
target size used in the present study was very 
small. It is probable that differences in back-
ground noise and target size account, at least
in part, for these different estimates of opti-
mum whole scope viewing distances. 

The data obtained indicate that the effect 
of viewing distance, within a reasonable range, 
is primarily of academic interest. The effect 
of search area is a much more potent factor 
than either viewing distance alone, or the 
interaction of viewing distance and search 
area. 

The authors have observed that radar ob- 
servers tend to maintain a greater viewing 
distance while searching the whole scope, and 
reduce viewing distance when something at- 
tracts or requires their immediate attention. 
In general, radar observers probably closely 
approximate the 12-inch viewing distance 
when searching the whole scope and decrease 
viewing distance when attending to a smaller 
area of interest. 

Determination of optimum viewing dis-
tances for various search areas for routine
field use does not appear necessary, considering
the magnitude of the viewing-distance factor 
and the observation that radar operators tend 
to perform in a manner approximating the 
desired function. 


CONCLUSIONS 


The results of this study indicate that dif- 
ferences in earlier research findings concern- 
ing optimum radar scope viewing distances 
are probably due to an interaction of search 
area and viewing distance. 


A. D. WRIGHT, E. W. FREDERICKSON, AND J. L. CLAFLIN


The optimum viewing distance when search-
ing a 9.25-inch P-7 PPI scope is approxi-
mately 12 inches. The optimum viewing dis-
tance when searching a small area (1¼-inch
diameter) within a larger scope is 6 inches or 
less. 

Based upon observation by the authors 
that radar operators tend to approximate this 
search area by viewing distance function, and 
the small magnitude of the viewing distance 
factor, these research findings appear to be of 
academic rather than practical significance.


INCREASING PRODUCTIVITY IN PUBLIC
WORKS MAINTENANCE: 


THE EFFECTS OF JOB ESTIMATES BASED ON 
ENGINEERED PERFORMANCE STANDARDS ¹


G. K. TALLMADGE, JR. 


Planning Research Corporation, Los Angeles, California 


Data regarding the size of conventional labor hour estimates, estimates based on 
Engineered Performance Standards (EPS), and labor hours expended in com- 
pleting jobs were collected from a carefully selected sample of Navy Public 
Works offices. An analysis of 266 work orders sampled from 12 Navy ac- 
tivities revealed that EPS estimates were significantly lower than conventional 
estimates but that the difference diminished with time from between 35% to 
40% early in the EPS program (1958) to between 5% to 7% in 1963. This 
reduction was attributed to a learning effect which caused a lowering of con- 
ventional estimates as they were shown to be excessively high. Other analyses 
showed that estimated hours were consumed on the job regardless of how 
grossly they appeared to overestimate actual requirements, indicating that EPS 
utilization increased productivity by an amount roughly corresponding to the
initial difference between EPS and conventional estimates.


In 1956, the United States Navy, through 
its Bureau of Yards and Docks, initiated the 
use of Engineered Performance Standards 
(EPS) as a management tool for the control 
of Public Works maintenance. These stand- 
ards, currently published in 13 volumes each 
corresponding to a single trade or group of 
related trades, present data in the form of 
labor hours required to accomplish described 
operations, jobs, or tasks. It is estimated that 
there is EPS coverage for about 80% of all 
Public Works maintenance jobs which arise 
at Navy shore activities (Belle, 1964), and 
the standards are used by Planners and Esti- 
mators at these activities to estimate labor 
hour requirements for specific work orders. 

¹ This report is based on research conducted
under contract NBy 32272 with the United States
Naval Civil Engineering Laboratory, Pt. Hueneme,
California and monitored by C. E. Parker. Permis-
sion is granted for reproduction, translation, publi-
cation, use, and disposal in whole or in part by or
for any purpose of the United States Government.
The opinions or conclusions expressed herein are
those of the author and should not be construed as
necessarily reflecting the views or endorsement of
the Navy Department, the Bureau of Yards and
Docks, or the Naval Civil Engineering Laboratory.

The author gratefully acknowledges the assistance
of H. M. Dye and E. E. Bean in planning and
conducting the research described herein and of D.
H. Mitchell, R. M. Mitchell, B. C. Phillips, and C.
H. Wilmot in analyzing the data.

The major benefit expected from the EPS
program was an increase in estimation ac-
curacy over conventional estimation tech-
niques. This type of accuracy increment
would have significant implications for on- 
the-job productivity by enabling more effi- 
cient scheduling of work and minimizing over- 
estimates. This latter factor is of particular 
importance since, according to Parkinson’s 
Law, workers can be expected to consume as 
much time as is allocated to any job. 

Several informal evaluations of EPS effec- 
tiveness have been made by groups within spe- 
cific Naval Districts and by the Bureau. 
While these evaluations tended to show that 
EPS utilization increased on-the-job produc- 
tivity, the present study was undertaken to 
provide a more rigorous treatment of the 
problem. 

METHOD

The study, of necessity, involved no experimenta- 
tion and was entirely dependent upon analysis of 
records kept at individual Navy activities, District 
Field Engineering Offices, and the Bureau of Yards 
and Docks. Members of the project team visited 
Field Engineering Offices in 4 of the 10 Naval Dis- 
tricts located in the continental United States and a 
total of seven activities in these districts to obtain 
relevant information. These sites were selected to 
provide a sample as representative as possible of 
Navy-wide conditions and covered a wide variety of 
sizes, missions, environments, and geographical
locations. 

TABLE 1


ANALYSIS OF VARIANCE OF THE RATIO OF
ESTIMATED TO ACTUAL LABOR HOURS


Source                        df       SS       MS         F
Activities (A)                11     1,419      129       2.11*
Estimation techniques (E)      1     9,902    9,902     162.32**
Trades (T)                     3       471      157       2.57
A × E                         11     2,915      265       4.34*
A × T                         33     7,014      213       3.49*
E × T                          3     1,267      422       6.92**
A × E × T                     33     2,013       61

Total                         95
* p < .05.
** p < .001.


The analyses reported here were based on two
distinct types of data. The first class of data con-
sisted of specific work orders and summary sta-
tistics relevant to jobs for which both EPS and
conventional estimates were made. These data were 
originally generated at the time EPS were installed 
at the various activities when Planners and Esti- 
mators, as part of their training, reestimated pre- 
viously completed work. This EPS reestimation was 
done without knowledge of either the original con- 
ventional estimate or the actual time expended in 
completing the job. Data of this type were obtained 
for a total of 266 jobs in the four primary trades 
(painting, plumbing, carpentry, and electrical main- 
tenance) from 12 Navy activities. These data were 
used in the study to establish the relative size of 
EPS as opposed to conventional labor hour esti- 
mates. 

The second major data source consisted of cur- 
rent tabulated reports of completed work orders. 
These reports presented, by work order, estimated 
and actual labor hours for each trade, and indicated 
whether the job had been estimated by EPS or by 
conventional methods. The reports were collected 
from five Navy activities. 

In sampling data from these reports, jobs which 
involved a change in scope were excluded, as were 
very small jobs (generally less than 10 hours). With 
these exceptions, data were sampled randomly by 
specific job order until 30 EPS and 30 conventionally 
estimated jobs were obtained for each of the four 
primary trades at each of the five activities. These 
data were analyzed to provide a comparison between 
the Performance Indexes [(Estimated Hours — Ac- 
tual Hours) /Actual Hours] achieved on EPS esti- 
mated jobs and those achieved on conventionally 
estimated jobs. 
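
The Performance Index defined above, together with the algebraic and absolute means reported later, can be sketched as follows. The function names and the three job records are hypothetical and are used only to show the arithmetic.

```python
from statistics import mean


def performance_index(estimated_hours: float, actual_hours: float) -> float:
    """(Estimated Hours - Actual Hours) / Actual Hours for a single job."""
    if actual_hours <= 0:
        raise ValueError("actual hours must be positive")
    return (estimated_hours - actual_hours) / actual_hours


# Hypothetical (estimated, actual) labor hours for three jobs.
jobs = [(40.0, 32.0), (30.0, 40.0), (50.0, 50.0)]
indexes = [performance_index(est, act) for est, act in jobs]

# Algebraic mean: net tendency to overestimate (+) or underestimate (-).
algebraic_mean = mean(indexes)
# Absolute mean: average size of the estimation error, ignoring direction.
absolute_mean = mean(abs(value) for value in indexes)

print(indexes)  # [0.25, -0.25, 0.0]
```

In this made-up sample the over- and underestimates cancel in the algebraic mean while the absolute mean stays above zero, which is why the two statistics are reported separately.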


RESULTS AND DISCUSSION 


Although differences in the size of estimates 
between the two estimation techniques were 
of primary interest in the first analysis, a 
three-dimensional analysis of variance tech- 
nique was employed. The results of this analy- 
sis are summarized in Table 1. Since, at some 
activities, only data summarized by trade 
were available, the analysis was based on a
single entry per cell. Furthermore, since the
summarized data represented different num- 
bers and sizes of jobs, the dependent variable 
for the analysis was the ratio of estimated 
to actual hours rather than the estimated 
hours. 

Table 1 shows that the difference between 
estimation techniques was significant beyond 
the 0.001 level. The differences among activi- 
ties were also significant (p < 0.05) but were 
of little practical concern to this study. All 
three interaction effects were also significant 
and are deserving of separate discussion. 
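
The structure of Table 1 can be checked with a short sketch. It assumes a 12 × 2 × 4 layout (activities, estimation techniques, trades) with one entry per cell, so that the three-way interaction serves as the error term; the sums of squares are as read from the scanned table, and the variable names are illustrative.

```python
# Number of levels of each factor: 12 activities, 2 techniques, 4 trades.
levels = {"A": 12, "E": 2, "T": 4}


def degrees_of_freedom(effect: str) -> int:
    """Product of (levels - 1) over the factors named in the effect."""
    product = 1
    for factor in effect:
        product *= levels[factor] - 1
    return product


# Sums of squares as read from Table 1.
sums_of_squares = {
    "A": 1419, "E": 9902, "T": 471,
    "AE": 2915, "AT": 7014, "ET": 1267, "AET": 2013,
}

mean_squares = {k: ss / degrees_of_freedom(k) for k, ss in sums_of_squares.items()}

# With a single entry per cell, A x E x T is the only available error term.
error_ms = mean_squares["AET"]
f_ratios = {k: ms / error_ms for k, ms in mean_squares.items() if k != "AET"}

for effect, f_value in f_ratios.items():
    print(effect, degrees_of_freedom(effect), round(f_value, 2))
```

The degrees of freedom sum to 95, matching the tabled total, and the recomputed F ratios agree with the printed ones to within about 0.01; the small discrepancies are consistent with the tabled sums of squares and mean squares having been rounded.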

The Activity by Trade interaction was es-
sentially an among-estimators effect since
most jobs within a trade were estimated by a 
single Planner and Estimator at each activity. 
These differences, then, reflected differences in 
the “generosity” of Planners and Estimators 
in allocating time to specific work orders. 
This type of finding was not unexpected. 

The Estimation Technique by Trade inter- 
action reflected a smaller (but still statistic- 
ally significant) difference between estimation 
techniques for the painting trade than for 
the three other trades considered here. This 
finding was also expected as improvements in 
paints and painting devices (brushes and rol-
lers) since EPS were developed have caused
the painting standards to be spuriously high. 
This situation has been widely recognized in 
the Navy and was remarked on by Public 
Works maintenance personnel at most of the 
activities and Field Engineering Offices vis- 
ited in the course of the study. 

The significant Activity by Estimation 
Technique interaction was found to be one of 
the most meaningful results of the study. 
When considered across all activities, EPS 
estimates were found to be 20.4% lower than 
conventional estimates. The difference, how- 
ever, varied substantially from activity to 
activity, ranging from a low of 5.6% to a 
high of 37.5%. This variation among activi- 
ties was found to be significantly related to 
the date at which EPS were installed at the 
activities. 

FIG. 1. Regression with time of the size difference between EPS and conventional labor hour estimates. (Axes: percent EPS lower than conventional estimate, X; EPS installation date, Y, 1958 to 1964. Two lines are plotted: regression of X on Y for all activities, and regression of X on Y excluding the Marine Corps activity.)


A product-moment correlation coefficient of
-0.72 (p < .01) was obtained between date
of EPS installation and the percentage dif-
ference between EPS and conventional esti-
mates when all 12 activities were considered.
One Marine Corps activity had been included
in the sample and was highly deviant from
the general trend. When this activity was 
excluded, a correlation coefficient of -0.91
(p < .01) was obtained. Regression lines
based on these correlations are shown in Fig- 
ure 1 and illustrate clearly the decreasing 
difference between EPS and conventional es- 
timates as a function of time. 
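
The correlation and regression just described can be sketched as below. The activity-level data were not published, so the installation years and percentage differences here are hypothetical; only the method (a product-moment correlation and a least-squares line for percentage difference on installation date) follows the text.

```python
from math import sqrt


def pearson_and_regression(dates, percents):
    """Return (r, slope, intercept) for the regression of percents on dates."""
    n = len(dates)
    mean_date = sum(dates) / n
    mean_pct = sum(percents) / n
    s_dd = sum((d - mean_date) ** 2 for d in dates)
    s_pp = sum((p - mean_pct) ** 2 for p in percents)
    s_dp = sum((d - mean_date) * (p - mean_pct) for d, p in zip(dates, percents))
    r = s_dp / sqrt(s_dd * s_pp)
    slope = s_dp / s_dd              # change in percent per year
    intercept = mean_pct - slope * mean_date
    return r, slope, intercept


# Hypothetical installation years and percent by which EPS fell below
# conventional estimates (not the study's data).
years = [1958, 1959, 1960, 1961, 1962, 1963]
pct_lower = [38.0, 30.0, 25.0, 18.0, 12.0, 6.0]

r, slope, intercept = pearson_and_regression(years, pct_lower)
print(round(r, 3), round(slope, 3))
```

A negative r and a negative slope express the pattern reported in the text: the later EPS were installed at an activity, the smaller the gap between EPS and conventional estimates.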

The regression with time of the differences 
between estimation techniques was apparently 
due to a progressive lowering of conventional 
estimates since EPS time allocations were 
derived according to standard procedures 
from specifications which did not change 
during the period covered by the study. This 
lowering trend was attributed to a learning 
effect resulting from EPS utilization. As evi- 
dence accumulated that EPS estimates were 
substantially lower than conventional esti- 
mates, Field Engineering Office personnel 
disseminated this information to activities at 
which EPS had not yet been installed. Plan- 
ners and Estimators at these activities appar- 
ently then began to lower their conventional 
estimates so that, when EPS were installed, 
the size difference between estimation tech- 
niques was less than it had been earlier in the 
program. 

The lowering of labor-hour estimates did
not, of itself, represent an increase in produc-
tivity. It remained to demonstrate: (a) that
the estimated time was consumed on the job
regardless of how high the estimates were,
and (b) that the difference between estimated
and actual hours was no greater for EPS esti- 
mated jobs than for conventionally estimated 
jobs. 

Performance Indexes based on conventional 
estimates were calculated from the summary 
statistics obtained for the 12 activities dis- 
cussed above. These indexes were then cor- 
related with EPS installation date in the same 
manner as had been done with the size dif- 
ference between EPS and conventional esti- 
mates. The obtained correlation coefficient of 
-0.19 was statistically nonsignificant (p >
.25). The regression equation based on this 
correlation coefficient indicated that conven- 
tional estimates exceeded actual hours by 4% 
in 1958 and that there was no difference be- 
tween estimated and actual hours in 1964. 
This 4% increment was not significantly 
different from zero either statistically or from 
a practical viewpoint. 

TABLE 2


COMPARISON OF PERFORMANCE INDEX DATA FOR EPS
AND CONVENTIONALLY ESTIMATED JOBS


                      Conventional          EPS
Condition              M      SD         M      SD       Fᵃ       p
Algebraic values      0.07   0.36       0.04   0.35     2.41    >0.10
Absolute values       0.23   0.28       0.23   0.27     0.12    >0.50

ᵃ df = 1/1,198.


At this point in the study it could be said
that, if EPS estimates reflected actual time
requirements to complete specified jobs, then
these requirements were overestimated by
conventional estimation techniques. Early in
the EPS program the extent of this overesti-
mation was in the neighborhood of 35% to
40%, but it decreased to between 5% and 7% by
the end of 1963. Furthermore, the estimated 
hours were consumed in completing jobs 
throughout this time period, indicating that 
productivity on conventionally estimated jobs 
had increased approximately 28% to 35% in 
6 years. 
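
The text does not show how the 28% to 35% figure was obtained. One reading that reproduces it, offered here as an assumption rather than as the author's stated computation, is a simple subtraction of the late overestimation range from the early one.

```python
# Ranges of conventional overestimation quoted in the text, in percent.
early_low, early_high = 35, 40   # early in the EPS program (1958)
late_low, late_high = 5, 7       # by the end of 1963

# Assumed reading: the productivity gain is the drop in overestimation,
# since estimated hours were consumed throughout the period.
smallest_gain = early_low - late_high
largest_gain = early_high - late_low

print(smallest_gain, largest_gain)  # 28 35
```

Under this reading the bounds come out at 28 and 35 percentage points, matching the quoted range.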

It was apparent from this finding that job 
estimates, even of the conventional type, can 
affect behavior in such a way as to counteract 
the effects of Parkinson’s Law. This result is 
particularly significant in view of the recent 
study by Aronson and Gerard (in press) 
which showed that workers not only expand a 
piece of work to fit the allocated time but that 
they continue to expend excessive time on 
subsequent jobs of a similar nature even 
when provided with some incentive for per- 
forming quickly. 

The final portion of the study demon- 
strated that the difference between estimated 
and actual labor hours was no greater for 
EPS estimated jobs than for conventionally 
estimated jobs. Mean Performance Indexes 
were calculated for the 600 EPS and 600 
conventionally estimated jobs sampled from 
five activities. Both absolute and algebraic 
values were used for these calculations yield- 
ing, respectively, the mean estimation error 
and the mean tendency to overestimate or 
underestimate. These mean values, together 
with standard deviations and tests of the 
mean differences between EPS and conven-
tionally estimated jobs, are presented in Table
2.

As shown in Table 2, both estimation tech- 
niques showed a slight tendency to overesti- 
mate actual labor-hour expenditures. Although 
conventional estimates showed a greater tend- 
ency in this direction, the difference, as tested 
by an F ratio, was not statistically significant
(p > .10). Similarly, the two estimation
techniques did not differ with respect to ab- 
solute estimation errors (p > .50). 
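
The distinction between algebraic and absolute values can be made concrete with a small sketch. The index definition and the job figures below are hypothetical, chosen only to show why the two means answer different questions; the study's own Performance Index formula is not restated in this passage.

```python
from statistics import mean


def performance_index(estimated_hours, actual_hours):
    # HYPOTHETICAL definition for illustration only: proportional error,
    # positive when the estimate exceeds the hours actually used.
    return (estimated_hours - actual_hours) / estimated_hours


# Hypothetical (estimated, actual) labor hours for four jobs.
jobs = [(10.0, 9.0), (8.0, 10.0), (20.0, 18.0), (5.0, 5.0)]
indexes = [performance_index(est, act) for est, act in jobs]

# Algebraic mean: over- and underestimates cancel, showing net bias.
algebraic_mean = mean(indexes)
# Absolute mean: sign is ignored, showing the typical size of the error.
absolute_mean = mean(abs(x) for x in indexes)

print(f"algebraic mean = {algebraic_mean:+.4f}")
print(f"absolute mean  = {absolute_mean:.4f}")
```

A sample can therefore show almost no net bias while individual estimates are still off by a noticeable amount, which is why both values are reported.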


CONCLUSIONS 


Data were collected from a carefully se- 
lected sample of United States Navy sources 
for the purpose of analyzing the effects job 
estimates based on EPS have had on produc- 
tivity in Public Works maintenance. These 
data encompassed statistics regarding the size 
of EPS and conventional labor-hour estimates 
and the hours actually expended in complet- 
ing specific maintenance jobs. 

Three separate analyses showed (a) that 
EPS estimates are significantly lower than 
conventional estimates for the same work, 
(b) that estimated hours are consumed in
completing jobs regardless of the size of the 
estimate, and (c) that the difference between 
estimated and actual labor hours was no 
greater for EPS than for conventionally esti- 
mated jobs. Based on these findings, it was 
concluded that EPS utilization increases pro- 
ductivity in Navy Public Works maintenance. 

The size difference between EPS and con- 
ventional estimates was found to have a sig- 
nificant linear relationship with the time 
which had elapsed since initiation of the EPS 
program. This relationship was attributed to 
a learning effect resulting from experience 
with EPS which caused a lowering of conven- 
tional labor-hour estimates. 

Based on the data sample analyzed, the 
productivity increment resulting from EPS 
utilization has been between 35% and 40% 
for all EPS estimated jobs since 1958. The 
learning effect EPS utilization has had on 
conventional estimates has resulted in a simi-
lar productivity increment of between 28%
and 35% on conventionally estimated jobs 
as of the end of 1963. 
OF CHARACTER DISORDERS ¹
2 additional studies were reported which suggest that IQ mediates the expression
of psychopathy, as measured by the Insolence Scale. Prior work has suggested
that the scale was most valid among individuals with IQs of 100 or more. The
1st study tested the hypothesis that there would be a stronger relationship be-
tween failure to complete high school and the Insolence Scale among more in-
telligent than less intelligent Ss. Using Navy recruits as Ss, this hypothesis was
supported (p < .05). In general, the Insolence Scale’s validity increased as the
IQ level of sailors increased. In the 2nd study the assumption that the Insolence
Scale is a measure of psychopathy was tested by correlating the scale with
psychiatrists’ diagnosis of psychopathy among a hospitalized Navy psychiatric
population. It was found that the Insolence Scale correlated .66 (p < .01) with
psychiatrists’ diagnosis of psychopathy among the more intelligent patients.
This correlation was −.04 among the less intelligent. Based upon additional
experimental findings, it is suggested that character disorders with low IQ may
exhibit psychopathic type behavior mainly in response to stress.


There have been occasional reports that 
tests of behavior and character disorder, such 
as the Pd Scale of the MMPI, are most prog- 
nostic of psychopathic behaviors among per- 
sons who are above average in intelligence. 
Among persons with lower IQs, the validities 
of these tests have at times been reported as 
zero (Panton, 1960; Roessel, 1954). For in- 
stance, Roessel found that the Pd score of the 
MMPI was related to high school dropouts 
among higher IQ teenagers, but not among 
lower IQ teenagers. 

In a previous study (Kipnis, 1965) it was
reported that an assumed measure of char-
acter and behavior disorder³ called the


1 From Bureau of Medicine and Surgery, Navy
Department, Research task MR005.12-2005, Subtask
1. The opinions and statements contained herein are
the private ones of the writer and are not to be
construed as official or reflecting the views of the
Navy Department or the Naval Service at large.

2 Charles Morris assisted in the gathering and
analysis of the data. The cooperation of C. S. Mullin
and T. H. Lewis of the National Naval Medical
Center in carrying out the diagnostic study is grate-
fully acknowledged.

3 Character and behavior disorders are considered
to be those underlying several forms of immature,
acting-out behaviors which are at variance with
common social norms, but generally fall short of
outright illegality. Such behaviors are exhibited on
occasion by individuals clinically labeled not only as
passive-aggressive, but also as passive-dependent,
emotionally unstable, or antisocial personalities.


Insolence Scale was predictive of the performance
of Navy recruits who had intelligence scores 
roughly equivalent to IQs of 100 or better. 
However, the Insolence Scale was not related 
to the Navy performance of recruits with 
lower intelligence scores. The purpose of this 
report is to present additional findings which 
suggest that IQ mediates the expression of the 
behaviors of character disorders. Two main 
sets of data are presented. In the first study, 
the Insolence Scale was administered to a
sample of Navy recruits and the attempt was 
made to postdict whether the recruit had 
graduated from high school prior to joining 
the Navy. This criterion was used since it 
reflects one of the first demands placed upon 
teenagers by society. Failure to complete high 
school is generally acknowledged to have in- 
creasingly serious consequences for subsequent 
vocational adjustment and success. Thus de- 
tection of an interaction between intelligence, 
the Insolence Scale, and successful completion 
of high school would be of twofold value. 
First, it would provide cues as to the moti-
vational correlates of high school dropouts among
the more intelligent. Secondly, it would pro- 
vide additional support for the contention 
that measures of character disorder, such as 
the Insolence Scale, are most clearly related 
to performance among more intelligent per- 
sons. In the second study the Insolence Scale 
was administered to a sample of psychiatric
patients in an attempt to relate Insolence 
Scale scores to psychiatrists’ diagnosis of be- 
havior and character disorder. In this study, 
analysis was also done at several levels of 
intelligence in order to detect any possible 
moderating effects of IQ. 


HIGH SCHOOL DROPOUTS


Procedure 


Subjects. The subjects (Ss) consisted of 193 Navy 
recruits who were awaiting vocational training at 
the Navy Training Center, Bainbridge, Maryland. 
These men had been recruited to serve as Ss for 
various research studies underway at the Medical 
Research Institute and were tested in groups of 8 to 
16 daily. At the time of testing the Ss had been in 
the Navy approximately 4 months, their ages 
ranged from 17 to 19 years, and they represented 
the middle range of general intelligence, with IQs 
from approximately 92 to 115. 


Test Variables 


Insolence 1 Scale. The development of this scale 
has been described in previous articles (Kipnis, 1965; 
Kipnis & Glickman, 1962). It consists of 27 self- 
description items, which if answered affirmatively 
would convey the impression of a physically active, 
aggressive, somewhat hostile and reckless person- 
ality, who early in life became independent of family 
and grade-school control. 

Insolence 2 Scale. Based upon the early results 
with the Insolence Scale, additional items were 
written which were similar to those previously 
found discriminating. These items were administered 
to the six samples of Navy men described in
Kipnis (1965). The Insolence 2 Scale is based upon 
an item analysis of these newer items against per- 
formance criteria and is composed of 41 items, 
including 25 from the original key. The present 
study represents an attempt to estimate the validity 
of this newer key. Test items and scoring keys 
for both Insolence Scale 1 and 2 can be obtained as 
indicated in the footnote. Both scales are keyed so 
that high scores denote men with character disorders. 

It should be noted that the Insolence 1 Scale does 
not contain items about experiences in high school. 
It does, however, contain an item concerned with 
bad conduct in grade school. The Insolence 2 Scale 
contains one item about high school experience: “In 
high school, did you help other students with their 
studies?”

General Classification Test (GCT). The GCT is 
used as a measure of verbal reasoning ability in 
the Navy’s Basic Test Battery and is used to classify 
enlisted men when they first enter the Navy. Scores 
for this sample were obtained from official Navy 
records.

High School Dropouts. At the time of testing, each 
S filled out a supplementary sheet indicating his 
highest level of education prior to entering the 
Navy. All men who did not graduate from high 
school were classified as dropouts. A total of 28% 
of the sample were so classified. 


4 Test items and scoring keys have been deposited
with the American Documentation Institute. Order 
Document No. 8437 from the ADI Auxiliary Pub- 
lications Project, Photoduplication Service, Library 
of Congress, Washington, D. C. 20540. Remit in 
advance $1.75 for microfilm or $2.50 for photocopies 
and make checks payable to: Chief, Photoduplica- 
tion Service, Library of Congress. 


TABLE 1

INSOLENCE 1 AND INSOLENCE 2 SCALES: MEANS, STANDARD DEVIATIONS, CORRELATIONS WITH GCT
AND HIGH SCHOOL DROPOUTS (BY GCT GROUPINGS AND FOR THE TOTAL SAMPLE)

                                                                   r biserial
                                        Percentage                 dropouts versus
Scale                M      SD      N   of dropouts     r GCT      graduate

Insolence 1 Scale
  High GCT          10.3    3.8     94     29%           .14         −.49**
  Low GCT           10.6    3       99     27%       [illegible]     −.25*
  Total             10.5    3.6    193     28%           .02         −.38**
Insolence 2 Scale
  High GCT          17.8    6.3     94     29%           .03         −.60**
  Low GCT           18.0    5.8     99     27%          −.06         −.39**
  Total             17.9    6.1    193     28%           .01         −.56**


[Figure 1, plot not reproduced. Horizontal axis: GCT level in six groupings:
48 or less (N = 34), 49-50 (N = 35), 51-52 (N = 30), 53-55 (N = 39),
56-59 (N = 30), 60 or more (N = 25).]

FIG. 1. Biserial correlation of high-school dropouts with the Insolence 1 and
Insolence 2 Scales, by level of intelligence.


Since the Navy adopts a selective policy toward 
the enlistment of men in terms of their intelligence 
and educational attainment, it is doubtful if the 
dropouts in the current sample can be considered 
representative of dropouts throughout the country. 
For one thing, the correlation between intelligence 
(GCT) and the dropout-graduate criterion was zero 
for this sample. Several studies have reported that 
a moderate to strong correlation exists between in- 
telligence and successful completion of high school 
(Cook, 1956; Dresher, 1954). The lack of correla- 
tion in this sample no doubt reflects the Navy’s re- 
cruiting policy. Applications are not encouraged from 
men who show indications of low intelligence on 
Navy screening tests, and among whom one could 
expect to find the majority of high school dropouts. 


RESULTS 


The sample was split into high and low 
GCT groups, based upon a median GCT split 
of 53.⁵ Within each group and for the total
sample, biserial correlations were computed 
between the Insolence Scale and the dichoto- 
mous criterion of dropouts versus graduate. In 
addition, product-moment correlations were 

5 High GCT groupings approximately equivalent
to IQs from 102 to 115; Low GCT groupings
equivalent to IQs from 92 to 101.


computed between GCT and the Insolence 
Scales. The main results are summarized in 
Table 1. 
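
For readers unfamiliar with the statistic, a biserial correlation can be sketched as follows. This uses the standard textbook formula with the population standard deviation; the scores and group labels are hypothetical, and nothing here reproduces the study's data or its exact computing procedure.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores, in_group):
    """Biserial correlation between a continuous score and a dichotomy.

    The dichotomy is treated as a cut on an underlying normal variable:
    r_b = (M1 - M0) / s * (p * q / y), where y is the normal ordinate
    at the point dividing the proportions p and q.
    """
    ones = [x for x, g in zip(scores, in_group) if g]
    zeros = [x for x, g in zip(scores, in_group) if not g]
    p = len(ones) / len(scores)
    q = 1.0 - p
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean(ones) - mean(zeros)) / pstdev(scores) * (p * q / y)


# HYPOTHETICAL scale scores and a hypothetical 1/0 criterion.
scores = [18, 14, 20, 12, 14, 12, 10, 16, 13, 11]
criterion = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

r_b = biserial_r(scores, criterion)
print(f"biserial r = {r_b:.3f}")
```

The sign of the coefficient depends only on which category is coded 1, so negative values such as those in Table 1 reflect the coding of the dropout-graduate criterion.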

It may be seen in Table 1 that there were 
stronger correlations between both Insolence 
Scales and completion of high school among 
High GCT Ss than among Low GCT Ss. For 
the Insolence 1 Scale the difference between 
the correlations of — .49 and — .25 was sig- 
nificant at the .06 level of confidence, using 
Fisher’s z′ transformation. For the Insolence
2 Scale the difference between the correla-
tions of −.60 and −.38 was significant be-
yond the .05 level of confidence. Thus as
predicted the findings show that high Inso- 
lence Scale scores were more likely to be as- 
sociated with excessive failure rates among 
brighter individuals than among the less 
bright. 
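
The comparison of correlations described above can be sketched with the usual Fisher transformation test for two independent samples. The correlations and group sizes are those given in the text and Table 1; the two-tailed probability is an assumption, and the transformation is applied to biserial coefficients only because the text reports doing so.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Test the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed normal probability, using erfc to avoid extra dependencies.
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_tailed


# High GCT (N = 94) versus Low GCT (N = 99), values as quoted in the text.
z_ins1, p_ins1 = fisher_z_difference(-0.49, 94, -0.25, 99)
z_ins2, p_ins2 = fisher_z_difference(-0.60, 94, -0.38, 99)

print(f"Insolence 1: z = {z_ins1:.2f}, p = {p_ins1:.3f}")
print(f"Insolence 2: z = {z_ins2:.2f}, p = {p_ins2:.3f}")
```

Under these assumptions the two probabilities fall just above and just below .05, in line with the levels reported in the text.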

The findings also indicate that the Inso- 
lence 2 Scale was a more valid measure than 
the Insolence 1 Scale. This is not too surpris- 
ing since it contains most of the original In- 
solence Scale items plus an additional set of 
items similar in content to those of the orig-
inal key.
To determine if there were consistent rises
in correlations as intelligence increased, the 
Ss were subdivided into six GCT groupings 
and biserial correlations were computed be- 
tween both Insolence Scales and the dropout- 
graduate criterion within each GCT grouping. 
There were from 25 to 39 cases within each 
group. Figure 1 shows these correlations, 
plotted as a function of GCT level. It can be 
observed that there was a general trend for 
the correlations to rise as the intelligence 
level of the group rose. Again it can be noted 
that the Insolence 2 Scale yielded higher cor- 
relations at almost all GCT levels. 


DISCUSSION 


Whether or not the Insolence Scale would 
be predictive of high school dropouts cannot, 
of course, be determined from the present 
study. It is quite possible that the very ex- 
perience of dropping out of high school helped 
form the kind of character structure indicated 
by the Insolence Scale. One would guess that 
if the Insolence Scale was predictive of drop- 
outs, the magnitude of the correlations would 
probably not be as high as those obtained 
through postdiction. It seems to be true that 
the “half-life” of a high validity coefficient is 
very short indeed. 

What is mainly of interest in the findings is 
the indication that failure in high school 
among the more intelligent was associated 
with high Insolence Scale scores, while such 
failures among the less intelligent were not 
as strongly related to Insolence Scale scores. 
These findings are similar to those of the 
prior study (Kipnis, 1965). They are also 
similar to those of Roessel (1954) who re- 
ported a predictive study of high school drop- 
outs using the MMPI. He found that among 
high IQ teenagers, scores on the Psychopathic 
Deviate, Schizophrenia, and Hypomania 
scales most sharply differentiated the dropout 
from the graduate. Less differentiation was 
found among low IQ teenagers. Other studies 
(Bass, Dunteman, Frye, Vidulich, & Wam-
bach, 1963; Goodstein & Heilbrun, 1962)
have also reported the moderating effects of 
intelligence upon personality, although not 
with reference to high school performance. 
Finally, Panton (1960) found that the coded 
MMPI profiles of more intelligent prisoners 
were associated with character disorders,
whereas the profiles of the average and below
average IQ groups were found to be dom-
inated by configurations usually associated
with indices of neuroticism.

Thus there is some evidence that differing 
personality factors may be related to behavior 
as IQ level changes. What is not clear is the 
basis for these differences. One suspects that 
if the basis were understood, a means for 
reducing motivational failures among the 
more intelligent might be found. Let us as- 
sume that individuals with high Insolence 
Scale scores have the same character struc- 
ture, regardless of their levels of intelligence. 
Some data supporting this assumption will 
be given in the next study. Then, the fact 
that the less intelligent high insolent indi- 
vidual did not fail high school suggests that 
high Insolence Scale scores per se are not 
indicative of a debilitating personality dis- 
order. Apparently if the environment contains 
adequate sources of stimulation, persons with 
character disorders will respond and make 
satisfactory adjustments. If this line of rea- 
soning is correct, then the major research 
problem becomes the identification of the 
kinds of environmental incentive conditions 
that will motivate the more intelligent char- 
acter disorder to strive. 


PREDICTION OF PSYCHIATRIC DIAGNOSIS 


The assumption that the Insolence Scale is 
a measure of character disorder has been 
based upon an analysis of the items in the 
Scale. In the present study this assumption 
was tested by correlating the Insolence Scale 
with psychiatrists’ diagnosis of psychopathy. 
Based upon the previous results with the 
Scale, the principal hypothesis was that the 
Insolence Scale would be related to diagnosis 
of behavior and character disorder among 
patients with IQs of 100 or more. 


Subjects 


Over a 6-month period, all patients admit- 
ted to the neuropsychiatric ward at the Na- 
tional Naval Medical Center, Bethesda, Mary- 
land were given the Insolence Scale, with the 
exception of those too acutely ill to be tested. 
The Insolence 2 scoring key was used for this 
study.⁶ A total of 58 patients were tested
during this period. 

The National Naval Medical Center does 
not provide long-term custodial care for psy- 
chiatric patients. Rather, patients are ad- 
mitted for diagnostic evaluation and active 
treatment where indicated by the staff of 
resident psychiatrists. Diagnosis is based upon 
an average of 3 to 6 weeks of testing, inter- 
views, and observation of the patient. Final 
psychiatric diagnoses are based upon the con- 
sensus of a panel of psychiatrists who care- 
fully review each case. Based upon the panel’s 
diagnosis, patients are either sent to a Navy 
hospital for long-term treatment, or back to 
active duty, or are discharged from the Navy. 

The standard APA classification of mental 
disorders is used by the Navy in diagnosing 
psychiatric patients. For purposes of this 
study, all patients diagnosed as (a) emo- 
tionally unstable personalities, (b) passive- 
dependent or aggressive personalities, (c) 
aggressive personalities, (d) antisocial per-
sonalities, were considered to represent be-
havior and character disorders.⁷ All other
diagnoses were considered less likely to be
indicative of psychopathy and were grouped
together as noncharacter disorders.⁸


6 The results using the Insolence 1 Scale were
similar to those reported here, although at a slightly 
lower level of significance. 

7 Separate analyses of these four diagnostic groups 
yielded no differences in Insolence Scale scores. 

8 Diagnoses included in the Non-Character Dis-
order Grouping consisted of: Schizophrenia, Schizoid
Personality, Paranoid Personality, Compulsive Per-
sonality, Psychoneurotic Depression, Anxiety Reac-
tion, Encephalitis, Psychogenic gastrointestinal reac-
tion, Acute situational reaction, Psychiatric observa-
tion (periodic).


TABLE 2

MEANS AND SUMMARY OF ANALYSIS OF VARIANCE OF
INSOLENCE SCALE SCORES: SUBJECTS CLASSIFIED
BY PSYCHIATRISTS’ DIAGNOSIS AND IQ

Source             df        MS         F

Diagnosis (D)       1      118.25      3.56
IQ                  1       13
D × IQ              1      149.10      4.49*
Error              54       33.24

Total              57

* p < .05.
TABLE 3

MEANS OF INSOLENCE SCALE SCORES: SUBJECTS
CLASSIFIED BY PSYCHIATRISTS’ DIAGNOSIS AND IQ

                        Diagnosis
              Psychopath     Nonpsychopath
                  X̄               X̄

High IQ         21.60           15.08
Low IQ          18.18           18.56


Analysis


The data were analyzed as follows: Based 
upon their GCT scores, patients were dichoto- 
mized into a high IQ group (GCT = 51 plus) 
and a low IQ group (GCT = 50 or less). 
Thirty-eight of the patients were classified as 
High IQ and 20 as Low IQ. 

Based upon their diagnoses, patients were 
classified as character disorders or noncharac- 
ter disorders. Among the high IQ patients 25 
out of 38 were classified as character dis- 
orders; among the low IQ patients, 11 out of 
20 were so classified. 

The Insolence Scale scores were subjected 
to a two-way analysis of variance, using 
diagnosis as one factor, and IQ level as the 
second factor. A method described by Winer 
(1962) was used to adjust the analysis for 
unequal cell entries. 

Biserial correlations were also computed 
at each IQ level between the Insolence Scale 
and the dichotomous diagnostic criterion. 


Results 


Tables 2 and 3 give the mean Insolence 
Scale scores for patients jointly classified by 
IQ and diagnosis, and the results of the analy- 
sis of variance of these data. It can be seen 
that the interaction of IQ and diagnosis was 
reliable beyond the .05 level. Inspection of 
the cell means shows that Insolence Scale 
scores were related to psychiatrists’ diagnoses 
of character disorder among patients with IQs 
of approximately 100 or more, but not among 
patients with lower IQs. The magnitude of 
these relationships is indicated by the biserial 
correlation of .66 (p < .01) between the Inso- 
lence Scale and diagnosis of psychopathy 
among high IQ patients. The corresponding 
biserial correlation among low IQ patients
was — .04. 
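
As a check on how the tabled values fit together, the F ratios in Table 2 can be recovered by dividing each mean square by the error mean square, and the interaction can be read from the cell means in Table 3. The sketch below uses only figures printed in the two tables; the variable names are illustrative.

```python
# Mean squares from the analysis of variance summary (Table 2).
ms_effects = {"diagnosis": 118.25, "diagnosis_x_iq": 149.10}
ms_error = 33.24  # error mean square, df = 54

# Each F ratio is the effect mean square over the error mean square.
f_ratios = {name: ms / ms_error for name, ms in ms_effects.items()}

# Cell means from Table 3: psychopath minus nonpsychopath at each IQ level.
high_iq_gap = 21.60 - 15.08
low_iq_gap = 18.18 - 18.56

# The interaction is the difference between the two gaps.
interaction_contrast = high_iq_gap - low_iq_gap

for name, f in f_ratios.items():
    print(f"F({name}) = {f:.2f}")
print(f"High IQ gap = {high_iq_gap:.2f}, Low IQ gap = {low_iq_gap:.2f}")
print(f"Interaction contrast = {interaction_contrast:.2f}")
```

The diagnostic groups differ by about six and a half scale points among high IQ patients and by less than half a point, in the opposite direction, among low IQ patients.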


DISCUSSION 


The results support the belief that the In- 
solence Scale is a measure of character dis- 
order among those with average IQs or better. 
As in the prior studies, however, high Insolence
Scale scores were not prognostic of this per- 
sonality structure among patients with lower 
IQs. 

Despite these findings, data from several 
sources suggest that there may be a basic 
similarity in character structure among per- 
sons with high Insolence Scale scores at all 
levels of IQ. This is most clearly revealed 
when persons with high Insolence Scale scores 
and low IQs are subjected to stress. Under 
these conditions it is not possible to distin- 
guish their behavior from the behavior of the 
more intelligent individual with high Insolence 
Scale scores. Thus, in a recent study which 
subjected Navy recruits to mild stress (Kip- 
nis & Wagner, in press), the Ss with high 
Insolence Scale scores at all levels of IQ were 
less cooperative with the experimenter and 
were less concerned about doing well on the 
experimental tasks than were the Ss with low 
Insolence Scale scores. Under conditions of no 
stress, however, there was the usual finding of 
an interaction between IQ and Insolence Scale 
scores. That is, only the Ss high in both IQ 
and Insolence were not cooperative and had 
lower task motivation. In a second study now 
being completed, the results also show that 
the Ss with high Insolence Scale scores at all 
levels of IQ react negatively when given a long 
and supposedly meaningless task to complete. 

As a final point, one may speculate that the 
usual lack of psychopathic behavior among 
persons with high Insolence Scale scores and 
low IQs is due to the continual negative rein- 
forcement they receive when attempting such 
antisocial behaviors. Because of their low 
intelligence and consequent poor planning 
abilities, they may have discovered that they 
cannot get away with illegal activities without 
getting caught. In turn, this lack of success 
and the high costs of detection inhibit the
more blatant expressions of psychopathy 
among low intelligent character disorders, ex- 
cept when they are stressed. Thorne (1959) 
has proposed a somewhat similar social learn- 
ing explanation in his discussion of the etiol- 
ogy of the psychopathic personality. The fore- 
going view differs from Thorne’s in assuming 
that the lack of reinforcement inhibits the 
open display of psychopathic behavior, rather 
than in preventing the development of this 
kind of character structure among persons 
with low IQs. 
RISK-TAKING SET AND TARGET DETECTION PERFORMANCE


GARY W. EVANS 


Human Resources Research Office, Fort Bliss, Texas ¹


An experiment tested the hypothesis that an observer’s risk-taking set is
related to his target-detection performance on a radar display. Ss were given
an equal number of trials under neutral, risky, and cautious sets, where
differential sets were produced by instructions. As hypothesized, when in-
structed to adopt a risky set, Ss made earlier detections of targets and had a
higher false-positive identification rate than the same Ss when instructed to
adopt a cautious set. These findings support the contention that radar detection
performance can be regarded as a decision task.


Baldwin, Wright, and Lehr (1964) have 
shown that an operator’s psychological set, 
produced by instructions, can affect his target- 
detection performance on a radar display. Set, 
in this instance, was varied by providing dif- 
ferential information concerning the attributes 
of a target. They suggested that target-detec- 
tion performance can be regarded as a de- 
cision task where the observer must decide, 
for each scan of the radar antenna, if a target 
is present. Signal detectability theory (Swets, 
Tanner, & Birdsall, 1961) leads to the pre- 
diction that, if detection performance on a 
radar display is indeed a decision task, ob- 
servers under a “risky” set should make ear- 
lier detections of a target and more false 
alarms than observers under a “cautious” set.
An experiment was conducted to test the hy- 
pothesis that an observer’s risk-taking set is 
related to his target-detection performance on 
a radar display. 
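
The prediction can be illustrated with the equal-variance Gaussian form of signal detectability theory, in which a risky set corresponds to a lower decision criterion. The sensitivity and criterion values below are hypothetical and the model choice is an assumption made for illustration; the experiment itself measured S/N ratio at detection rather than hit rates.

```python
from statistics import NormalDist


def detection_rates(d_prime, criterion):
    """Hit and false-alarm probabilities for one observation.

    Noise is N(0, 1) and signal-plus-noise is N(d_prime, 1); the observer
    reports a target whenever the observation exceeds the criterion.
    """
    std_normal = NormalDist()
    false_alarm = 1.0 - std_normal.cdf(criterion)
    hit = 1.0 - std_normal.cdf(criterion - d_prime)
    return hit, false_alarm


d_prime = 1.0  # HYPOTHETICAL sensitivity, identical under both sets
risky_hit, risky_fa = detection_rates(d_prime, criterion=0.0)
cautious_hit, cautious_fa = detection_rates(d_prime, criterion=1.5)

print(f"risky:    hit = {risky_hit:.3f}, false alarm = {risky_fa:.3f}")
print(f"cautious: hit = {cautious_hit:.3f}, false alarm = {cautious_fa:.3f}")
```

With sensitivity held constant, lowering the criterion raises hits and false alarms together, which is the pattern the hypothesis expects under a risky set.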


METHOD 


Subjects and apparatus. The subjects (Ss) were 12 
employees of the Human Research Unit, Fort Bliss, 
Texas. The radar target simulator used in this experi-
ment was an electromechanical device with a con-
tinuous signal-to-noise amplitude (S/N) ratio read-
out capability. The simulator output was a video
display presented on 10-inch P-7 cathode-ray tubes 
in two experimental booths. The display was a simu- 
lation of a Plan Position Indicator (PPI). A central 
clutter pattern was simulated which extended ap- 
proximately 1.5 inch from the center of the display. 
All targets were presented in a noise strobe and fol- 
lowed a radial track. The sweep rotation was fixed 
at 8 rpm. A 9-position rotary switch determined the 
range at which the target would be visible, and a 


1 Now with the Bureau of Child Research, Parsons 
State Hospital and Training Center, Parsons, Kansas. 


vernier adjustment determined the range at which 
the target began its inbound run. The azimuth of the 
target track was continuously varied from trial to 
trial and randomly selected for all tracks. 

Procedure. The S sat directly in front of, and about 
18 inches from the simulated scope in a darkened,
windowless room. The display was set into a back- 
ground of white surround illuminated at a level of 
0.1 foot candle by an overhead, extended, light source. 
The observers received instructions and knowledge of 
the results from an examiner who sat beside them. 

The S’s task was to monitor the scope and to lo- 
cate a target in the noise strobe as soon as the target 
became visible. Upon locating a target, the observer 
pushed a handheld “detection” switch and placed a 
pencil eraser over the hypothesized target location. 
He was then informed, by the examiner who had 
knowledge of the correct target location, whether his 
response was “correct” or “incorrect.” 

Each S performed under three risk-taking sets 
(neutral, cautious, and risky), created by instructions. 

Neutral. The S was instructed to indicate that he 
had detected a target as soon as he thought he saw 
a target. 

Cautious. The S was given the neutral instructions 
and, in addition, was told that in the event he was 
uncertain as to whether or not he had located the 
target, it was more important to avoid a false alarm 
than to make an early detection of the target. 

Risky. The S was given the neutral instructions 
and, in addition, was told that in the event he was 
uncertain as to whether or not he had located a 
target, it was more important to make an early de- 
tection of the target than to avoid a false alarm. 

Each S was run for 15 trials under each condition. 
Six Ss received 15 trials under the neutral condition, 
followed by 15 trials under the cautious condition, 
and 15 trials under the risky condition. The other six 
Ss received 15 trials under the neutral condition, fol- 
lowed by 15 trials under the risky condition, and 15 
trials under the cautious condition. The Ss were given 
a 5-minute break following each block of 15 trials. 

Two dependent variables were employed: the S/N 
ratio at the time of the detection response and false-
positive identification rate.
RESULTS AND DISCUSSION


The results of the analysis of the S/N ratios 
are presented in Table 1 and S/N means for 
the treatment conditions are presented in 
Table 2. The F for the risk orientation effect 
was highly significant (p < .005). This indi- 
cates that risk orientation is related to range 
of detection. As expected, Ss made detections 
most rapidly under the risky condition, and 
least rapidly under the cautious condition; 
performance under the neutral condition was 
intermediate. A subsequent analysis using 
Duncan’s new multiple-range test confirmed 
that the mean S/N ratio for each of the three 
conditions was reliably different from the 
other mean S/N ratios (p < .05). 

Mean S/N ratio for the six Ss who were 
under the risky condition immediately follow- 
ing the neutral session was compared with the 
mean S/N ratio for the six Ss who were un- 
der the cautious condition immediately fol- 
lowing the neutral session. The Ss under the 
risky condition had a significantly lower mean 
S/N ratio than the Ss under the cautious con-
dition (F = 5.76, df = 1 and 10, p < .05).

Another statistically significant effect was 
that of the treatment by order interaction (p 


TABLE 1

ANALYSIS OF VARIANCE SUMMARY TABLE

Source                     df         SS            MS            F

Between Ss                 11      67,273
  Order (O)                 1         169           169
  Ss within groups (S)     10      67,104         6,710.4
Within Ss                  24      62,592
  Treatment (T)             2      28,216.67     14,108.33     12.7342**
  T × O                     2      12,217.14      6,108.57      5.5136*
  T × S                    20      22,158.19      1,107.91

* p < .05.
** p < .01.
TABLE 2

S/N MEANS FOR RISK ORIENTATION
AND ORDER

                                   Risk orientation
Order                      Neutral     Risky     Cautious     Total

Risky before Cautious       1.100      1.083      1.099       1.094
Cautious before Risky       1.095      1.060      1.135       1.097
Total                       1.097      1.071      1.117       1.095





< .02). Inspection revealed this interaction to 
be due to a tendency for Ss to improve some- 
what from session to session (that is, a prac- 
tice effect). 
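
The within-subjects part of Table 1 can be checked for internal consistency: the three within-Ss sums of squares should add to the within-Ss total, and both F ratios use the Treatment × Ss mean square as the error term. The sketch below takes the legible table entries as given and derives the interaction sum of squares from its mean square, which is an inference rather than a value read directly from the source.

```python
# Legible entries from the within-Ss portion of Table 1.
ss_within_total = 62592.0
ss_treatment, df_treatment = 28216.67, 2
ms_interaction, df_interaction = 6108.57, 2   # Treatment x Order
ss_error, df_error = 22158.19, 20             # Treatment x Ss within groups

# Sum of squares for the interaction, inferred as MS times df.
ss_interaction = ms_interaction * df_interaction

ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error

# Both within-Ss effects are tested against the Treatment x Ss error term.
f_treatment = ms_treatment / ms_error
f_interaction = ms_interaction / ms_error

# The three components should account for the whole within-Ss total.
partition_gap = ss_within_total - (ss_treatment + ss_interaction + ss_error)

print(f"SS (T x O) = {ss_interaction:.2f}")
print(f"F (T) = {f_treatment:.4f}, F (T x O) = {f_interaction:.4f}")
print(f"Unexplained within-Ss SS = {partition_gap:.2f}")
```

Order is a between-Ss factor in this design, so its own test would use the Ss within groups mean square instead.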

False-positive identification rate was related 
in the expected manner to risk orientation. 
The Ss under the risky condition made a mean 
number of 7.08 false-positive identifications 
as opposed to a mean of .92 of such identifica- 
tions under the cautious condition. A paired-t 
analysis revealed this to be significant at the
.0[digit illegible] level (t = 2.78, df = 11).
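
The reported means and test statistic allow a rough back-calculation of how variable the individual differences were. The sketch below assumes the garbled statistic reads t = 2.78 and that all 12 Ss contributed one risky and one cautious score; neither the standard error nor the standard deviation it derives is reported in the text.

```python
import math

# Values taken from the text; t is read from a garbled line (assumption).
mean_risky = 7.08
mean_cautious = 0.92
t_value = 2.78
n_subjects = 12  # paired design, so df = n_subjects - 1

# Paired t is the mean difference divided by its standard error.
mean_diff = mean_risky - mean_cautious
se_diff = mean_diff / t_value

# Standard deviation of the per-subject differences implied by that error.
sd_diff = se_diff * math.sqrt(n_subjects)

print(f"mean difference = {mean_diff:.2f}")
print(f"implied SE = {se_diff:.2f}, implied SD of differences = {sd_diff:.2f}")
```

An implied standard deviation larger than the mean difference suggests that the size of the effect varied considerably from one observer to another.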

The conclusion based on these findings is 
obvious. Risk-taking set affects an operator’s 
target-detection performance. In general, a 
high risk-taking set allows earlier detection 
of a target at the cost of increasing the prob- 
ability of a false-positive identification. These 
findings also support the contention of Baldwin
et al. (1964) that radar detection performance
can be regarded as a decision task.


SOME EFFECTS OF VIBRATION UPON
VISUAL PERFORMANCE¹


J. P. DENNIS 
College of Technology, Portsmouth, England 


Experiments have been carried out in which the effects upon visual performance
of whole-body vibration have been compared with the effects of vibrating
the visual object itself. At 6 cps, using similar angular displacements, vibration
of the visual object was found to result in higher impairment of vision than
vibration of the human subject. At 14, 19, and 27 cps the converse was found
to be the case; results which support previous theories of resonance of eyeball
or facial tissue to account for the sensitivity of visual performance to
whole-body vibration at these higher frequencies.


In a paper describing the effects of vibra- 
tion upon the human body, Coermann (1939) 
plotted curves showing impairment of sight 
per unit of head displacement. He described 
the curves as being very similar to resonance 
curves, with a peak value for most subjects 
between 50 and 80 cycles per second. Accord- 
ing to Lange and Coermann (1962), this res- 
onance of the eyeball may begin to take effect 
at 18 cycles per second. Crook (1947), experi- 
menting with the effects of vibrating the visual 
object (numerical material), found with con- 
stant amplitude of vibration that errors in- 
creased very considerably between 8 and 15.5 
cycles per second but then leveled off between 
15.5 and 30.5 cycles per second. In a later 
paper Crook (1950) accepted that the reso- 
nance characteristics of the eyeball were the 
most likely explanation of the apparent differences
between the two sets of data. Guignard²
suggested that resonance of the facial
tissue would affect visual performance at
frequencies above 16 cycles per second.

The present experiments make direct com- 
parisons between the effects of whole-body 
vibration and those of vibration of the object. 
Altogether four experiments are described 
here. In the first experiment, results are re- 
ported from six subjects (Ss) who took part 
in an investigation on the effects of whole-body
vibration on vision. Vertical displacement
of the head was measured during performance
at the visual task. The experimental
design consisted basically of 12 frequency-amplitude
combinations of movement of the
vibration table. In the second experiment, the
same six Ss were used in an investigation into
the effects of vibration of the visual object.
Six frequency conditions were used, with amplitude
levels intermediate to those previously
measured at the head of the Ss in Experiment
I. After analysis of these experimental data,
two subsidiary experiments were carried out.
New and different groups of Ss were used in
both these experiments. The four experiments
may be summarized as follows:

1. The effects of whole-body vibration on
visual performance.

2. The effects of vibration of visual material
upon visual performance, using the
same group of Ss as used in Experiment I.

3. Subsidiary experiment at 6 cycles per
second, comparing effects of S and object
vibration. New group of Ss.

4. Subsidiary experiment at 19 cycles per
second, comparing effects of S and object
vibration. Further new group of Ss.


¹ This paper is based on a thesis submitted to the
University of Bristol in fulfillment of the requirements
of the Master of Science degree.

² J. C. Guignard, 1959, Unpublished Observations.


METHOD 


Visual Task. The visual performance task con- 
sisted of printed numbers, chosen to simplify com- 
parison with Crook’s work. In the main trials 400 
numbers, photographed from a table of random 
numbers, were split into 10 groups. The order of 
presentation of the groups was randomized to minimize
learning. Background reflectance value of the
numbers was 0.1 footlambert. Numbers were seen at
an average distance of approximately 10 feet 6 inches,
giving a visual angle for the numbers of 4.4 minutes
of arc. In the subsidiary trials a total of 300 numbers
were used for the visual task, and to reduce
task difficulty, illumination conditions were improved
to give a background reflectance value of 0.2 footlambert.
Times taken to complete reading the numbers
were recorded with a stopwatch.

Accelerometer System. An accelerometer of a crys- 
tal type working through a simple d.c. amplifier 
and cathode follower was used to measure the verti- 
cal movement of the head. The accelerometer was 
fixed to a metal plate fastened to the top of a 
flying helmet. A correction factor to allow for the 
frequency response of accelerometer and its mount- 
ing was found to be necessary. The correction was 
derived from the response of the accelerometer sys- 
tem to a single tap. The response showed an ex- 
ponential decay when the helmet was worn tightly 
pulled down but with the strap undone. The decay 
was not exponential when the strap was fastened. In 
all experiments on whole-body vibration the chin 
strap was worn unfastened. Correction factors ob- 
tained from eight Ss (including four of the Ss used 
in Experiment I) were located within narrow limits, 
with a standard error of 1% at frequencies up to 
14 cycles per second, 2% at 19 cycles per second, 3% 
at 27 cycles per second, and 4% at 37 cycles per 
second. 

During the actual experimentation, peak to peak 
acceleration at the head was measured on an oscillo- 
scope. Previous experimentation, using photographic 
reproduction of continuous records, found only small 
variations in amplitude of head movement during 
sample periods of several seconds. A wide range of 
frequency-amplitude conditions of table movement 
was covered. The method of direct readings from 
an oscilloscope allowed acceleration readings to be 
made at any time during whole-body vibration. 
Measurements described in the results are based on 
three results per S at each condition of vibration. 
When the visual material itself was vibrated, the ac- 
celerometer was fixed to the base of the stand hold- 
ing the numbers. A smooth sinusoidal movement was 
secured at each required object-vibration condition 
through the use of padding material inserted be- 
tween the stand and the vibration table. 

Seating Conditions. During the whole-body vibration
experiments, Ss were seated on a tank seat, with
padding of honeycomb construction covered with
polyvinyl chloride leather cloth. The seat is shaped
to support the body and to give a measure of lateral
stability. The Ss were instructed to adopt a comfortable
sitting posture and, as far as possible, to
remain in that posture during the experimental test.

The seat was bolted to the vibration table, which
is 2 feet wide and 5 feet long and is hinged at one
end. Vertical displacement of the vibration table is
obtained by adjustment of an eccentric cam (minimum
peak to peak displacement 0.006 inch, maximum
0.400 inch). The centre line of the seat was
placed 4 feet from the hinge of the vibration table.
The frequency limits of the vibration table lay between
5 and 40 cycles per second.

³ D. W. Grieve, 1958, Personal Communication.


EXPERIMENT I 


Experimental Conditions and Statistical De- 
sign. A standard Latin square analysis of vari- 
ance was used in this experiment with six 
amplitudes of table vibration arranged in bal- 
anced order. Vibration conditions during an 
experimental session consisted of one of these 
six amplitudes experienced at two frequency 
levels to give peak to peak acceleration levels 
of approximately 0.5 g and 1.0 g, referred to 
as “light” and “heavy” vibration, respectively
(peak to peak g = 4π²af²/386, where
g = 386 in/sec² (32.2 feet/sec²), a is peak to
peak amplitude of vibration, and f is forcing
frequency; after Schmitz, 1959). In metric
terms g = 980 cm/sec².
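The acceleration formula can be illustrated with a short calculation. The Python sketch below applies the expression for peak to peak g, with a in inches and f in cycles per second, to the amplitude and frequency pairs listed in Table 1; the function and constant names are illustrative and are not taken from the original report.

```python
import math

G_INCHES_PER_SEC2 = 386.0  # value of g used in the text, in/sec^2


def peak_to_peak_g(amplitude_in: float, frequency_cps: float) -> float:
    """Peak to peak acceleration, in g units, of a sinusoidal motion.

    amplitude_in is the peak to peak displacement in inches and
    frequency_cps is the forcing frequency in cycles per second.
    """
    return 4.0 * math.pi ** 2 * amplitude_in * frequency_cps ** 2 / G_INCHES_PER_SEC2


# Peak to peak table amplitude (inches) with the light and heavy
# frequencies (cps) paired with it in Table 1.
CONDITIONS = [
    (0.200, 5, 7),
    (0.100, 7, 10),
    (0.050, 10, 14),
    (0.024, 14, 19),
    (0.012, 19, 27),
    (0.006, 27, 37),
]

for amplitude, light_cps, heavy_cps in CONDITIONS:
    light_g = peak_to_peak_g(amplitude, light_cps)
    heavy_g = peak_to_peak_g(amplitude, heavy_cps)
    print(f"{amplitude:.3f} in: light {light_g:.2f} g, heavy {heavy_g:.2f} g")
```

The computed values fall close to the nominal 0.5 g and 1.0 g levels at the lower frequencies and below 1.0 g at the three highest heavy-vibration frequencies, in line with the tabled accelerations.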

Table 1 gives the frequencies, and peak to 
peak amplitude and acceleration used in Ex- 
periment I. 

It was found that continuous running of the
vibration table could not be maintained at the
peak to peak level of 1 g at the higher
frequencies, so these frequencies were reduced
so that any risk of breakdown was avoided.
In each session, performance at the visual
task was measured five times. Trials 1, 3, and
5 were made in the static (nonvibration) condition.
Visual performance at the light and
heavy levels of vibration was measured during
Trials 2 and 4. The order of light and heavy
vibration within sessions was randomized over
the experiments.


TABLE 1

EXPERIMENTAL VIBRATION CONDITIONS FOR WHOLE-BODY VIBRATION EXPERIMENT

Displacement amplitude         Light vibration                       Heavy vibration
                         Frequency  Peak to  Identification    Frequency  Peak to  Identification
Inches    Millimeters      (cps)    peak g      letters          (cps)    peak g      letters
0.200       (5.1)             5      0.51       A Light             7      1.00       A Heavy
0.100       (2.54)            7      0.50       BL                 10      1.02       BH
0.050       (1.27)           10      0.51       CL                 14      1.00       CH
0.024       (0.61)           14      0.48       DL                 19      0.88       DH
0.012       (0.30)           19      0.44       EL                 27      0.89       EH
0.006       (0.15)           27      0.45       FL                 37      0.83       FH


TABLE 2

PEAK TO PEAK AMPLITUDES OF HEAD MOVEMENT IN THE VERTICAL PLANE: IN INCHES

                          A        B        C        D        E        F
Light vibration
  Table movement
    Frequency (cps)       5        7        10       14       19       27
    Amplitude             0.200    0.100    0.050    0.024    0.012    0.006
  Amplitude at head       0.210    0.065    0.023    0.010    0.004    0.0012
  SE of mean              0.018    0.0038   0.0012   0.0011   0.0003   0.0001
Heavy vibration
  Table movement
    Frequency (cps)       7        10       14       19       27       37
    Amplitude             0.200    0.100    0.050    0.024    0.012    0.006
  Amplitude at head       0.102    0.045    0.021    0.008    0.0024   0.0004
  SE of mean              0.049    0.004    0.0013   0.0007   0.0004   0.00005

Note.—Means of six subjects.


EXPERIMENTAL RESULTS


Amplitude of Head Movement. Peak to
peak amplitudes of head movement in the
vertical plane are given in Table 2, the correction
factor having been applied. The three
readings of peak to peak acceleration were
pooled for each S at each condition. There
was no evidence of any major alteration in
head movement over these three readings, except
at EH and FH where there was a mean
reduction of 10% between the first and second
readings, but this did not approach statistical
significance.

Errors. On inspection the means of the three
nonvibration scores differed very little (52.6,
51.4, 49.3) and these nonvibration results
were pooled for any one session. Analyses of
covariance were carried out on the data.
Analysis of errors made under light vibration
with nonvibration as covariable gave between
treatments as the only significant term
(p < 1%). A similar analysis of errors made
under heavy vibration gave between men
(p < 1%) as the only significant factor. Between
treatments was not significant
(p < 10%). The analysis of covariance on
the data combined over light and heavy found
significant differences between men (p < 1%),
treatments (p < 1%), and between light and
heavy (p < 1%). It is more relevant to a
later comparison between the effects of S and
object vibration to establish significance levels
for the differences between each specific vibration
condition and its related nonvibration
score. Vibration and nonvibration means were
compared, using SE of difference between vibration
and nonvibration means of 4.97 computed
from analysis of covariance on all results.
Table 3 gives observed mean errors for
each condition of vibration.

Vibration means which are larger by 10.4
than their respective nonvibration means are
significantly different from them at the p < 5%
level (a difference of 14.2 is at the p < 1%
level). It can be seen that EL has no significant
effect, that BL and CL are of borderline
significance, but that all other treatments
are highly significant (p < 1%).


TABLE 3

EXPERIMENT I: OBSERVED MEAN NUMBER OF ERRORS

Treatment              A            B            C            D            E          F
Nonvibration      [illegible]  [illegible]  [illegible]  [illegible]  [illegible]   53.1
Light vibration       70.5         61.5         60.1         75.6         48.6      73.3
Heavy vibration       88.0         72.3         76.8         77.1         71.1      70.6

Note.—6 results/vibration mean; 18 results/nonvibration mean.


TABLE 4

PERCENTAGE CHANGE IN ERROR AT EACH VIBRATION CONDITION OF EXPERIMENT I

Treatment            A     B     C     D     E     F
Light vibration      51    19    20    42    -6    38
Heavy vibration      88    40    53    45    37    33

Table 4 gives percentage change in error for 
each vibration condition based on the data in 
Table 3. The results in Table 4 taken in con- 
junction with the results on amplitude of head 
movement given in Table 2 show the sensi- 
tivity of visual performance to high frequency 
whole-body vibration. For instance, similar 
levels of decrement are found at BH (10 
cycles per second) and EH (27 cycles per 
second) although amplitude of head move- 
ment at the latter condition was approximately 
5% of the former. Highly significant changes 
in performance (p < 1%) are found at EH 
(27 cycles per second) and FH (37 cycles 
per second) with angular displacements at 
the head of 0.06 and 0.01 minute of arc, re- 
spectively. These angular movements are less 
than the size of the smallest foveal cone (0.2 
minute of arc). The significance of these 
findings will be discussed in greater detail 
when results from Experiment II on vibration 
of the object are described below. 
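The angular displacements quoted in this report can be approximated from the linear displacement and the viewing distance. The Python sketch below assumes that the angle is taken at the average viewing distance of 10 feet 6 inches (126 inches) given under Method; the report does not state the exact computation, so the helper name and this geometry are assumptions made for illustration.

```python
import math

VIEWING_DISTANCE_IN = 126.0  # 10 feet 6 inches, the average viewing distance


def arc_minutes(displacement_in: float,
                distance_in: float = VIEWING_DISTANCE_IN) -> float:
    """Angle, in minutes of arc, subtended by a peak to peak displacement."""
    return math.degrees(math.atan2(displacement_in, distance_in)) * 60.0


# Peak to peak head amplitudes (inches) at EH and FH, taken from Table 2.
for label, amplitude in (("EH", 0.0024), ("FH", 0.0004)):
    print(f"{label}: {arc_minutes(amplitude):.3f} minute of arc")
```

On this assumption the head amplitudes at EH and FH correspond to roughly 0.065 and 0.011 minute of arc, close to the 0.06 and 0.01 quoted above, and the same helper gives about 2.7 minutes of arc for a displacement of 0.100 inch.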

Response Times. Response times were similarly
pooled for the three nonvibration conditions
(means being respectively 303, 289, 295
seconds). Changes in the time due to vibration
were negligible, averaging -2% under light
vibration and +4% under heavy vibration.


EXPERIMENT II. VIBRATION OF THE
VISUAL OBJECT 


The same six Ss used previously formed the 
experimental group. One vibration condition 
was used at each session, so that a session con- 
sisted of 


No vibration     Vibration     No vibration


It was not possible to vibrate the object with 
a sinusoidal movement at 5 cycles per second, 
and it was necessary to use a slightly higher 
frequency. Table 5 gives the actual displace- 
ment and frequency conditions used. Ampli- 
tude of movement was measured using the 
accelerometer system. Viewing conditions, 
illumination, distance of object, etc., were 
the same as those used in the previous experi- 
ment. A Latin square design was used with 
balanced order of treatments. 
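The report does not say how the balanced order of treatments was constructed. One common construction for an even number of treatments is the Williams design, in which every treatment appears once in each serial position and immediately follows every other treatment exactly once. The Python sketch below is offered only as an illustration of that idea; the function name and the use of the Williams construction are assumptions, not the author's stated procedure.

```python
def balanced_latin_square(n: int) -> list[list[int]]:
    """Williams design for an even number n of treatments.

    Each row is the treatment order for one subject (or one session
    sequence). Rows are cyclic shifts of a first row that interleaves
    low and high treatment indices: 0, 1, n-1, 2, n-2, ...
    """
    if n < 2 or n % 2:
        raise ValueError("this construction needs an even number of treatments")
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    return [[(t + shift) % n for t in first] for shift in range(n)]


# Treatment labels of Experiment II (Table 5).
LABELS = "GJKMRV"
square = balanced_latin_square(len(LABELS))
for subject, row in enumerate(square, start=1):
    print(f"S{subject}: " + " ".join(LABELS[t] for t in row))
```

With six treatments and six Ss, each S receives one row, so order effects of position and of immediate carry-over are spread evenly over treatments.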


RESULTS OF EXPERIMENT II 


Errors. Mean errors before vibration (69.5)
and after vibration (67.4) were pooled for any
one session. Analysis of covariance found the
between treatments term highly statistically
significant (p < 0.1%); no other terms were
significant. Table 6 gives observed mean number
of errors for Experiment II. The standard
error of the difference between vibration and
nonvibration means is 7.99. For any one condition
differences between vibration and nonvibration
means larger than 17.01 are significant
at p < 5% (a difference of 23.57 is
significant at the p < 1% level). Significant
differences in performance due to vibration
are found at G and J (p < 0.1%) and at K
(p ≈ 1%).


TABLE 5

VIBRATION CONDITIONS USED FOR VIBRATION OF VISUAL OBJECT EXPERIMENT

Treatment                            G       J       K       M       R       V
Cycles per second                   5.6      7       10      14      19      27
Peak to peak amplitude (inches)    0.116   0.100   0.036   0.015   0.006   0.003


TABLE 6

EXPERIMENT II: OBSERVED MEAN NUMBER OF ERRORS (VIBRATION OF OBJECT)

Treatment                           G           J       K       M         R          V
Nonvibration mean              [illegible]     68.0    69.0    67.4      67.6       65.4
Vibration mean                    198.5       159.2    92.5    70.7   [illegible]   71.3
Percentage increase in errors      171         134      34       5        12          9

Note.—Means of six subjects.
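The criterion described in this paragraph amounts to comparing each difference between a vibration mean and its nonvibration mean with two fixed thresholds. The Python sketch below applies the thresholds given in the text to the clearly legible treatment means of Table 6 (J, K, M, and V); the dictionary and function names are illustrative only.

```python
THRESHOLD_5_PERCENT = 17.01  # smallest difference significant at p < 5%
THRESHOLD_1_PERCENT = 23.57  # smallest difference significant at p < 1%

# Treatment: (nonvibration mean, vibration mean), from Table 6.
TABLE_6 = {
    "J": (68.0, 159.2),
    "K": (69.0, 92.5),
    "M": (67.4, 70.7),
    "V": (65.4, 71.3),
}


def significance(nonvibration: float, vibration: float) -> str:
    """Classify the increase in errors against the two thresholds."""
    difference = vibration - nonvibration
    if difference > THRESHOLD_1_PERCENT:
        return "p < 1%"
    if difference > THRESHOLD_5_PERCENT:
        return "p < 5%"
    return "not significant"


for treatment, (nonvib, vib) in TABLE_6.items():
    print(f"{treatment}: difference {vib - nonvib:5.1f}  {significance(nonvib, vib)}")
```

Read this way, J is far beyond the 1% criterion, K (a difference of 23.5) falls just short of it, and M and V show no significant increase.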

Response Times. Mean times for completion 
of visual task were 366 seconds before vibra- 
tion and 352 seconds after vibration. These 
times were pooled and an analysis of covari- 
ance was carried out on all the data. Both the 
between-men, and between-treatments terms 
were significant at the p < 5% level. Table 7 
gives mean response times for Experiment II.
The SE of the difference between vibration 
and nonvibration means is 15.3. Response 
times are significantly higher under vibration 
compared with nonvibration at G (p < 0.1%), 
J (p< 1%), and K (p < 5%). 


COMPARISON BETWEEN RESULTS OF WHOLE-
BODY VIBRATION (EXPERIMENT I) AND
OF VIBRATION OF THE OBJECT
(EXPERIMENT II)


The error rate under nonvibration is higher 
for Experiment II than for Experiment I 
(17% compared with 14%). This apparently 
paradoxical result is due to the improvement 
in error scores which, although not statisti- 
cally significant by the analysis of covariance, 
occurred steadily throughout Experiment I. In 
fact, nonvibration errors in the first session of 
the object vibration experiment were at a simi- 
lar level to those in the first session of the 
whole-body vibration experiment. The lack of 
a similar learning trend in the later experiment
is probably due to the reduced number
of occasions on which the visual task was
administered (18 times per S in Experiment II
compared with the 30 times per S in Experiment
I). Due to the change in base line (nonvibration)
error, comparisons between results of the
two experiments based on percentage increase
over nonvibration scores are avoided as far as
possible. Some direct comparisons of percentage
change in error are made, but are generally
used to highlight areas for subsequent
research. Otherwise, comparisons between
these two experiments are made in terms of
the significance of differences found between
particular vibration and nonvibration means.


TABLE 7

EXPERIMENT II: MEAN RESPONSE TIMES IN SECONDS (VIBRATION OF OBJECT)

Treatment             G      J      K      M      R      V
Mean nonvibration    352    342    364    362    378    356
Mean vibration       413    388    391    357    365    346


In comparing the results of the two experi- 
ments, certain trends emerged in the data. 

1. At 7 cycles per second where peak to 
peak amplitudes of the object and at the S’s 
head were approximately equal at 0.100 inch 
(Experimental Conditions J and AH) increase 
in error was higher with vibration of the ob- 
ject (134%) than under vibration of the Ss 
(88%). It was not possible to measure the 
statistical significance of this difference, and 
it was decided to investigate more fully the 
situation in this frequency zone, using three 
amplitudes of vibration of both object and 
subject at a frequency of 6 cycles per second. 

2. At 10 cycles per second the increase in 
error was approximately equivalent for S vi- 
bration (BH, peak to peak amplitude of head 
movement 0.045 inch and error increase of 
40%) and object vibration (K, peak to peak 
amplitude of 0.036 inch and error increase 
of 34%). 

3. At 14 and 27 cycles per second (M and 
V) no significant increase in error was found 
with vibration of the object. In contrast, 
smaller amplitudes of head movement at these 
frequencies (DL and FL) resulted in sta- 
tistically highly significant increases in error. 
At 19 cycles per second results were ambigu- 
ous with a 12% increase in error at a peak to 
peak amplitude of 0.006 inch for vibration 
of object (R), and a negative effect (EL) 
and a 45% increase (DH) at peak to peak 
amplitudes of head movement of 0.004 inch 
and 0.008 inch, respectively. Accordingly a 
subsidiary experiment was carried out at this 
frequency.


EXPERIMENT III. COMPARISON OF SUBJECT
VIBRATION AND OBJECT VIBRATION
AT 6 CPS


Experimental Conditions. Three peak to
peak amplitudes of table vibration were used,
0.07 inch (X), 0.13 inch (Y), and 0.19
inch (Z). After appropriate experimentation,
three peak to peak amplitudes of object vibration
were selected which would be similar to
the peak to peak amplitude of head movement
found under the conditions of whole-body
vibration. Six Ss were used in a Latin
square design with balanced orders for S and
object vibration and for treatments. Two sessions
were held for each S. A session consisted of


Nonvibration     Vibration of object or S at the three levels in random order     Nonvibration


The visual task consisted of reading 300 numbers.
Because of the high error rate experienced
in Experiment II at high amplitude
low-frequency conditions of object vibration,
the visual task was made easier by doubling
illumination conditions so that background
reflectance of the numbers equalled 0.2 footlamberts.


TABLE 8

ERRORS AND TASK TIMES UNDER SUBJECT AND OBJECT VIBRATION CONDITIONS AT 6 CPS

                            Subject vibration                               Object vibration
                    Peak to peak
                    amplitude of                          Time      Peak to peak                Time
Treatment           head movement    SE       Errors    (seconds)    amplitude     Errors    (seconds)
Before vibration         —           —     [illegible]     193           —          28.7        217
X                      0.047       0.003      33.0         199         0.053        45.3    [illegible]
Y                      0.070       0.005      35.8         213         0.070        56.8        235
Z                      0.085       0.006      41.8         220         0.091        76.0        246
After vibration          —           —        26.2         197           —          30.5        188

Note.—Means of six subjects.

Results of Experiment III. Mean errors and 
task times found in Experiment III are given 
in Table 8. Again, before and after vibration 
error scores were pooled. Since analysis of 
variance performed on the total error data 
found statistical significance in the interactions
Ss × S and Object vibration (p < 5%)
and Amplitudes × Object and S vibration
(p < 0.1%), it was necessary to analyze each
treatment separately against nonvibration.
Significance values for differences between
means of S vibration and nonvibration were
found to be X and Y, not significant; Z,
p < 0.1%; and between means of object
vibration and nonvibration X, p < 5%; Y,
p < 1%; Z, p < 0.1%. In view of the difference
found between response times for before
and after vibration for object vibration,
these data were not analyzed. Much higher 
errors were found under conditions of vibra- 
tion of the object than under S vibration. 
Approximately equivalent percentage increases 
in error of 56% and 53% were produced by 
a head peak to peak amplitude of 0.085 inch 
and an object peak to peak amplitude of 
0.053 inch, respectively. 
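The percentage increases quoted here can be retraced from Table 8. The Python sketch below assumes that the pooled nonvibration score is the simple mean of the before and after scores, which the text implies but does not spell out, and uses only the object-vibration columns, whose entries are fully legible; all names are illustrative.

```python
def percent_increase(vibration_mean: float, nonvibration_means) -> float:
    """Percentage increase of a vibration mean over pooled nonvibration means."""
    baseline = sum(nonvibration_means) / len(nonvibration_means)
    return 100.0 * (vibration_mean - baseline) / baseline


# Object vibration at 6 cps (Table 8): mean errors before and after vibration,
# and under the three amplitudes X, Y, and Z.
OBJECT_NONVIBRATION = (28.7, 30.5)
OBJECT_VIBRATION = {"X": 45.3, "Y": 56.8, "Z": 76.0}

for treatment, errors in OBJECT_VIBRATION.items():
    increase = percent_increase(errors, OBJECT_NONVIBRATION)
    print(f"{treatment}: {increase:.0f}% increase in errors")
```

For treatment X (object amplitude of 0.053 inch) this gives an increase of about 53%, the figure cited in the text.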


EXPERIMENT IV. COMPARISON OF SUBJECT 
AND OBJECT VIBRATION AT 19 CPS 


Two experimental conditions were used— 
one of vibration of the S and one of vibration 
of the object. For S vibration a vibration ta- 
ble peak to peak amplitude of 0.024 inch 
was used (similar to main experiment on S 
vibration). This was expected to result in a 
mean peak to peak amplitude of vertical head 
movement of 0.008 inch. In order to sub- 
stantiate the difference between the effects of 
the two modes of vibration felt to exist in 
this frequency zone, vibration of the object 
was set at 0.012 inch. Extrapolation of 
Crook’s data suggested that legibility would 
be affected only marginally at that amplitude 
at 19 cycles per second. Other experimental 
conditions were similar to the previous experiment
(Experiment III). Six Ss were used with
Latin square design with balanced order for
the two experimental conditions of S and object
vibration. Two sessions were held for
each experimental S, consisting of:


No vibration     Vibration (whole body or object)     No vibration


TABLE 9

ERRORS AND TIMES FOR SUBJECT AND OBJECT VIBRATION AT 19 CPS

                            Subject vibration                               Object vibration
                    Peak to peak
                    amplitude of                 M       M times     Peak to peak      M       M times
                    head movement    SE        errors   (seconds)     amplitude      errors   (seconds)
Before vibration         —           —      [illegible]    236            —           37.8       219
Vibration              0.007       0.0007      63.3        261          0.012         38.2   [illegible]
After vibration          —           —         34.2        237            —           35.0       229

Note.—Means of six subjects.


Results of Experiment IV. Mean errors and 
task times found in Experiment IV are given 
in Table 9. A statistically highly significant 
increase in errors was found under whole- 
body vibration between the means of vibra- 
tion and nonvibration (p < 0.1%). Response 
times also showed a significant increase under 
subject vibration (p < 5%). Vibration of the 
object had no significant effects on errors or 
response times. 


THE MOVEMENT OF THE HEAD IN THE 
LATERAL AND FRONTAL PLANES 
DURING VIBRATION 


Before results on S and object vibration are 
summarized and compared, some account of 
the lateral and frontal movements of the head 
during whole-body vibration is desirable. Al- 
though Crook (1947) in his experiments on 
vibration of the visual field found that the
pattern of motion had no statistically signifi- 
cant effect, he suggested that if secondary fea- 
tures (amplitude, illumination, type size) had 
been more severe, circular motion would have 
had more unfavourable effects than linear mo- 
tion. This finding has no bearing on the ef- 
fects of circularity of motion where a frontal 
component is concerned, but it does suggest 
that if the lateral component in head move- 
ment is large enough it may be a factor ad- 
versely affecting visual performance. Measure- 
ments have been made on four Ss of head 
movement in all three planes. Over a range of 
vibration conditions similar to those used in 
Experiments I, III, IV, reported here, lateral 
movement of the head was of small and usu- 
ally negligible extent. In each record it was 
less than one third of the vertical movement 
of the head, except at 28 cycles per second 
with table amplitude of 0.009 inch peak to 


peak, when it was on average one half of the 
vertical displacement. Frontal movement under
these vibrations showed a more consistent
relationship with the vertical displacement, 
generally being about 60% of the vertical 
amplitude, except in some half of the records 
at 5 and 7 cycles per second when the frontal 
component was not sinusoidal. 


DISCUSSION 


Comparison of the Effects of Whole-Body 
Vibration and Vibration of the Visual Object 
at Frequencies Below 10 cps. In Experiment 
I an increase of 88% is found at 7 cycles per 
second (AH) with mean vertical movement at 
the head of 0.102 inch (2.8 minutes of arc 
peak to peak angular displacement). In Ex- 
periment II with object movement (J) of 
0.100 inch (2.7 minutes of arc peak to peak) 
the increase in error of 134% was much 
higher. The subsequent experiment at 6 cycles 
per second confirmed the difference between 
the effects of S vibration and object vibration 
in this frequency range. With mean movement 
of the head of 0.047 inch (1.28 minutes of 
arc peak to peak) increase in error was con- 
siderably lower than at a movement of the 
object of 0.053 inch (1.45 minutes of arc 
peak to peak). The increases in error were 
23% (not significant) and 53% (p< 5%), 
respectively. Peak to peak amplitude of head 
movement had to reach 0.085 inch (2.32 
minutes of arc) to show an increase in error 
of 57% similar to that found at 1.45 minutes 
of arc displacement of the object. This con- 
trast between the two sets of data is height- 
ened by the consideration that under whole- 
body vibration lateral and frontal components 
would also have been present in the head 
movements of the Ss.

These results confirm Guignard and Irving’s
(1961) finding that compensatory eye
movements elicited by labyrinthine stimula- 
tion during whole-body vibration are more 
effective than pursuit movements of the oscil- 
lating target. However, their results show that 
at their highest frequency condition (3.4 
cycles per second) these compensatory eye 
movements are less than half the relative 
movement between man and object. Close 
comparisons between these data and the pres- 
ent results may mislead, for amplitude condi- 
tions differed greatly. Angular displacement 
of the S’s head was of the order of 1 degree 
of arc in Guignard and Irving’s results. In 
our experiments at 6 and 7 cycles per second 
the angular displacement of the head did not 
exceed 2.8 minutes of arc, but since decre- 
ments in visual performance were found, this 
is confirmation that although compensatory 
eye movements are more efficient than pursuit 
movements they are limited in extent and ef- 
fectiveness at these frequencies. 

The effect of whole-body vibration and of 
vibration of the visual object above 10 cycles 
per second. Above 10 cycles per second the 
effects of S vibration are more pronounced 
than those of vibration of the object. Vibra- 
tion of the object had no statistically signifi- 
cant effect at peak to peak amplitudes of 0.015 
inch (0.41 minute of arc) at 14 cycles per 
second, and 0.003 inch (0.08 minute of arc) 
at 27 cycles per second (conditions M and V 
of Experiment II), and, at the higher illumi- 
nation, 0.012 inch (0.33 minute of arc) at 19 
cycles per second (Experiment IV). In contrast,
whole-body vibration produced statistically
highly significant effects (p < 0.1%) at
peak to peak amplitudes at the head of 0.010 
inch (0.27 minute of arc) at 14 cycles per 
second, and 0.0012 inch (0.033 minute of 
arc) at 27 cycles per second (DL and FL of 
Experiment I) and, at the higher illumination, 
0.007 inch (0.20 minute of arc) at 19 cy- 
cles per second (Experiment IV). The role of 
lateral and frontal components of head move- 
ments in affecting acuity cannot be determined 
from these latter experiments. The movement 
of the head resulting from whole-body vibra- 
tion is evidently more complex than the verti- 
cal motion of the vibrated object, and this 
prevents any direct comparison between the 
two modes of vibration in this frequency 
range. However, certain considerations point
to the conclusion that displacement of the S’s
head relative to the object, however complex 
the movement may be, is not in itself suffi- 
cient to account for the considerable losses in 
acuity found under whole-body vibration. 

On four Ss the mean vertical component of 
head movement at 14, 19, 27 cycles per second 
was found to be considerably larger than 
either the frontal or lateral components. 

Vertical amplitudes of head movement 
causing a statistically highly significant de- 
terioration in visual performance at these fre- 
quencies were considerably smaller than the 
corresponding displacements of the object 
which had no significant effect. (Actual head 
amplitude/object displacement ratios were ap- 
proximately 2:3 at 14 cycles per second, 3:5 
at 19 cycles per second, and 2:5 at 27 cycles 
per second.) 
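These ratios follow from the amplitudes reported earlier. The Python sketch below pairs each head amplitude that produced a significant decrement with the object amplitude that produced none at the same frequency (values from Tables 2, 5, and 9) and compares the quotient with the rounded ratio quoted in the parenthesis; the data layout and names are illustrative.

```python
# (frequency in cps, head amplitude in inches, object amplitude in inches,
#  ratio quoted in the text as a fraction)
PAIRS = [
    (14, 0.010, 0.015, 2 / 3),
    (19, 0.007, 0.012, 3 / 5),
    (27, 0.0012, 0.003, 2 / 5),
]


def amplitude_ratio(head_in: float, object_in: float) -> float:
    """Ratio of head amplitude to object amplitude (both peak to peak)."""
    return head_in / object_in


for cps, head, obj, quoted in PAIRS:
    ratio = amplitude_ratio(head, obj)
    print(f"{cps} cps: head/object = {ratio:.2f} (quoted as about {quoted:.2f})")
```

In each case the head moved through a smaller excursion than the object, yet only the head movement was accompanied by a significant loss in performance.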

These results lend credence to the reso- 
nance theory of the eyeball which Coermann 
(1939) and Crook (1950) thought necessary 
to explain the sensitivity of visual perform- 
ance to the effects of whole-body vibration. It 
is probable that the stability of the eye may 
also be affected by resonance of facial tissue. 
During the present experiments, vibration of 
the facial tissue was noticeable at frequencies 
of 14 cycles per second and above and was 
found to be most uncomfortable at large am- 
plitude levels. Obviously the possibility of res- 
onance factors affecting some special part of 
the eye, such as the lens, still remains.


FOLLOW-UP TECHNIQUES IN A LARGE-SCALE
TEST VALIDATION STUDY


ROBERT C. DROEGE AND ALBERT C. CRAMBERT
United States Employment Service 


Records were kept on success of techniques used to obtain follow-up informa- 
tion on occupational and educational status of 12,615 individuals 2 yrs. after 
being tested in Grade 12. Various techniques were used, with varying degrees 
of success. The information was obtained for most individuals through use of 
letters, an effective and low-cost technique for obtaining factual information. 
It was found to be worthwhile to send as many as 3 letters, if necessary, before 
trying another technique. Among the most successful techniques, in terms of 
percentage of attempts that were successful, were telephone calls, personal visits, 
employment service records, and knowledge of individuals in the community. 


The feasibility of conducting large-scale
longitudinal test research often depends more
on capability of obtaining than of processing
the data. Recent studies involving follow-up
of a large number of individuals (Dailey,
1963; Thorndike & Hagen, 1959) point out
the many problems associated with obtaining
adequate data. One of these problems is
obtaining necessary factual information about
the individual at the time of follow-up.

Nineteen state employment services have
completed collection of data for a study to
determine the validity of aptitude scores of
high school students for predicting occupational
and college success 2 years after high
school graduation (Droege, 1960). In connection
with obtaining criterion data for this
study, detailed records were kept on the success
of various techniques used in following
up individuals in the sample.


METHOD 


The first group to be followed up consisted of
1958-1959 seniors (N = 6,800) in 168 high schools in
19 states. The second group consisted of 1959-1960
seniors (N = 7,140) in the same schools. In the spring
of their senior year, these students completed an
“address sheet” for use in locating the individuals 2
years later. This sheet provided for such information
as present address and phone number, social security
number (arrangements were made for each student
to be assigned one), and name, address and phone
number of parent, next closest relative, and another
person who would be likely to know the location of
the individual 2 years later.

In 1961, and again in 1962, state employment service
personnel of the 19 states applied a number of
techniques in attempting to locate the individuals 2
years after they had completed the address sheet. The
objective of this phase of the follow-up was to locate 
the individual and identify his employer, college at- 
tended or other status 2 years after high school 
graduation. The United States Employment Service 
national office provided guidelines to the state em- 
ployment services with regard to follow-up proce- 
dure. These guidelines were general, and state em- 
ployment service personnel were encouraged to de- 
velop the specifics of the approach to be used and 
techniques to be applied in their own states. After the 
results of the 1961 follow-up had been analyzed, a 
summary was prepared by the national office and sent 
to personnel participating in the study for their 
guidance in conducting the 1962 follow-up. 


RESULTS 
1958-1959 Seniors 


The potential sample consisted of 6,800 
seniors who completed the address sheet in the 
spring of 1959. Of these, 5,904 (87%) were 
located 2 years later and information on em- 
ployer, college attended, or other status was 
obtained. The 19 states made a total of 13,127 
separate attempts to obtain the information. 
A variety of techniques was used. In almost 
all cases, the first attempt was a letter to the 
individual, using his address as shown on the 
address sheet, requesting him to complete an 
“information sheet” and return it in an ad- 
dressed and stamped envelope which was pro- 
vided. Of the 6,702 individuals to whom such 
a letter was written, 3,629 (54%) responded. 
Second and third letters to nonrespondents
resulted in returns of 44% and 24%,
respectively. Respondents to the first, second or
third letter totaled 5,149 of the 5,904
individuals for whom the information was
eventually successfully obtained. The information
was obtained for the remaining 755 individuals
through use of other, generally more
expensive, techniques. Among the most successful of
these, in terms of the percentage of attempts
that were successful, were personal visits
(81% successful), State agency “wage
record” files¹ (60% successful), employment
service records and employee knowledge (95%
successful), and knowledge of individuals in
the community (94% successful).


TABLE 1

DETAILED ANALYSIS OF EFFECTIVENESS OF TECHNIQUES

                                                        Number of     Attempts to obtain the information
Techniques used for individuals                         states using  Total     Number      Percentage
in potential sample                                     technique     number    successful  successful

A. Letters using address shown on “address sheet”
   1. First letter                                      19            7,125     3,863       54.2
   2. Second letter                                     16            2,993     1,423       47.5
   3. Third letter                                      12            981       369         37.6
   4. Fourth letter                                     1             103       32          31.7
B. Letters using address obtained from others on
   address sheet
   1. First letter                                      ...           498       213         42.8
   2. Second letter                                     ...           200       45          22.5
   3. Third letter                                      ...           42        7           16.7
C. Letters using address obtained from city or
   telephone directory
   1. First letter                                      ...           86        34          39.5
   2. Second letter                                     ...           13        0           0.0
D. Letters using address obtained from department of
   motor vehicles
   1. First letter                                      ...           ...       ...         66.7
   2. Second letter                                     ...           ...       ...         50.0
E. Letters using address obtained from high school
   attended
   1. First letter                                      ...           35        16          45.7
   2. Second letter                                     ...           14        4           28.6
   3. Third letter                                      ...           10        2           20.0
   4. Fourth letter                                     ...           ...       0           0.0
F. Letters to individuals in sample using address
   obtained from other sources                          ...           ...       2           ...
G. Personal visits                                      10            151       125         82.8
H. State agency “wage record” file                      ...           79        33          41.8
I. Telephone calls                                      ...           476       373         78.4
J. Local office records and employee knowledge          5             110       60          54.5
K. Information obtained from individuals in community   ...           39        15          38.5
L. Other techniques
   High school counselor                                ...           23        18          78.3
   Various                                              ...           2         2           100.0
   Letter to parent                                     ...           134       66          49.3

Totals                                                                13,140    6,711       51.1


1959-1960 Seniors 


The potential sample consisted of 7,140 
seniors who completed the address sheet in 
the spring of 1960. Of these, 6,711 (94%) 
were located 2 years later and information on 
employer, college attended, or other status was 
obtained. Of the 19 states participating in the 
study, 16 obtained the information for a 
larger proportion of their second sample than 
of their first sample. Nine states obtained the 
necessary information for over 99% of their 
potential sample of 1959-1960 seniors. 
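
The location rates quoted for the two samples follow directly from the counts given above. A minimal Python sketch of that arithmetic is shown here; the function name `located_rate` is illustrative and does not come from the study.

```python
def located_rate(located: int, potential: int) -> float:
    """Return the percentage of the potential sample that was located."""
    return 100.0 * located / potential


# Counts as reported in the text for each senior class.
rate_1959 = located_rate(5_904, 6_800)  # 1958-1959 seniors
rate_1960 = located_rate(6_711, 7_140)  # 1959-1960 seniors

print(f"1958-1959 seniors located: {rate_1959:.0f}%")
print(f"1959-1960 seniors located: {rate_1960:.0f}%")
```

Rounded to whole percentages, the two rates agree with the 87% and 94% stated in the text.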

The improved results for the 1959-1960 
seniors may be attributed to an increase in the 
average number of techniques used per state, 
from about 6 to about 7, and a shift in em- 
phasis toward the use of more effective tech- 
niques. Only 45% of the attempts made to 
obtain the information for the 1958-1959 
seniors were successful, but this percentage 
increased to 51% for the 1959-1960 seniors. 
There was a strong relationship between the 
number of techniques used by an individual 
state to obtain the information for 1959-1960 
seniors and the state’s overall success. For 
example, the 9 states that obtained informa- 
tion for over 99% of their potential samples 
used an average of almost 9 techniques, com- 
pared to an average of only about 5 tech- 
niques used by the 10 states that obtained in- 
formation for less than 99% of their potential 
samples. Table 1 shows a detailed analysis of 
the effectiveness of the various techniques 
used for the 1959-1960 seniors. Table 2 
shows an overall analysis of the successful 
attempts. 
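
The per-attempt success rates and the shares reported in Table 2 can be rechecked from the published counts. The sketch below is only a check of the arithmetic; it assumes that techniques G and H contributed 125 and 33 successful attempts, the values consistent with the printed percentages and with the printed total of 6,711. Variable names are illustrative.

```python
# Successful attempts by technique (A-L) for the 1959-1960 seniors.
# Assumption: G = 125 and H = 33, the values consistent with the
# printed percentages and with the printed total of 6,711.
successes = {
    "A": 5687, "B": 265, "C": 34, "D": 9, "E": 22, "F": 2,
    "G": 125, "H": 33, "I": 373, "J": 60, "K": 15, "L": 86,
}
total = sum(successes.values())

# Share of all successful attempts contributed by each technique.
shares = {k: round(100.0 * v / total, 1) for k, v in successes.items()}

# Overall success per attempt for the two follow-ups, from the text.
rate_1961 = 100.0 * 5_904 / 13_127  # 1958-1959 seniors
rate_1962 = 100.0 * 6_711 / 13_140  # 1959-1960 seniors

print(total, shares["A"])
print(round(rate_1961), round(rate_1962))
```

Under these assumptions the counts sum to 6,711, letters to the address-sheet address account for 84.7% of all successes, and the per-attempt rates round to the 45% and 51% given in the text.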


¹ The “wage record” file, maintained in most
states, contains the names of individuals in employment
covered by the state unemployment insurance
law for whom unemployment insurance taxes were
paid by their employers.


TABLE 2

OVERALL ANALYSIS OF SUCCESSFUL ATTEMPTS

                    Information successfully obtained
Technique used      Number      Percentage of total
A                   5,687       84.7
B                   265         3.9
C                   34          0.5
D                   9           0.1
E                   22          0.3
F                   2           0.0
G                   125         1.9
H                   33          0.5
I                   373         5.6
J                   60          0.9
K                   15          0.2
L                   86          1.3
Totals              6,711       99.9


Employers, colleges attended, or other
status were identified for most of the poten- 
tial sample of 1959-1960 seniors through let- 
ters to the individuals using the address shown 
on the address sheet. Most states used three 
letters, and obtained a good response from 
each of them. One state sent a fourth letter to 
the individuals who had not responded to the 
first three letters and was successful in more 
than 30% of its attempts with the fourth
letters.

Three states obtained good results using the 
addresses of parents of individuals in the sam- 
ple who had not responded to letters sent to 
them at the address shown on the address 
sheet. One of these states sent letters to the 
individuals in care of their parents. A second 
state sent letters to parents requesting their 
assistance in having the individuals in the 
sample complete the information sheet. The 
third state’s letter asked the parents for the 
information directly. Each of these approaches 
yielded the information for a good proportion 
of the individuals for whom it was used. 

First letters to the individuals using ad- 
dresses obtained from other sources—from 
others on the address sheet, from city or tele- 
phone directories, from departments of motor 
vehicles, from high schools attended, or from 
other sources—also were effective for the 
states that used them. Additional letters to 
individuals using addresses obtained from
these sources, however, were relatively
ineffective when they were used.

Personal visits were used much more ex- 
tensively for the 1959-1960 seniors than for 
the previous group. Visits were made by 10 
state agencies, and were successful in obtain- 
ing the information for 83% of the individuals 
for whom attempts were made, making per- 
sonal visits the most successful of all the major 
techniques used. Of these 10 states, 5 obtained 
information for all of the individuals visited. 
Telephone calls to individuals in the sample, 
to persons listed on the address sheet, or to 
other individuals in the community, were also 
used very effectively by more states for the 
1959-1960 seniors than for the preceding 
sample. 

A number of states had good results with 
techniques for obtaining the information from 
various individuals in local communities. Five 
states utilized local office records and employee 
knowledge with considerable success. Two 
states were successful in obtaining the infor- 
mation for 100% of the individuals for whom 
it was sought from individuals in the com- 
munity. One state suggested the post office 
and neighbors as possible sources of the in- 
formation; another had good results in ob- 
taining the information for some individuals 
in its sample by contacting counselors in the
high schools which the individuals had
attended.


CONCLUSIONS 


In this follow-up study, an intermediate ob- 
jective was to locate individuals 2 years out 
of high school and obtain certain factual in- 
formation about their educational or occupa- 
tional status. Various techniques were used, 
with varying degrees of success, to obtain the 
information. The following generalizations 
would appear to be applicable to similar situ- 
ations: (a) Primary reliance should be placed 
on obtaining the information by means of let- 
ters, because of their general effectiveness and 
low cost. (b) In most cases, it will be worth- 
while to send as many as three letters, if 
needed, before trying another technique. (c) 
A variety of techniques should be used for 
maximum effectiveness. 
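
As an illustration of generalization (b), the cumulative yield of successive letters can be worked out from the first three rows of Table 1 (1959-1960 seniors). This is a sketch for illustration only. The second figure rests on a hypothetical assumption, namely that every nonrespondent was sent the next letter, which was not the case in the study.

```python
# (attempts, successes) for the first, second, and third letters sent
# to the address on the address sheet (Table 1, technique A).
letters = [(7125, 3863), (2993, 1423), (981, 369)]

# Observed cumulative yield relative to everyone who got a first letter.
first_mailing = letters[0][0]
observed = sum(s for _, s in letters) / first_mailing

# Hypothetical yield if each letter's success rate applied to all
# remaining nonrespondents (an assumption, not the study's procedure).
remaining = 1.0
for attempts, successes in letters:
    remaining *= 1.0 - successes / attempts
hypothetical = 1.0 - remaining

print(f"observed:     {observed:.1%}")
print(f"hypothetical: {hypothetical:.1%}")
```

On these figures the three letters together reached roughly four of every five individuals first written to, which is the arithmetic behind the recommendation to persist with letters before turning to costlier techniques.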


A SET OF CONDITIONS FOR A CONSISTENT
RECOVERY OF A SUBLIMINAL STIMULUS 


GERALD M. MURCH 


Universitaet Goettingen, Goettingen, West Germany 


3 experimental groups of 10 Ss, each with their corresponding controls, were 
given mathematical problems in a tachistoscope as a supraliminal stimulus. At 
a level established by a pretest group, the experimental groups received sub- 
liminal answers to the problems. Group 1 attempted to solve the problems, 
Group 2 to guess at the answers, and Group 3 to select their answers from dual 
possibilities on a given list. A significant tendency was found in Groups 1 and 
2 to repeat various subliminally projected digits in their answers, without the 
answers directly affecting their computational processes. Group 3 selected the 
projected answers significantly over the correct answers. The need for a positive 
relationship between supra- and subliminal stimuli as well as the relevancy of 
the task to the Ss’ present activity was also observed. 


The question as to the existence of sub- 
liminal perception has held the attention of 
psychology and other related fields for many 
years; however, successful demonstrations
of a positive effect of a subliminal stimulus
on the behavior of subjects (Ss) are
relatively few.¹ It is the purpose of this
experiment to test the possibilities for the
recovery of a subliminally presented
stimulus, as well as to aid in the
understanding of conditions under which ef- 
fects of subliminal stimulation can be found 
with consistency. 

Increased sales in popcorn and Coca-Cola
have been reported, following presentation of
subliminal stimuli in the form of slides carrying
the messages, “Buy Popcorn” or “Buy
Coca-Cola” which were projected below the
normal perceptual threshold during the showing
of a movie in a public theater. Whether
or not this experiment was actually conducted
or was properly controlled, it is clear that the
influence of these claims upon later experimental
designs has not been the furthering
of an understanding of the conditions for
producing effects with a subliminal stimulus.²

¹ A general review of previous experiments may
be found under the references: Klein and Holt, 1961;
Naylor and Lawshe, 1958.

² One must differentiate between experiments using
a supraliminal (masking) stimulus (subliminal
perception experiments), and those experiments using
only a subliminal stimulus (subception or
discrimination without awareness).


If subliminal perception exists, it can influence
the S in one of two possible ways. The S
may be motivated to react as demanded by 
the stimulus, as in the alleged Coca-Cola 
experiment, or he may show capability of 
repeating the stimulus in its given form, with 
no other effect upon his behavior or thinking 
process. Although the first possibility has 
been repeatedly tested, the second, and more 
obvious, has received little attention. Another 
neglected point is the relationship of the 
subliminal stimulus to the supraliminal stimu- 
lus. To be effective the subliminal stimulus 
ought to be positively related to the supra- 
liminal stimulus. 


METHOD


The Ss were 70 first and second semester psychol- 
ogy and sociology students at the University of 
Goettingen. Ten of the Ss were used in a pretest 
to establish the perceptual threshold for Arabic
numerals presented from above threshold to below 
threshold, and from below threshold to above thresh- 
old in a tachistoscope (Bettendorf two field mirror). 
Each S received the numerals beginning with an 
exposure speed of 1/1000 of a second, increasing 
in intervals of 1/1000 of a second, until the number 
was perceived. The exposure speed of 2.5/1000 of 
a second was selected as below the perceptual thresh- 
old, with a relatively large individual variation. 
An exposure speed of 5/1000 of a second was selected 
as below the perceptual limen for the subliminal 
stimulus, which with the presence of a masking 
(supraliminal) stimulus was below verbal recognition 
levels. 

The same pretest Ss were also shown tachistoscopically
a series of simple mathematical problems,
which were composed of 2, 3, and 4 digits; for
example: (23 + 18 - 14 = ), (456 - 317 + 129
= ), (6407 + 5989 - 7342 = ). The Ss were
instructed to solve the problems as rapidly as
possible. All Ss used more than 5, 10, and 20 seconds,
respectively, for the problems in order of their digital
components.


TABLE 1

t VALUES FOR THE MEAN ARITHMETIC DIFFERENCES
OF THE SUBJECTS’ GIVEN RESPONSES SUBTRACTED
FROM THE SUBLIMINALLY PROJECTED ANSWERS

         Absolute                         Level of
Group    mean       t value               confidence
1        144.35
1a       267.02     ...                   --
2        182.20
2a       159.10     negative reaction     --
1        144.35
2        182.20     negative reaction     --

The remaining 60 Ss were matched as to age and 
sex in three major groups, which were further sub- 
divided into corresponding experimental and control 
groups. The Ss of each group were informed that 
they were taking part in an experiment to determine 
the computational difficulties presented by particular 
digits in the solution of mathematical problems. 
Group 1 Ss were further told that they would be 
given 15 problems to solve, with varying time limits 
for solution, depending upon the digital difficulty of 
the problem. The Ss were also told not to answer in 
“round numbers,” and that if they did not have 
time to complete the problem, which was purposely 
the case, they should guess at the answer on the 
basis of their completed work. 

Each problem was shown in the tachistoscope for 
5, 10, and 20 seconds, respectively, and simultane- 
ously displayed an answer which was projected at 
the below-threshold level established by the pretest 
group. The answers were divided into four types, 
differing in their relationship to the correct answer. 


Type 1: near the correct answer 

Type 2: remote from the correct answer 

Type 3: remote, but with same last digit as cor- 
rect answer 

Type 4: same as correct answer 


Each S was tested individually. Each was seated 
in front of the tachistoscope and instructed to look 
into the viewer during the presentation of all prob- 
lems. The cards containing the problems and their 
respective answers were changed simultaneously in 
order to prevent any aural indication of the presen- 
tation of more than one stimulus. 

The control Group 1a was treated the same as 
the experimental Group 1, but did not receive sub- 
liminally projected answers. 

The second experimental group was instructed to 
read the problem twice through orally, and then
to guess at the answer, but not in round numbers.
Control Group 2a was treated in the same manner, 
but did not receive the subliminally projected 
answers. 

The third group read the problems only once, and 
then selected their answers from among a list of 
the subliminally projected answers and correct an- 
swers, for each problem, given on an answer list. 
Control Group 3a was treated the same, omitting 
the subliminal stimulation. 

Each S was asked, after the experiment, if he or
she had seen anything outside of the problem in the 
apparatus. No S answered the question positively. 


RESULTS 


To test the possibility that the subliminally 
projected answers would influence the mental 
process or affect the behavior of the S, each 
answer given by the Ss was subtracted from 
the projected answer to the problem. The 
absolute mean for all 10 Ss of each group was 
then computed on each problem for all 15 
problems, giving the total mean deviation as 
a result. 
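
The deviation score described above can be sketched as follows. The responses are invented for illustration and the function name is hypothetical; only the scoring rule (the absolute difference between the projected answer and each S's answer, averaged over Ss) is taken from the text.

```python
from statistics import mean


def mean_absolute_deviation(projected: int, responses: list[int]) -> float:
    """Mean absolute difference between the projected answer and each response."""
    return mean(abs(projected - r) for r in responses)


# Hypothetical problem: projected answer 268, five illustrative responses.
responses = [270, 250, 268, 300, 240]
print(mean_absolute_deviation(268, responses))
```

A smaller value for an experimental group than for its control would indicate that the answers given lay closer to the projected answer.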

The first group had the task of attempting 
to solve the problems. This meant that con- 
centration was an important factor, as com- 
pared to Group 2, who only guessed at the 
answers. The question, between the two 
groups, is one of a high or low degree of 
concentration, and the possible effect of this 
concentration on the reactions of the Ss. 

The absolute means for the two experi- 
mental and control groups were compared in 
a t test for matched samples. The results of 
this comparison as well as the comparison of 
Groups 1 and 2, are shown in Table 1. 
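
For readers unfamiliar with the t test for matched samples, a minimal sketch of the statistic is given here. The scores are invented and the function name is an assumption of this illustration; the article does not report its raw data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x: list[float], y: list[float]) -> float:
    """t statistic for matched samples: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Invented scores for five matched pairs (not data from the study).
experimental = [12.0, 15.0, 11.0, 14.0, 13.0]
control = [11.0, 13.0, 8.0, 10.0, 8.0]
t = paired_t(experimental, control)
print(round(t, 2))  # compare against a t table with n - 1 degrees of freedom
```

With 10 matched pairs per comparison, as in the present design, the statistic would be referred to a t distribution with 9 degrees of freedom.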

The results of this analysis show that the 
projected answers were not capable of influ- 
encing the Ss into giving a more exact answer, 
in relation to the projected answers them- 
selves. The first hypothesis was not supported 
by this analysis. 

The second possibility was that the Ss were 
merely repeating the stimulus in its given 
form without being influenced in their com- 
putational methods by the stimulus. 

To test this hypothesis, the sum of the 
digits found in the Ss’ answers, which were 
the same as those given in the projected an- 
swers, was computed for each problem. A 
value of one was given to each digit which 
appeared in the same position in the S’s
answer as in the projected answer. A t test was
conducted to determine if the apparent
differences between groups were significant. The
mean number of given, projected digits was 
computed on each problem for the 10 Ss of 
each group. The results appear in Table 2. 
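
The digit-scoring rule can be sketched in a few lines. The alignment of digit positions from the right (units digit) is an assumption of this sketch, since the text does not say how answers of unequal length were compared; the function name and the example numbers are likewise hypothetical.

```python
def matching_digits(response: int, projected: int) -> int:
    """Count digits that occupy the same position in both numbers.

    Assumption: positions are aligned from the right (units digit),
    since the text does not say how answers of unequal length were aligned.
    """
    r, p = str(response)[::-1], str(projected)[::-1]
    return sum(1 for a, b in zip(r, p) if a == b)


# Hypothetical example: projected answer 5054, S's answer 5154.
print(matching_digits(5154, 5054))
```

Summing this score over the 15 problems gives one total per S, and the group means of those totals are the quantities compared in the t test.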

Table 2 also shows the test between ex- 
perimental Groups 1 and 2, whose negative 
result indicates, as by the first hypothesis, 
that a high degree of concentration was not 
a determining factor in the response of the 
Ss to the subliminal stimulation. 

It is clear that subliminal perception does
exist, when the S is not required to make
use of the subliminally projected data in any
way other than in its given form.

A further test was conducted to see if the 
relationship of the subliminally projected an- 
swers to the correct answers had influenced 
the responses of the Ss. A test of the first 
hypothesis, in relation to the type of projected 
answer, was not appropriate due to the
insignificant result of the first t test. It was to
be expected that the relationship of the sub- 
liminal answers to the correct answers would 
not be important, if the Ss were only repeat- 
ing the elements of the projected answers in 
their responses. This assumption proved to be 
the case as can be seen in Table 3. The num- 
ber of given, projected digits for each type 
of answer was computed for each S in all 
four groups. 

It was also assumed that if the Ss were
merely repeating elements of the projected
answers, the strongest demonstration of
subliminal activation was to be found in a
selection situation. The third group had the
task of reading the given problems once and
then selecting either the projected or the
correct answer from a given list containing
only these two answers.


TABLE 2

t VALUES FOR THE MEAN SUMMATION OF THE
SUBLIMINALLY PROJECTED DIGITS APPEARING
IN THE RESPONSES OF THE SUBJECTS
IN EACH GROUP

                               Level of
Group    M       t value       confidence
1        12.1
1a       8.3     3.24          0.01
2        13.5
2a       6.9     6.22          0.01
1        12.1
2        13.5    1.19          --


TABLE 3

t VALUES FOR THE SUBLIMINALLY PROJECTED DIGITS
REPEATED IN THE SUBJECTS’ RESPONSES DIVIDED
AS TO THE RELATIONSHIP BETWEEN PROJECTED
ANSWERS AND CORRECT ANSWERS
FOR ALL PROBLEMS

Type of
projected                                       Level of
answer            Group    M       t value      confidence
1 (near)          1        6.9
                  1a       5.1     1.95         --
                  2        6.4
                  2a       ...     5.08         0.01
2 (remote)        1        ...
                  1a       1.6     3.95         0.01
                  2        5.0
                  2a       ...     ...          0.01
3 (last digit)    1        4.3
                  1a       2.8     1.82         --
                  2        3.3
                  2a       ...     1.96         --
4 (same)          1        ...
                  1a       2.9     3.28         0.01
                  2        ...
                  2a       ...     2.05         --

Table 4 shows the number of selected
answers which had been subliminally projected
at the established level in the tachistoscope.
The total number of selected, projected
answers was summed over all 15 problems for
each S in the experimental and control
groups (3 and 3a).


TABLE 4

COMPARISON OF THE NUMBER OF PROJECTED
ANSWERS SELECTED BY THE EXPERIMENTAL
AND CONTROL GROUPS

Group                                N      M       t value    p
Group 3
  (with subliminal projection)       10     ...
                                                    11.13      <0.01
Group 3a
  (without subliminal projection)    10     5.8

The obvious difference was found to be
significant at the 0.01 level. The t test gave
a value of 11.13. The hypothesis stating
that the Ss merely mirrored the given
subliminal stimuli was accepted on grounds of
this analysis.


EFFECT OF BRAND PREFERENCE UPON CONSUMERS’
PERCEIVED TASTE OF TURKEY MEAT 


JAMES C. MAKENS 


Michigan Technological University 


2 experiments are reported concerning a wider possible application for taste 
tests in brand research. In 1 experiment a panel of 150 compared the taste and 
texture of 2 identical samples of turkey meat. In the other, 61 Ss compared the 
taste and texture of unlike samples of turkey meat. In both experiments Ss 
were asked to match their comparisons with 2 related commercial brands. 
Results indicated that a well-known brand positively affected the taste which 
Ss experienced for samples of turkey meat. 


Several studies have been conducted to
determine the ability of panelists to correctly
identify selected branded products through
taste experiments (Bowles & Pronko, 1948;
Printers Ink, 1962; Pronko & Bowles, 1948,
1949; Pronko & Herman, 1950; Prothro,
1953; Thumin, 1962). The products which
were included in these tests were ones that
are normally purchased by consumers quite
frequently such as cola beverages or beer.
These were also products whose brands are
normally before the consumer during time of
consumption and which may be consumed
individually.

Taste tests are useful tools in marketing
research but not enough diverse experiments
have been reported to demonstrate their range
of applicability. The study to be reported
offers an example of the wide application
that taste tests have as useful research
techniques.

The purpose of this study was to determine
the effect of a well-known brand turkey on
consumers’ preferences based upon a
measurement of their taste ratings for turkey meat.

This study is a departure from previous
reported taste tests due to the characteristics
of the product. A turkey is not purchased
frequently nor is the brand normally
displayed before the consumer when the turkey
is eaten. It is also a product that is not
ready for consumption at the time of
purchase. Unlike cola beverages or beer, the
brand of turkeys is not normally before
consumers at the time of consumption and might
be known only to the purchaser and/or cook.
Therefore, unless those eating the turkey were
told that the meat they were being served
is a certain brand, it is improbable that their
sense of taste would be influenced.


METHOD 


The Michigan State University consumer prefer- 
ence panel of 150 Detroit area consumers was utilized 
as the test group. This panel had been in existence 
since 1956 and was held three times each year at 
Wayne State University in Detroit. The members 
had incomes ranging from $4,000 to $10,000; had 
received 12-13 years of formal education; and were 
between 31-45 years of age. The panel operated as 
an afternoon and an evening session with different 
individuals participating in each. 

The study consisted of two different experiments. 
Both were included in the afternoon meeting but 
during the evening, only Experiment I was used. One 
hundred and fifty consumers participated in Experi- 
ment I and 61 participated in Experiment II. 

Two different brands were used throughout the 
experiment. One brand had never been sold in the 
Detroit area and was unknown by the subjects (Ss). 
The other (known brand) represented a nationally 
advertised brand which was sold in the Detroit area 
at the time of the study. The Ss had previously 
demonstrated a majority preference for this brand 
over others in a ranking test held during an earlier 
panel. This test did not involve taste. Instead it was 
a four sample ranking test involving turkeys of 
different brands. This gave evidence that the brand 
designated as a known brand was indeed known and 
preferred by Ss. This knowledge is of particular im- 
portance in a test of this nature. The results of a 
questionnaire also demonstrated that the known 
brand had been purchased by many of the Ss within 
the previous year. 

The unknown brand had not been used in previous 
experiments and was for all practical purposes 
completely unfamiliar to the Ss. 


Experiment I 


Procedure. A turkey was roasted prior to the 
experiment and similar size samples were cut from 
one section of the breast. These samples were evenly 
divided on two ceramic plates. A cardboard carton
was placed behind each plate. One carton was
covered with a plastic bag bearing the known brand 
while the other carton was covered with a bag 
bearing the unknown one. 

The Ss were given a sample from each of the 
plates and were told that these samples were taken 
from turkeys of the brands represented by the bags 
behind the plates. They were then asked to compare 
the taste and texture of the two samples and rank 
them on a card. A five-point hedonic scale, including 
the words Excellent, Good, Fair, Poor, and Bad, 
was listed for each sample. After the samples were 
ranked, the Ss were asked to indicate which of the 
two samples they preferred or if both tasted the 
same. The order in which the samples were tasted 
was reversed during the evening session. 

A null hypothesis was established that there is 
no difference in preference for the two samples. 


Experiment II 


Procedure. The second experiment employed simi- 
lar size samples with unlike texture (tough and 
tender). These samples had been cut from a like 
section of the breasts of two turkeys. The tougher 
samples registered a shear value of 12 as compared 
to a value of 6 for the tender samples as recorded 
on a Warner-Bratzler Shear Press. 

The Ss were not told that the textures of the 
samples varied. Instead, the samples were placed 
on two ceramic plates and were identified only by 
typewritten symbols (% and *) which according to 
Marquardt, 1963, have no preference meanings to 
consumers. The samples identified as * had the 
value of 12. 

Each S was asked to indicate on a card, from 
which of the two brands displayed in Experiment I, 
he believed each sample was taken. If a S had no 
idea from which branded turkey a sample had been 
cut, he was instructed to check an appropriate blank 
indicating this. Thus the Ss were given three choices. 
They were also asked to indicate which of the 
samples they preferred. 

A null hypothesis was established for Experiment 
II that there is no difference in preferences between 
sample % and sample *. 

The second taste test immediately followed the 
one involving the hedonic scale. It is therefore pos- 
sible that the sequence of testing may have affected 
the results to a certain degree. 


RESULTS 
Experiment I 


It was necessary to assign a value to each 
of the adjectives before any statistical compu- 
tations could be applied. The word Excellent 
was given a value of 5, Good was assigned 
a value of 4, Fair 3, Poor 2, and Bad was 
given a rating of 1. The statistical method 
employed was the Wilcoxon matched-pairs 
signed-rank test. 
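
The coding step can be sketched as below. The scale values are those stated in the text; the ratings themselves are invented, the function name is hypothetical, and dropping tied pairs reflects the usual convention for the Wilcoxon test rather than anything the article states explicitly.

```python
# Numeric values assigned to the hedonic scale, as stated in the text.
SCALE = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}


def rating_differences(pairs: list[tuple[str, str]]) -> list[int]:
    """Known-brand minus unknown-brand score per S, dropping ties.

    Tied pairs (difference of zero) carry no sign and are conventionally
    left out of the Wilcoxon matched-pairs signed-rank test.
    """
    diffs = [SCALE[known] - SCALE[unknown] for known, unknown in pairs]
    return [d for d in diffs if d != 0]


# Invented ratings for four Ss: (known brand, unknown brand).
ratings = [("Excellent", "Good"), ("Fair", "Fair"), ("Good", "Poor"), ("Poor", "Fair")]
print(rating_differences(ratings))
```

The remaining differences are then ranked by absolute size, and the ranks are summed separately for positive and negative differences to give the T values.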
TABLE 1

PREFERENCES DEMONSTRATED FOR TURKEY
MEAT SAMPLES BY PANEL MEMBERS

Subject                                                N
Subjects who preferred sample identified as taken
  from known brand turkey                              84
Subjects who preferred sample identified as taken
  from unknown brand turkey                            51
Subjects who indicated no preference                   15
Total                                                  150


The results were analyzed statistically
using an SD of 339.88 and an N of 111
which equaled the sum of the d’s. The un- 
known brand had a mean score of 19.21 and 
a T value of 2,133 as compared to a mean 
score of 37.68 and a T value of 4,183 for 
the known brand. A value for z of 3.1 was 
yielded which indicated that the results were 
significant at the .01 level. Thus, the results 
of the hedonic scale indicated that Ss were 
influenced by the known brand even though 
both samples of meat were identical. 
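
The normal approximation behind the reported z can be sketched as follows. This is a minimal illustration, not the authors' own computation: it assumes the standard large-sample formulas for the Wilcoxon signed-rank statistic (mean N(N + 1)/4, SD equal to the square root of N(N + 1)(2N + 1)/24), and the function name is hypothetical.

```python
import math

def wilcoxon_normal_approx(t_stat, n_pairs):
    """Return (mean, sd, z) for a signed-rank sum under the normal approximation.

    n_pairs is the number of nonzero paired differences.
    """
    mean = n_pairs * (n_pairs + 1) / 4
    sd = math.sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24)
    z = (t_stat - mean) / sd
    return mean, sd, z

# Values taken from the text: N = 111, T = 4,183 for the known brand.
mean, sd, z = wilcoxon_normal_approx(4183, 111)
print(round(mean, 1), round(sd, 2), round(z, 2))
```

With N = 111 the SD comes out at about 339.87, agreeing with the printed 339.88 to rounding, and the larger T gives a z of roughly 3.2, in line with the reported 3.1. The two printed T values sum to 6,316 rather than N(N + 1)/2 = 6,216, so one of them may contain a printing error; the source gives no way to tell which.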

This was further strengthened by the re- 
sults of the section of the test in which the 
Ss gave an answer reflecting their ordinal 
utility; i.e., they were asked to merely state 
a preference for one over the other. Here 
again the known brand was preferred as 
shown in Table 1. Out of a total of 150 
replies which were given during the afternoon 
and evening panels, only 15 Ss (10%) indi- 
cated that both the samples tasted alike. 
Fifty-one (34%) of the Ss indicated that 
they preferred the unknown brand samples, 
and 84 (56%) indicated they preferred the 
known brand ones. 


Experiment II 


The data in Table 2 showed that the
sample identified as % (tender sample) was
preferred by 49 of the 61 evening panel
members. This constitutes a majority preference
of 80% as compared to 7% (4 Ss)
who stated that they preferred the sample
marked *. The remaining 8 Ss gave no reply.
The results of a chi-square analysis yielded
a value of 58.9 which was significant at the
.01 alpha level. It is obvious that the Ss
were able to detect a quality difference between
the two samples and the null hypothesis
was therefore rejected.


TABLE 2

PREFERENCES FOR TOUGH AND TENDER TURKEY MEAT BY PANEL MEMBERS AND THEIR ASSOCIATION
WITH KNOWN AND UNKNOWN BRANDS

                                                                          Sample  Sample
                                                                            %       *     Neither  Total
Number of persons who preferred the sample                                  49       4       8       61
Number of persons who said the sample was from the known brand turkey       31       3       -       34
Number of persons who said the sample was from the unknown brand turkey     17       1       -       18
Number who did not indicate which brand the sample was from                  1       -       -        1
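
A goodness-of-fit check of the preference counts can be sketched as below. It assumes equal expected frequencies across the three response categories (61/3 each) and no continuity correction; the source does not say exactly how its expected values were set, so this is an illustration rather than a reconstruction of the authors' calculation. The function name is hypothetical.

```python
import math

def chi_square_gof(observed, expected=None):
    """Pearson chi-square goodness-of-fit statistic.

    If expected is None, equal expected frequencies are assumed.
    """
    total = sum(observed)
    if expected is None:
        expected = [total / len(observed)] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Preference counts from Table 2: sample %, sample *, neither.
observed = [49, 4, 8]
stat = chi_square_gof(observed)

# With three categories there are 2 degrees of freedom, and for df = 2
# the upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2)
print(round(stat, 2), p_value < 0.01)
```

Under these assumptions the statistic comes out near 61, slightly above the printed 58.9. Either figure is far beyond the .01 critical value for 2 degrees of freedom (9.21), so the conclusion is unaffected.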

The results of this experiment also showed 
that 31 Ss (63% of the 49 Ss who preferred 
sample %) stated they believed it came from 
the known brand. A total of 17 Ss stated it 
came from an unknown brand and one S did 
not indicate its source. 

Only four Ss stated a preference for the 
sample marked * and three of these stated 
they believed this sample came from the 
known brand. The remaining S believed it 
came from the unknown brand. 

As a total, 34 Ss stated that the samples 
they preferred were taken from the known 
brand turkey, 18 indicated they were from 
the unknown brand and one S gave no indica- 
tion. A chi-square analysis was performed 
using an expected value of 16.3 since the 49 
panelists were given three equal choices. The 
observed values were 34, 18, and 1. The 
analysis yielded a value of 30.0 which was 
highly significant. 


CONCLUSION 


The use of taste tests in brand research 
appears to have a wider application than 
is indicated from the number of published 
studies. This study shows that taste tests may 
be used for as diverse a product as turkey. 

Brand preference for one well-known brand 
turkey was strong enough to influence the 
perceived taste for turkey meat among the Ss. 
It is apparent that the Ss were influenced by 
the brand since a significant preference was 
shown for turkey samples identified as taken 
from a known brand versus identical ones 
labeled as taken from an unknown brand 
turkey. 


Panelists were also able to detect taste dif- 
ferences between turkey meat samples with a 
shear value of 12 as compared to those with 
a value of 6 and preferred the latter. A sig- 
nificant number of Ss also stated that the 
samples they preferred were taken from a 
known brand turkey. This indicates that con- 
sumers expect a well-known brand turkey 
to be of superior quality to an unknown 
brand. It may also indicate that advertising 
for this particular brand has been effective. 
Obviously different brands of turkeys are 
differentiated products in the minds of con- 
sumers. This evidence should be of interest 
to all marketers of relatively homogeneous
food products as they plan an advertising 
and brand building program. 

A STUDY OF ITEM WEIGHTS AND SCALE
LENGTHS FOR THE SVIB


ALLAN N. NASH 1

7 SVIB scales were developed and cross validated on 461 managers from 13
varied Minnesota companies. Questions studied were (a) Which item weighting 
method results in the highest scale validity? (b) Are shorter scales as valid as 
longer scales? (c) How much may scales be shortened? (d) Why may they be 
shortened? Controls for scale length, content, validity, and for item weighting 
method were introduced. Results indicated (a) there was no practical difference 
in validities between simple unit versus variably weighted scales, (b) shorter 
scales were as valid as longer scales, (c) Clark’s “40 to 60 item optimum scale 
length” hypothesis was supported, (d) although not conclusive, shorter scales 
appeared superior partly because their average item validities were greater 
and thus they perhaps should not be used where developmental item pools are
rich in valid items.


A review of both the early and more recent 
literature indicates a long standing disagree- 
ment about how interest items should be 
weighted (Clark, 1961, p. 30; Kuder, 1963, 
p. 8; Strong, Jr., 1943, p. 632). There is gen- 
eral agreement that weights should vary with 
the validity of the item, but there is no con- 
sensus concerning how much variation in 
weights is desirable, or whether there is a 
generalizable optimum. The alternatives pro- 
posed and tried range from the simple unit 
weighting method (weights of ±1) to a system
recently proposed by Kuder which uses
actual differences in proportions of responses 
of groups to be differentiated as the weights 
to be assigned. Thus, Kuder’s system theo- 
retically includes weights of from +100 to 
—100, although such extremes would be un- 
likely in actual application. 

1 Appreciation is expressed to the author’s co-advisors,
Herbert G. Heneman and Marvin D. Dunnette,
for their guidance in the completion of the
author’s doctoral dissertation, from which this article
is substantially drawn, and to Thomas A. Mahoney
for supplying the data. A grant from the
General Research Board of the University of Maryland
permitted the continuation of related research
upon which part of this article is based.

The number of items to include in interest
scales is also questionable. The Strong Vocational
Interest Blank (SVIB) presently includes
scales with more than 200 items which
results in substantial difficulty in scoring responses
and in developing new scales. Clark
sought to determine if this number of items
was really necessary. He interprets his evidence
as suggesting convincingly that shorter
scales are as effective, if not more so. Con- 
sidering both validity and reliability, he sug- 
gests that scales with from 40 to 60 items 
were best for his data. An intensive study by 
Strong, Campbell, Berdie, and Clark (1964) 
of several SVIB scales is also interpreted 
as supporting this shorter scale proposition. 
Darlington’s doctoral dissertation (1963) 
touches on this same question and he inter- 
prets his evidence as indicating that an itera- 
tive scale with 43 items was most valid. How- 
ever, this finding is of questionable compara- 
bility with other results since the items in 
this scale were chosen on the basis of both 
their validity and interitem relationships. 
Using the same item pool, an iterative scale 
with 103 items which used unit weights and 
ignored interitem relationships was found to 
be best. 

Although the preponderance of the fore- 
going evidence suggests that interest scales 
may be shorter than existing SVIB scales 
without sacrificing validity, there is doubt as 
to why they may be shortened. Strong et al. 
(1964) state 


It is not clear whether this [shorter scales being 
better] is because there is some optimal upper 
limit on the number of items that keys should con- 
tain—Clark in an earlier report has taken this 
stand—or whether there are usually only about this 
number of good items in a given occupational area. 


If the latter of these two possibilities was the
reason, this would mean that scales probably
should be longer when developmental item 
pools are rich in valid items. 

The purpose of this study was to further 
investigate these issues concerning the proper 
item weighting method and scale length for 
the SVIB. The specific questions posed were: 
(a) Which of three item weighting methods 
is most valid when SVIB scales developed 
with these methods are cross validated on a 
large group of business managers? (b) May
SVIB scales be shortened from their existing 
lengths without sacrificing validity? (c) If 
they may be shortened, how extensively may 
they be shortened? (d) Why may they be 
shortened? 


METHOD


Responses to the SVIB of 468 business managers 
from 13 varied Minnesota based companies were 
obtained during 1956-1958 by research personnel of 
the Management Development Laboratory, Indus- 
trial Relations Center, University of Minnesota. 
Alternation rankings of overall managerial effective- 
ness, by from 1 to 6 executive superiors of the 
participating managers, were obtained simultaneously 
with the SVIB data. However, only 461 managers 
had properly filled out the interest blanks and had 
available sufficient criterion information (alternation 
rankings) to permit their inclusion in this study. 


The rankings for each manager were converted 
into percentile scores and averaged. The resultant 
composite percentile rank scores permitted the 
managers to be compared and classified along a 
criterion continuum on an intercompany basis. Corre- 
lations between the independent rankings varied from 
.08 through .95 with a median of .65. The 461 mana- 
gers were divided into two approximately equal sized 
subgroups (Samples 1 and 2), closely matched on 
such factors as job type and level in the organiza- 
tional hierarchy, effectiveness rankings, organization, 
and industry type. Managers in each of the samples 
were further subdivided into three high-middle-low 
criterion groups. Responses of the Sample 1 high- 
and low-criterion managers to each of the 400 SVIB 
items were analyzed to determine if the responses 
were related to the criterion of effectiveness. This 
analysis involved the computation of a 2 × 3 chi-square
value (2 criterion groups, 3 SVIB item response
categories) for each item and the proportion
of upper- and lower-criterion groups responding in 
each item response category. 

All items were discarded which did not have a chi- 
square probability level of less than .50 and at least 
one item response difference between criterion groups 
of 10% or greater. An exception to the 10% was 
made when one of the response percentages fell
between 0 and 8% or between 92 and 100%. This exception
was made to allow for the constriction of
variance at the extreme ends of the distributions. 
From the original pool of 400 items, 230 were 
excluded, leaving 170 items for the development of 
scales from Sample 1 data. Seven scales were 
developed from this pool of 170 items. 
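
The item-screening rule described above can be sketched as follows. This is an illustrative reading, not the original procedure: it assumes the 8% threshold applies whenever either group's percentage in a category falls at or below 8% or at or above 92%, and it uses the fact that a 2 × 3 table has 2 degrees of freedom, for which the chi-square upper-tail probability is exp(-x/2). Function names and the example counts are hypothetical.

```python
import math

def chi_square_2x3(high_counts, low_counts):
    """Pearson chi-square for a 2 x 3 table (2 criterion groups, 3 responses)."""
    table = [high_counts, low_counts]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand
            if exp > 0:  # skip response categories nobody used
                stat += (obs - exp) ** 2 / exp
    return stat

def keep_item(high_counts, low_counts):
    """Keep an item if p < .50 and some response category differs enough."""
    p_value = math.exp(-chi_square_2x3(high_counts, low_counts) / 2)  # df = 2
    high_pct = [100 * c / sum(high_counts) for c in high_counts]
    low_pct = [100 * c / sum(low_counts) for c in low_counts]
    big_difference = False
    for h, l in zip(high_pct, low_pct):
        extreme = min(h, l) <= 8 or max(h, l) >= 92  # assumed reading of the exception
        threshold = 8 if extreme else 10
        if abs(h - l) >= threshold:
            big_difference = True
    return p_value < 0.50 and big_difference

# Hypothetical response counts (e.g., Like / Indifferent / Dislike).
print(keep_item([40, 20, 17], [20, 25, 32]))
print(keep_item([30, 30, 17], [30, 30, 17]))
```

The first hypothetical item is retained and the second, with identical response distributions in both groups, is discarded.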


Comparison of Weighting Methods 


Three of the scales developed were selected to 
determine which of three alternative weighting meth- 
ods was superior for scales of this type. These three 
scales are identified below: 


Scale A: a 170-item scale which used the SVIB’s 
existing variable weighting method for all re- 
sponse category percentage differences between 
criterion groups of 10% (8% for extremes) and 
greater. Plus weights were assigned response 
categories if the high-criterion group had the 
larger percentage of responses in that category, 
minus weights were assigned if the low-criterion 
group had the larger percentage of responses. 
Weights were assigned as follows: 10% (8% for 
extremes) through 17% response differences re- 
ceived a weight of + or —1, 18% through 26% 
differences received a weight of + or —2, 27% 
through 33% differences received a weight of 
+ or —3, 34% and greater differences received 
a weight of + or —4. 


Scale B: a 27-item scale which applied unit weights 
of + or —1 to response categories with per- 
centage differences of 18% and greater (all items 
in Scale A weighted + or —2 and greater). 


Scale C: a 170-item scale which applied unit 
weights of + or —1 to response categories with 
percentage differences of 10% (8% for extremes) 
and greater (all items in Scale A). 
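
The weighting rules for Scales A, B, and C can be sketched as a pair of small functions. The cutoffs follow the description above; the treatment of the 8% exception (applied when either percentage is at or below 8% or at or above 92%) and the assumption that percentage differences are whole numbers are interpretations, and the function names are hypothetical.

```python
def variable_weight(high_pct, low_pct):
    """Scale A style weight (-4..+4) for one response category."""
    diff = high_pct - low_pct
    size = abs(diff)
    extreme = min(high_pct, low_pct) <= 8 or max(high_pct, low_pct) >= 92
    floor = 8 if extreme else 10
    if size < floor:
        return 0  # difference too small to be weighted
    if size >= 34:
        magnitude = 4
    elif size >= 27:
        magnitude = 3
    elif size >= 18:
        magnitude = 2
    else:
        magnitude = 1
    # Plus if the high-criterion group responds more often, minus otherwise.
    return magnitude if diff > 0 else -magnitude

def unit_weight(high_pct, low_pct, min_magnitude=1):
    """Unit weight (+1, -1, or 0).

    min_magnitude=1 mirrors Scale C; min_magnitude=2 mirrors Scale B,
    which keeps only categories weighted 2 or more under Scale A.
    """
    w = variable_weight(high_pct, low_pct)
    if abs(w) < min_magnitude:
        return 0
    return 1 if w > 0 else -1

print(variable_weight(50, 30), unit_weight(50, 30, min_magnitude=2))
print(variable_weight(40, 52), unit_weight(40, 52, min_magnitude=2))
```

A manager's scale score would then be the sum of the weights attached to the response categories he chose.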


Scale A approximated the present multiple- 
weighting method used with the SVIB, the only dif- 
ference being that percentage differences of 6% to 
10% (8% for extremes) were not weighted in this 
scale. This was done to compensate for the smaller 
size of the development sample (Sample 1) in this 
study compared to scale development samples typically 
used for SVIB occupational scales and correspond- 
ingly larger standard error of the proportions in the 
present study. 


Scale B was chosen because its unit weighting 
method proved to be the best unit weighting method 
used in the study by Strong et al. (1964). However, 
this finding was tempered by the fact that scales 
developed with this weighting method may not have 
been cross validated in their study and the number 
of items included in such scales more closely approxi- 
mated Clark’s hypothesized 40 to 60 optimum range 
than did other scales weighted with different meth- 
ods. Its inclusion in this study subjects it to cross- 
validation and may provide insight into whether 
it was the best unit weighting method in the study 
by Strong et al. because it contained more closely
the optimum number of items to include in scales 
of this type, or whether there was some other reason 
for its apparent superiority. Since this scale only 
had 27 items in the present study, it might be 
expected to fall in validity if the reason for its
superiority was the 40 to 60 optimum item range 
phenomenon observed by Clark. Comparison of it 
with scales A and C provided evidence of its valid- 
ity relative to these scales developed with more 
commonly used weighting methods. 

Scale C was included because it provided additional 
independent evidence on the unit versus multiple- 
weight issue under controlled conditions. The number 
of items and response categories weighted were pre- 
cisely the same for Scales A and C. 


Comparison of Scale Lengths 


Four additional scales were selected, along with 
Scale C, to determine whether shorter scales would 
be as valid as longer ones, how much shorter such 
scales could be without sacrificing validity, and why 
they might be shortened without adverse effects. The 
evidence presented by Clark and Strong et al. supports
the hypothesis that fewer than 60 items can
be used without sacrificing reliability or validity in 
most cases. However, the exceptions to this con- 
clusion in the Strong et al. study (two of their 
eight most valid scales had over 150 items), and 
the fact that in both of these studies the shorter 
scales apparently had the most valid items, suggested 
the need for further study. If, for example, the 
shorter scales were as good as, or superior to, the 
longer scales because they contained only those items 
with the greatest percentage differences in responses, 
such a finding could not be extended to other groups 
where more than 60 good items existed in the 
item pool. 

The four scales developed which bear on the
above issues are described below:


Scale D: a 57-item unit weighted scale with items 
selected from a stratified pool of the original 
170 items by entering a table of random num- 
bers to determine randomly which of the first 
3 items in the pool to start with, and systematically
selecting every third item thereafter. The
first item of the 3 was selected as the beginning 
item. 

Scale E: a 57-item unit weighted scale with items 
selected from a stratified pool of the original 
170 items, beginning with the second item of 
the first 3 in the pool and systematically select- 
ing every third item thereafter. 

Scale F: a 56-item unit weighted scale with items 
selected from the same stratified pool of 170 
items, starting with the third item of the first 
3 in the pool and systematically selecting every 
third item thereafter. 

Scale G: a 57-item unit weighted scale with items 
selected from the pool of 170 items on the basis 
of Sample 1 chi-square values and percentage 
differences in responses between high- and low- 
criterion groups to obtain the most valid items 
in the pool. 
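
The systematic every-third-item selection used for Scales D, E, and F can be sketched as below. The sketch assumes the 170-item pool is already ordered by chi-square stratum and, within stratum, by SVIB sequence; the item labels are placeholders, not real SVIB items.

```python
def systematic_scales(pool, step=3):
    """Split an ordered pool into `step` interleaved systematic samples."""
    return [pool[start::step] for start in range(step)]

# Placeholder labels standing in for the stratified 170-item pool.
pool = [f"item_{i:03d}" for i in range(1, 171)]
scale_d, scale_e, scale_f = systematic_scales(pool)

print(len(scale_d), len(scale_e), len(scale_f))  # 57 57 56
```

Because 170 is not divisible by 3, the third start position yields one item fewer, which accounts for the 56-item length of Scale F.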


These scales reflect an attempt to control weight- 
ing method and average item validity. The weighting 
method was held constant by applying unit weights 
to the response categories of items included in these 
additional scales. Results obtained in the comparison 
of Scales A, B, and C revealed little difference in 
validity between the variable and unit weighting 
methods, and thus the simpler unit method was 
adopted for this phase of the study. Accomplishment 
of at least a partial control on the validity of items 
in these additional scales involved the stratification 
of the items in the 170-item pool into 15 categories 
on the basis of chi-square probability levels. The 
order of items within each category was strictly a 
function of the sequence of their occurrence in the 
SVIB. 

If the cross validity of Scale G was higher than
that obtained with the other three scales of the same
length (Scales D, E, and F), it might be inferred
that the favorable difference in the average item
validity of this scale accounted for at least part of
its superiority over the other scales. If Scale G was
also more valid than Scale C, such a finding would
tend to support the extension of Clark’s conclusion
(that shorter scales were equally or more valid than
longer scales for his data) to an independent set of
data. If the validity results of Scales D, E, and F
exceeded that obtained with Scale C, such a finding
could be advanced as strong evidence that Clark’s
hypotheses were true, independent of item validity
differences, since the validity of the three parts
(Scales D, E, and F) would have been greater than
that of the whole (Scale C). That is, there would be an
equally or more valid scale length in this study
substantially below 170 which could not be attributed
to favorable differences in item validity.


TABLE 1

rs AND PERCENTAGE OVERLAPS FOR THE 7 SVIB SCALES

         Number of   Weighting     r (Standard    Percentage overlap
Scale    items       method        error)         (Standard error)

A        170         ±1, 2, 3, 4   .315  (.065)   72% (6.2%)
B         27         ±2, 3, 4      .27   (.065)   75% (6.3%)
C        170         ±1            .3075 (.065)   73% (6.2%)
D         57         ±1            .3225 (.065)   70% (6.2%)
E         57         ±1            .172  (.065)   86% (6.3%)
F         56         ±1            .216  (.065)   82% (6.3%)
G         57 (best)  ±1            .3255 (.065)   70% (6.2%)


Application of Scales to Sample 2 


Each of the scales developed on Sample 1 was 
cross validated on Sample 2 managers. A total SVIB 
score was obtained for each manager. Product- 
moment correlations and Tilton’s measure of overlap 
were computed along with their standard errors of 
estimate. The correlations were obtained for the 234 
pairs of SVIB and criterion scores. The percentage 
overlaps were computed on the high- and low- 
criterion groups. This provided a more varied picture 
of the relationship between the scales and criterion 
scores, since examination of the data revealed that 
the scales did not significantly differentiate managers 
in the middle-criterion group from managers in the 
high- and low-criterion groups. 
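
The two validity indices can be sketched as follows. The product-moment correlation is standard. The overlap function uses a commonly cited form of Tilton's measure (twice the normal-curve area beyond half the standardized mean difference, with the two group SDs averaged); the source does not print its formula, so treat this form as an assumption. The scores in the example are invented.

```python
import math
from statistics import NormalDist, mean, stdev

def pearson_r(x, y):
    """Product-moment correlation between paired scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tilton_overlap(high_scores, low_scores):
    """Percentage overlap of two score distributions (assumed Tilton form)."""
    avg_sd = (stdev(high_scores) + stdev(low_scores)) / 2
    d = abs(mean(high_scores) - mean(low_scores)) / avg_sd
    return 200 * NormalDist().cdf(-d / 2)

# Invented scale scores for high- and low-criterion managers.
high = [9, 10, 11]
low = [7, 8, 9]
print(round(tilton_overlap(high, low), 2))

# One common standard error for r under the null hypothesis, with N = 234.
print(round(1 / math.sqrt(234 - 1), 4))
```

Identical distributions give 100% overlap, and lower percentages indicate better separation of the criterion groups. The null-hypothesis standard error 1/sqrt(N - 1) for N = 234 is about .0655, close to the .065 printed in Table 1, although the source does not state which formula was used.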


RESULTS 


The results of the application of the seven 
scales to Sample 2 are presented in Table 1. 
Correlations, percentage overlaps, and stand- 
ard errors are included for each scale. 


DISCUSSION
Weighting Method Implications 


Comparison of Scales A and C suggests that 
there was no practical difference between the 
variable and unit weighting methods when the 
number and validity of items were equated 
(r’s = .315 and .3075, percentage overlaps
= 72% and 73%, respectively). Apparently
unit scales may be used with no significant
loss in validity. However, it should be noted
that the validity difference between Scales A 
and C was in the same direction (favorable 
to the variable weighting method) as findings 
reported by Strong et al., but is even smaller 
than the small differences typically identified 
in their study between variable and unit 
weighted scales. The smaller difference ob- 
served in this study can probably be traced 
to the lower validity of scales in this study 
compared to that of scales in the study by 
Strong et al., and corresponding differences 
in the number of items weighted and magnitude
of variable weights assigned response
categories. Only 27 of the pool of 400 items 
had percentage differences of 18% or greater 
deserving a weight of + or —2, 3, or 4. Only 
five of these had weights assigned of + or 
—3, and none had a weight of + or —4 as- 
signed. The scales studied by Strong et al. 
had many more items weighted with more and 
higher variable weights. Thus it is suggested 
that differences in validity between Scales A 
and C caused by the different weighting meth- 
ods used may have been severely constricted. 
Also, the implications of results obtained in 
this study for more typical SVIB scale de- 
velopment situations may be further limited 
by the greater heterogeneity of jobs and 
subjects included in the present study. 

Considering the great consistency of small 
differences in favor of the multiple-weighting 
method observed in the study by Strong et 
al. and the corroboration of the direction of 
these differences in the present study, it is 
concluded that such differences are likely to 
be very stable and would be consistently 
replicated in other studies which introduced 
the same controls. Whether these differences 
in favor of the variable weight scales are 
great enough to warrant the extra work in- 
volved in their development and application 
is a different question which must be an- 
swered in each situation. Application of the 
immortal D. G. Paterson’s rule of thumb that 
“A difference, to be a difference, must make 
a difference” suggests that in most cases these 
differences are not likely to be of practical 
significance. 

Comparison of Scales A and C with B sug- 
gests the latter scale was less valid (r = .27, 
percentage overlap = 75%). This is in harmony
with the hypothesis and findings of Clark (1961) and
Strong et al. (1964), which indicated
the minimum number of items for such
scales to be in the vicinity of 40. The results 
obtained with Scale G further support this 
hypothesis since with 57 of the most valid 
items (Scale B had 27 of the most valid 
items), the validity was increased from an r
of .27 to .325 and at least slightly over all 
other scales developed. 

It is concluded that the evidence in this
study supports the contention that the superiority
of scales developed with the weighting
method represented by Scale B in the study
by Strong et al. was substantially due to the
favorable item length of these scales in comparison
to scales weighted with different methods.
Consequently, this method of weighting
items in scales is not recommended except
where its application results in the inclusion
of considerably more items than the 27
included in Scale B in the present study.


Length of Scale Implications


The results presented in Table 1 tend to
support the contention by Clark and Strong
et al. that scales may be shortened without
significantly sacrificing validity. Comparison
of results with Scales D and G to Scales A
and C appears to substantiate this conclusion.
Also, it appears that the “40 to 60 item
optimum range” hypothesis is supported in
the comparison of Scale B with G.

However, whether these
favorable findings for the shorter scales were
due to the greater average validity of items
in these scales is not clear. It was difficult
to interpret results obtained with Scales D,
E, and F, for which validity of items was
controlled, compared to results obtained with
Scale C which included all the items in these
three scales. Scale D results indicated it to be
slightly more valid than Scale C (r’s = .3225
and .3075, percentage overlaps = 70% and
73%, respectively), but results with Scales E
and F showed a substantial drop in validity
(r’s = .17 and .22, percentage overlaps
= 86% and 82%, respectively).

It was suspected that the disparate results
obtained with Scales D, E, and F were due
to unexpected sampling error resulting from
the systematic selection of items for these
scales. A cursory examination of item validities
in these scales indicated no significant
differences between them from Sample 1 data.
However, item analysis of Sample 2 responses
indicated that the items in Scales E and F
shrank markedly more in validity than did
the items in Scale D. A simple four-category
classification (good, fair, no validity, and opposite
direction of validity) of how well items
held up in Sample 2 revealed the results
presented in Table 2.


TABLE 2

SIMPLE RATING OF HOW WELL ITEMS IN SCALES D, E,
AND F HELD UP IN THE CROSS VALIDATION SAMPLE
(SAMPLE 2)

“Best 10” Sample 1 items from each scale

Rating         Scale D   Scale E   Scale F
Good              5         1         -
Fair              2         3         6
No validity       3         5         4
Opposite          -         1         -

All 57 Sample 1 items from each scale

Rating         Scale D   Scale E   Scale F
Good             21        13        14
Fair             13        15        12
No validity      19        20        23
Opposite          4         9         7

In spite of the confusion introduced by 
these unexpectedly differing results obtained 
with Scales D, E, and F, comparison of 
results obtained with Scale G to those ob- 
tained with Scales D, E, and F might be 
construed as supporting the hypothesis that 
the reason shorter scales have been found to 
be as valid as longer scales is at least par- 
tially attributable to the greater average 
validity of items included in them. All four 
of these scales had virtually the same number 
of items, but items in Scale G were chosen 
to maximize validity while items in the other 
three scales were chosen to approximate the 
average item validity of items included in 
Scale C. Results obtained with Scale G were 
slightly more valid than for Scale D and 
considerably more valid than for Scales E 
and F. Although averaging of disparate cor- 
relations is not statistically permissible, the 
slight increase in validity of Scale G over the 
other three scales is apparent. 

The determination of the reason for im- 
proved or commensurate validity of shorter 
scales over longer ones will have to be ob- 
tained with additional investigation. It is not 
recommended that shorter scales be indiscriminately
used for all SVIB applications
until this question is more clearly answered.
However, except in situations where “large”
percentage differences between item responses 
would be excluded from the shorter 40- to 
60-item scales, it would appear safe to use 
them. How large is “large” is not clear from 
this study. Although items ranging in per- 
centage differences from 10% to 16% were 
left out of Scale G in this study with no 
apparent adverse validity effects, the smaller Ns
in this study compared to the occupational
groups customarily gathered for development
of SVIB scales make this finding of doubtful
applicability for them. A smaller percentage 
difference would tend to be more stable in 
the larger criterion groups of the SVIB. 

A FACTORIAL STUDY OF THE FEMALE
FORM OF THE SVIB 1


HARRY E. ANDERSON, JR. 2
College of Health Related Services, University of Florida 


The female form of the SVIB was administered to 203 freshman and sopho- 
more female students in an introductory course in the University of Florida’s 
College of Health Related Services. 29 scales were scored, intercorrelated, and 
factor analyzed. 9 factors resulted with loadings of ±.35 or more. The factors
are discussed in terms of implications for female vocational interest and com- 
pared with previous factor analytic studies with the male form of the Blank. 
Groupings of occupations are presented, also, based on the factor structure of
the female interests.


The Strong Vocational Interest Blank 
(SVIB) is one of the most widely used interest 
scales. Most of the research, however, (e.g., 
Kirchner, 1961; Stordahl, 1954; and Tucker 
& Strong, 1962), including the development of 
new scales (e.g., Knauft, 1951; Kriedt, 1952; 
and Strong, 1949), has dealt with the male 
form of the SVIB. There has been some work 
with the female form of the SVIB, such as 
Hilgard’s (1939) study of nursing and Stone’s 
(1960) development of a key for secretarial 
students, but there has been no concentrated 
effort of analysis with the female form. Strong 
(1943) avers that male and female interests 
are, on the average, very similar, and further 
that from the results of seven factor analyses, 
only four or five factors are needed to account 
mathematically for all, or nearly all, of the 
variations in interests among occupational 
groups. There is an underlying implication, 
therefore, that the same number of factors is 
required for the female form of the SVIB. The 
present study was designed to investigate the 
number of factors in the female form of the 
SVIB. 


METHOD


Over the past 3 years, beginning in 1961, the
SVIB was administered to 203 female students who
took the introductory course in the College of Health
Related Services at the University of Florida. Transformed
standardized scores were used on 29 scales
to produce product-moment correlations and factor
analytic results.3 The 29 × 29 intercorrelation matrix,
with squared multiple correlations throughout the
principal diagonal, was factor analyzed by the
principal component method and the resulting structure
was machine rotated by the normalized varimax
routine (Kaiser, 1958).


1 This study was supported in part by a research
grant from the Vocational Rehabilitation Administration
and part of the data for this study were collected
under National Institute of Mental Health
Project Grant 380, The Public Mental Health Methods
in a University.

2 Now at the University of Georgia.


RESULTS 4 


The extraction of 16 principal component 
factors effectively traced the correlation mat- 
rix. The 16 factors were rotated, by the vari- 
max method, through nine iteration cycles. 
The original and final communalities, before 
and after rotation, agreed to the fifth decimal 
place. 

Of the 16 rotated factors, 9 factors had 
variables with loadings of plus or minus .35 
or larger in magnitude, which were considered 
significant for our study.5 These results constitute
the major findings of the study and are
presented in Table 1. No attempt will be 
made to “name” the factors, but they can be 








3 Three of the scales, Physical Therapist, Engi- 
neer, and Sister-Teacher were excluded from the 
analysis because of incomplete data. 

4 Complete tables of intercorrelations, means,
standard deviations, and unrotated and rotated factor
matrices have been deposited with the American
Documentation Institute. Order Document No. 8334
from the ADI Auxiliary Publication Project, Photoduplication
Service, Library of Congress, Washington,
D. C. 20540. Remit in advance $1.75 for microfilm
or $2.50 for photocopies and make checks payable
to: Chief, Photoduplication Service, Library of
Congress.

5 Strong (1943) considered variables only with
loadings of .40 or larger as significant; it should be 
noted that this departure from his criterion is not 
the cause of the interpretation of more factors in 
this study, as all nine factors have variables with 
loadings of .40 or larger. 
TABLE 1

MAJOR FACTOR LOADINGS FROM THE FEMALE FORM OF THE STRONG
VOCATIONAL INTEREST BLANK

Factor

Occupation I II III IV V VI VII VIII IX
Occupational therapist 158 
Laboratory technician — .887 
Housewife .663 — A71 
Stenographer-secretary .803 
Physician —.599 —.659 
Social worker 376 545 
Artist —.749 
Author —.7137 
Business education teacher .880 
Buyer 508 500 
Dentist — 869 
Dietician 452 .662 
Elementary teacher .506 — .370 389 
English teacher So 710 
Home economics teacher 406 385 .648 
Life insurance saleswoman 564 445 
Lawyer 799 
Librarian — 4860 —.384 —.306 —.413 
Mathematics and science teacher — .800 
Music performer 41 549 
Music teacher 598 490 
Nurse .689 
Office worker 853 
Physical education teacher (College) 399 649 
Physical education teacher Bfoil 
Psychologist — 384 .670 
Social science teacher 2392 B04 
Y.W.C.A. secretary 786 
Masculinity-Femininity 618 402 


Note.—Blank cells indicate loadings between —.35 and +.35. 


described quite generally in terms of their 
composition. 
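The salience rule used for Table 1 (loadings of .35 or larger in magnitude are reported, smaller ones are left blank) can be expressed in a few lines. The sketch below is illustrative only: the function name salient_factors is hypothetical and the small loading matrix is invented for the example, not taken from Table 1.

```python
import numpy as np

def salient_factors(loadings, cutoff=0.35):
    """Blank out loadings below the cutoff and flag factors that keep any."""
    loadings = np.asarray(loadings, dtype=float)
    salient = np.abs(loadings) >= cutoff
    display = np.where(salient, loadings, np.nan)   # NaN stands for a blank cell
    keep = salient.any(axis=0)                      # factors with a salient loading
    return display, keep

# Invented loadings for three variables on three factors.
demo = np.array([[0.80, 0.10, -0.05],
                 [-0.60, 0.20, 0.12],
                 [0.30, 0.34, 0.45]])
display, keep = salient_factors(demo)
print(keep)
```

Under this rule a factor is retained for interpretation only if at least one variable reaches the cutoff on it, which is how 9 of the 16 rotated factors were singled out.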

The first five factors are bipolar factors. 
The first factor includes, for the most part, 
skilled technical and clerical vocations on the 
positive end with professional vocations on the 
negative end. Factor II seems to contrast 
technical scientific professions with nonscien- 
tific professions. Factor III has, on the posi- 
tive end, professions that one normally as- 
sociates with relating to adults while house- 
wife and elementary teacher on the negative 
end are vocations that usually involve rela- 
tions with children. The fourth factor is the 
health services factor with occupational ther- 
apy and nursing as the major determinants of 
this factor, while the fifth factor appears to be 
determined by instructional physical education 
variables. 


The sixth factor has social service voca- 
tions although the college physical education 
teacher has a strong loading on this factor.
The seventh factor is determined by non- 
scientific teaching variables. Factor VIII is 
obviously a culinary factor, and the ninth 
factor is determined by musical vocations. 


DISCUSSION


One of the most interesting variables in this 
analysis is the librarian vocation. This scale 
seems to contrast, or balance, with several 
other vocations in determining the factor 
structure of the female SVIB. It appears on 
Factor I with professional vocations and on 
Factor II with scientific vocations. It has 
negative loadings, however, on the fourth and 
fifth factors in contrast to health services and 
physical education teaching vocations. For 




females, then, the librarian scale reflects a 
complex interest of a technical, professional 
kind antithetical to gross physical activity. 
Furthermore, the opposite-sign loading of the 
masculinity-femininity scale on two factors, 
II and IV, implies masculine characteristics 
with female librarian interest. 

There are several other interesting aspects 
of the results. The physician scale, for in- 
stance, loads markedly on Factor I with pro- 
fessional vocations of a somewhat creative 
type, and again on Factor II with technical, 
scientific professions. The health professions 
appear clearly on Factor IV and the negative 
loading of the librarian scale on this factor 
seems to pronounce the person oriented char- 
acteristics of occupational therapy and nurs- 
ing. Moreover, femininity is positively re- 
lated to these health professions on Factor IV 
while the loadings on Factor II suggest that 
the technical, scientific professions have more 
masculine characteristics in terms of interest. 

In comparison to other factor analytic 
studies with the SVIB (Carter, Pyles, & 
Bretnall, 1935; Strong, 1943; Thurstone, 
1931), there appear to be about twice as 
many factors in the female form than in the 
male form, even though there is a substan-
tially larger number of scales in the male form.
This result may be due to more complex in- 
terests of females, but there are at least two 
other reasons for the dissimilarity of results. 
The first reason is a technical one in relation 
to analytic methods. The previous factor ana- 
lytic studies were completed prior to the de- 
velopment of computer programs for the ro- 
tation of factors, so that a hand rotation of 
the female factors, or a machine rotation of 
the male factors, may produce a more similar 
structure in the interests of males and females 
as reflected by the SVIB. The second reason 
for the apparent dissimilarity may reside in 
the sample selected for study. On the one 
hand, students in the College of Health Re-
lated Services should be more homogeneous in
their interests than a random sample of stu- 
dents taken at large, which would include 
students interested in a wide variety of voca- 
tions; on the other hand, there are many ac- 
tivities in the health professions that represent 
various kinds of interest for females, such as 
“helping” versus scientific interests. The dif- 
ferences due to sampling seem difficult to 
justify, but, nevertheless, the question remains
open for further study. 

The SVIB has group scores available on the 
male form of the inventory. The group scores 
resulted from occupational groupings based 
on the previous factor analytic studies 
(Strong, 1943). The same type of grouping is 
possible for the female form of the inventory. 
Nine groups of occupations, therefore, are 
presented in Table 2 based on the present 
factor analytic study. Group I includes a com- 
plex of scales clustered together on Factors I, 
III, VI, and VII. Group II is taken from the 
highest-negative loadings on Factor II. Group 
III contains the scales with the largest-nega- 
tive loadings on Factor I, excluding physician, 
which has an even larger loading on Factor II. 
Group IV contains the scales with the largest- 
positive loadings on Factor I with the excep- 
tion of housewife and elementary teacher 
which seem more reasonably placed in Group 
I; also dietician and home economics teacher 
have stronger loadings on Factor VIII and so 
form Group V. Group VI is taken from Fac- 
tor IV, Group VII from Factor III, Group 
VIII from Factor IX, and Group IX from
Factor V. The physical education teacher 
(College) has a stronger loading on Factor 
VI than on Factor V, but it seems more rea- 
sonable to place it with another physical edu- 
cation teaching scale, if the other scales on 
Factor V are taken for the complex in Group 
I. These groups of scales will be used for 
further research with students in the health 
related professions. 

There may be some question as to the 
generality of the above groupings of scales 
because of the selected, homogeneous sample 
used herein. Differences in scale means, of 
course, can be expected among various groups 
but differences in standard deviations and 
correlations would carry serious limitations 
for wide use of the results of this study. The 
literature is not replete with studies of the 
female form of the SVIB, and many, such as 
Verburg’s (1952), contain only a few se- 
lected scales; in comparison with the few 
scales used in Verburg’s study, the means re- 
ported herein are somewhat different but the 
standard deviations are quite similar. Two 
sets of more extensive data are presented by 
Strong (1943). The first set of data was col-
lected on 200 nurses and contains means and
TABLE 2

OCCUPATION GROUPINGS BASED ON THE RESULTS OF
THE FACTOR ANALYTIC STUDY

Group   Occupation
I       Housewife, elementary teacher, English teacher, social science teacher, social worker, Y.W.C.A. secretary
II      Physician, laboratory technician, dentist, mathematics and science teacher
III     Artist, author, librarian
IV      Stenographer-secretary, business education teacher, buyer, office worker
V       Dietician, home economics teacher
VI      Occupational therapist, nurse
VII     Lawyer, life insurance saleswoman, psychologist
VIII    Music performer, music teacher
IX      Physical education teacher (College), physical education teacher





standard deviations on 16 scales and the cor- 
relations between the nurse scale with the re- 
maining 15; in comparison with data herein, 
the average difference in means is 5.0, in 
standard deviations is 1.7, and in correlations 
with the nurse scale, .30. The large differences 
in correlations are particularly disappointing, 
especially in view of the apparent similarity 
of the two samples, and some of the larger 
discrepancies are as follows, the correlations 
given being between the nurse scale and the 
indicated scale: Strong’s office worker is .55 
and ours is —.05, Strong’s housewife is .59 
while ours is .18, Strong’s stenographer-secre- 
tary is .57 and ours is —.11, and Strong’s 
librarian is —.74 while ours is —.28. These 
are the largest differences and the reader must 
judge for himself which correlations are most 
reasonable for determining general factor 
structures. The second set of Strong data, on 
page 718, contains no means or standard devi- 
ations for comparison, but intercorrelations 
are presented for 19 scales based on data col- 
lected from 500 housewives; the average dif- 
ference between these data and our data 
across the 171 correlations is about .20, which 
again may not be too encouraging. One rather 
startling thing about the two sets of data pre- 
sented by Strong resides in the 15 nurse- 
“other” scale correlations; 14 out of the 15 
pairs of correlations are exactly the same, a 
result that seems somewhat unlikely, espe- 
cially in view of the differences in samples. 
A final comparison for the groupings of 
scales is available in the 14 groups of 30
scales, excluding the Masculinity-Femininity 
scale, available on back of the Women’s SVIB. 
Five of these groups, for corresponding scales, 
are the same as our Groups III, V, VI, VIII,
and IX in Table 2. Our Group I is a combi- 
nation of four of Strong’s groups, excepting 
for psychologist which Strong puts with social 
worker; Strong does group lawyer and life in- 
surance saleswoman together, however, which 
is our Group VII. Our Group II is a combina- 
tion of two of Strong’s groups, and Group IV 
represents two more of Strong’s groups. All 
in all, then, the grouping of scales from the 
present study appears to be a condensation of 
the grouping available on the back of the 
women’s SVIB blank. 
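The comparisons with Strong's data reported above rest on the average difference between corresponding correlations in two matrices. A minimal sketch of that computation follows; the function name is hypothetical, the absolute difference is an assumption about how "average difference" was taken, and the two matrices are invented placeholders rather than the published correlations. With 19 scales there are 19 × 18 / 2 = 171 distinct pairs, the count cited for the housewife sample.

```python
import numpy as np

def mean_abs_correlation_difference(R_a, R_b):
    """Average absolute difference over the distinct off-diagonal pairs."""
    R_a = np.asarray(R_a, dtype=float)
    R_b = np.asarray(R_b, dtype=float)
    rows, cols = np.triu_indices(R_a.shape[0], k=1)   # upper triangle, no diagonal
    diffs = np.abs(R_a[rows, cols] - R_b[rows, cols])
    return diffs.mean(), diffs.size

# Placeholder matrices for 19 scales: one with zero correlations,
# one with every off-diagonal correlation equal to .20.
n_scales = 19
R_first = np.eye(n_scales)
R_second = np.full((n_scales, n_scales), 0.20)
np.fill_diagonal(R_second, 1.0)
average, n_pairs = mean_abs_correlation_difference(R_first, R_second)
print(n_pairs, round(average, 2))
```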
A FACTOR ANALYTIC STUDY OF NEGRO AND WHITE
RESPONSES TO ADVERTISING STIMULI 


ARNOLD M. BARBAN 


University of Illinois 


AND 


WERNER F. GRUNBAUM 1


University of Houston 


In order to develop a method for determining group reaction to advertising and 
mass media concepts, 93 white (W) and 88 Negro (N) Ss judged 10 advertising- 
type stimuli on a 22-scale semantic differential. Factor analyses yielded a scale 
structure and a concept structure. Ns and Ws had a similar scale structure. The 
concept structure revealed that Ws distinguished between “typical” advertising 
stimuli and those with racial overtones; while Ns, on the other hand, treated 
all concepts similarly on the first factor. The present approach determining the 
semantic and “conceptual” structure in separate factor analyses may be applica- 
ble to other problems involving the semantic differential. 


The purpose of the study reported in this 
paper was to develop adequate techniques to 
measure group reactions to advertising and 
mass media concepts. The approach utilized 
the semantic differential, developed by Os- 
good (1952), to measure the reactions of a 
Negro sample and a white sample to a series 
of advertising-type stimuli. 

The semantic differential involves, in a gen- 
eral sense, a subject’s (S’s) terminal “re- 
sponse” to a series of scales which relate to a 
verbal or pictorial “concept;” a concept can 
be referred to as “stimulus” (Osgood, Suci, & 
Tannenbaum, 1957, p. 77). However, in a 
strict sense, this process may be viewed as the 
S’s terminal response to two simultaneous 
“stimuli,” i.e., both to the verbal or pictorial
concept as well as to the verbal “scale” ap- 
plicable to that particular concept. This study, 
therefore, suggests that in many applications, 
such as the present one involving advertising- 
type concepts, it is fruitful to consider both 
the structure of the concept matrix as well as 
the structure of the scale matrix, because both 
types of stimuli are relevant to the problem 
under investigation. 

Specifically, two separate factor analyses 
are each applied to the white and the Negro 
samples. The first factor analyses of the scale 


1 The authors wish to express their appreciation to 
the Computing Center of the University of Hous- 
ton, especially to Elliott I. Organick, Director of 
the Center for the generous allocation of free com- 
puter time. The responsibility for any errors in the 
computer programming is, of course, solely that of 
the authors. 
matrices are used to determine the semantic
structure of the responses of the two samples 
concerned, while the purpose of applying the 
second factor analyses to the concept matrices 
is to reveal the “conceptual” structure inferred 
from the responses of these samples. 

It is not argued that the scale and the con- 
cept frameworks are relevant in all situations 
involving the use of the semantic differential. 
However, it is hoped that in certain other 
applications the suggested technique may 
prove helpful. 


METHOD 
Subjects 


The Ss consisted of 181 students from randomly 
selected sections of freshman English classes at two 
universities in Houston, Texas—one, a predomi- 
nantly white institution, the other predominantly 
Negro. The N for whites was 93; for Negroes,
N = 88. The overwhelming majority of the Ss were
unmarried (90.1%) and from Texas (90.1%). The 
mean age of white Ss was 19.2 years whereas Negroes 
averaged 20.4 years. 


Concepts Measured 


The Ss were exposed to the 10 below-listed stim- 
uli (concepts) by means of a 35 millimeter slide 
projector: 

Concept 1. Advertising: presented in bold black letters.

Concept 2. Illustrated (art-drawn) advertisement (with people): a two-color cigarette advertisement.

Concept 3. National magazine: full-color photo of a well-known Negro singer against the magazine masthead.

Concept 4. Local Negro newspaper: masthead was shown.

Concept 5. Photographed advertisement (white models): a full-color cigarette advertisement from national media with white models.

Concept 6. Photographed advertisement (integrated models): one Negro and two white models in a black and white magazine advertisement. The layout was such that each of the three models appeared in a separate panel.

Concept 7. Local Negro radio station: call letters presented in bold black type face.

Concept 8. Illustrated advertisement (no people): a cigarette advertisement with a color illustration of the cigarette pack.

Concept 9. National Negro magazine: a photo identical to that used in Concept 3, but with the Negro magazine’s masthead.

Concept 10. Photographed advertisement (Negro models): an advertisement identical with that in Concept 5, but with Negro models.


The showing of the concepts was in the sequence 
noted above, with the exception that 3 and 9, as well 
as 5 and 10, were alternated in order to avoid any 
unique position bias for these special concepts. 

Concept 1 served as a typical verbal stimulus. 
Concepts 3, 4, 7, and 9 related to advertising media, 
especially media designed for Negroes (4, 7, 9). 
Typical print-medium cigarette advertisements were 
represented by Concepts 2, 5, 8, and 10 and ranged 
in pictorial representation from art-drawn figures 
(2), to photographed models (Concept 5—white 
models; Concept 10—Negro models), to a “no 
people” illustration (8). Concept 6 exemplified a so-
called “integrated” advertisement, i.e., it contained
both whites and a Negro. 


Scale Instrument 


The responses of Ss were measured by Osgood’s 
(1952) semantic differential. The 10 concepts were 
each judged against a sample of 22 bipolar scales 
which are included in Table 3. Scales were presented, 
after determining sequence and polarity by random 
selection, with seven-position alternatives. The in- 
clusion of the “qualifiers” (extremely, quite, slightly, 
etc.) on the form is supported by research done by 
Wells & Smith (1960). The format, thus, appeared 
as shown in next column. 

The 22 scales used were derived from a free associ- 
ation test administered to groups of white and Negro 
students chosen from the same universities as the 
ultimate test groups. The Ss responded to a series of 
concepts similar to the final-experiment concepts. 
Responses were analyzed for frequency, relevance to 
concepts, and semantic stability, as suggested by 
Osgood, Suci, & Tannenbaum (1957, pp. 78-80). The 
number of scales chosen (22) was limited by the 
fact that the test was to be administered to students 
in a 50-minute class session. 


[Sample scale format: each bipolar scale offered seven positions labeled Extremely, Quite, Slightly, Neither (Equally), Slightly, Quite, Extremely.]


Test Procedure 


The tests were conducted in 9 sections of freshman 
English classes (4 white, 5 Negro) in spring, 1963. 
The class instructor introduced the experimenter 
(E) as a “guest.” Projector and screen had been
previously set up. 

Booklets, containing 1 page of scales for each con- 
cept to be judged, were distributed to each S. The 
E then proceeded to give instructions on how to 
judge concepts against the semantic differential scales. 
Such instructions can be noted in Osgood et al.
(1957, pp. 82-84). The Ss were shown a sample set 
of scales on the screen in order to thoroughly 
acquaint them with the judging method. Questions 
were answered, after which the prescribed sequence 
of concepts was presented. Each concept was com- 
pletely judged by all Ss before the next concept was 
flashed onto the screen. 


Analytic Method 


Intercorrelation Matrices. Each S responded to a 
given scale (for a particular concept) by checking 
one of seven positions, the positions being arbi- 
trarily assigned, from left to right, a number from 
1 to 7. Thus, the data can take the form of a cube, 
involving Ss, concepts, and scales. Using the Osgood, 
et al. (1957, pp. 35-36) methodology, an intercorre- 
lation matrix can be generated by summing over 
both Ss and concepts. Since 22 scales are involved 
in this study, the result is a 22 × 22 scale matrix.
TABLE 1

BARTLETT’S TEST OF SIGNIFICANCE APPLIED
TO WHITE-NEGRO SCALE ANALYSIS

Eigenvalue Chi-square
Root
number White Negro df White Negro
1 7.46 6.29 21 480.33***  336.90*** 
2 1.78 dj. Si-O] 7A Ons one 
3 1.44 23S 40.20** 16.74 
4 1.34 tye 39.74** 15.81 
5 92 1:08) 117 13532 13.50 
6 87 102 eo 12.47 12.79 
7 85 9 is 13.67 8.05 
8 Bid 88 14 10.18 9.13 
9 nS -O2mmELS 10.04 7.65 
10 67 ii ee 7.80 6.46 
11 04 2m eid 5.86 
12 59 69 10 6.28 5.38 
13 56 .66 9 5.69 Sill 
14 ae 98 8 5.75 Sala, 
15 A8 56 ii 4.26 3.40 
16 AS 00 6 3.76 1.91 
17 38 48 5 1.10 ele 
18 36 AT 4 99 2.36 
19 ao AS 3 1.08 SiS 
20 “oD oi 2 16 1.01 
21 .28 POL 1 Pal ely 
22 eS 28 0 01 01 
** p < .01.
*** p < .001.


If $X_{i,j,v}$ is the numerical response (from 1 to 7) on
the $i$th scale, representing the $j$th concept, for the $v$th
subject, then the correlation matrix may be calculated
by summing over concepts and Ss for the
variations and the covariations, with the mean for
the $i$th scale, $\bar{X}_i$, being calculated also by summing over
concepts and Ss. Thus, the variation (white sample)
is

$$\sum_{j=1}^{10} \sum_{v=1}^{93} \left( X_{i,j,v} - \bar{X}_i \right)^2 \qquad [1]$$

and the covariation between Scale $i$ and Scale $i + 1$ is

$$\sum_{j=1}^{10} \sum_{v=1}^{93} \left( X_{i,j,v} - \bar{X}_i \right) \left( X_{i+1,j,v} - \bar{X}_{i+1} \right). \qquad [2]$$

The correlation matrix is then calculated from the
appropriate covariation with the corresponding variations,
and the matrix is symmetric ($X_{i,j} = X_{j,i}$).

The 10 × 10 concept matrix is obtained in a similar
manner by summing over scales and Ss rather
than over concepts and Ss. Since only the summations
differ from the previous notation, it is necessary
here simply to present the variation, which is

$$\sum_{i=1}^{22} \sum_{v=1}^{93} \left( X_{i,j,v} - \bar{X}_j \right)^2. \qquad [3]$$

And the same procedure (for both the scale and
the concept matrices) was performed for the Negro
sample (v = 88).
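The pooling described above, summing over Ss and concepts for the scale matrix and over Ss and scales for the concept matrix, amounts to flattening the data cube in two different ways before correlating. The sketch below is an illustration under assumptions: the function name and the axis order of the array (Ss, concepts, scales) are choices made for the example, and the responses are random integers from 1 to 7 rather than the study data.

```python
import numpy as np

def scale_and_concept_correlations(cube):
    """cube[v, j, i] is the response of subject v to concept j on scale i."""
    n_subjects, n_concepts, n_scales = cube.shape
    # Rows are (subject, concept) pairs, columns are scales.
    by_scale = cube.reshape(n_subjects * n_concepts, n_scales)
    # Rows are (subject, scale) pairs, columns are concepts.
    by_concept = cube.transpose(0, 2, 1).reshape(n_subjects * n_scales, n_concepts)
    scale_R = np.corrcoef(by_scale, rowvar=False)
    concept_R = np.corrcoef(by_concept, rowvar=False)
    return scale_R, concept_R

# Random stand-in for the white sample: 93 Ss, 10 concepts, 22 scales.
rng = np.random.default_rng(1)
cube = rng.integers(1, 8, size=(93, 10, 22)).astype(float)
scale_R, concept_R = scale_and_concept_correlations(cube)
print(scale_R.shape, concept_R.shape)
```

Each entry of scale_R is a covariation of the form in Equation 2 divided by the square root of the product of the two corresponding variations from Equation 1; concept_R is built the same way from the variation in Equation 3.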

Factor Analyses. The 22 × 22 scale matrices for
whites and Negroes and the 10 × 10 concept matrices
for whites and Negroes were factor analyzed by the
principal factor solution (Harman, 1960, pp. 164 ff.);
Bartlett’s test of significance (Bartlett, 1950), using 
1s in the diagonal elements, was then applied. All 
four matrices were then factored again, using Gutt- 
man’s image theory which modifies the off-diagonal 
elements and sets the diagonal elements equal to the 
squared multiple-correlation coefficients, thus yield- 
ing the factor loadings (Guttman, 1953). The use of 
Guttman’s image theory seemed to obviate the need 
for rotation; this decision was further reinforced by 
the fact that the elements could be identified in 
their present form. Regarding computational mat- 
ters, the determinant was computed for Bartlett’s 
theory, and the Jacobi method was used to calculate 
the eigenvalues and the eigenvectors for the factor 
analyses. 
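The image step described above can be sketched as follows. This is one common formulation of Guttman's image covariance matrix, offered as an illustration rather than as the authors' program: the function names are hypothetical, NumPy's symmetric eigensolver stands in for the Jacobi routine, the demonstration data are random, and expressing each root as a share of the total SMC is only one possible reading of the variance percentages reported in Tables 3 and 4. In this formulation the diagonal of the image matrix equals the squared multiple correlations and the off-diagonal elements are adjusted accordingly.

```python
import numpy as np

def image_covariance(R):
    """Image covariance matrix: R - 2*S2 + S2 @ inv(R) @ S2."""
    R_inv = np.linalg.inv(R)
    S2 = np.diag(1.0 / np.diag(R_inv))      # anti-image variances, 1 - SMC
    return R - 2.0 * S2 + S2 @ R_inv @ S2

def image_loadings(R, n_factors):
    """Unrotated loadings and each root's share of the total SMC."""
    G = image_covariance(R)
    values, vectors = np.linalg.eigh(G)
    order = np.argsort(values)[::-1][:n_factors]
    roots = np.clip(values[order], 0.0, None)
    loadings = vectors[:, order] * np.sqrt(roots)
    share = roots / np.trace(G)             # trace(G) equals the sum of the SMCs
    return loadings, share

# Random stand-in data for five variables.
rng = np.random.default_rng(2)
data = rng.normal(size=(150, 5))
data[:, 1] += data[:, 0]
data[:, 4] += 0.5 * data[:, 3]
R_demo = np.corrcoef(data, rowvar=False)
loadings_demo, share_demo = image_loadings(R_demo, n_factors=2)
print(loadings_demo.shape)
```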


RESULTS AND DISCUSSION 


Results of the Bartlett test for scale data 
are given in Table 1, indicating 4 statistically 
significant factors for white Ss and 2 for Ne- 
groes. Concept analyses (see Table 2) yielded 
2 factors for whites and 3 for Negroes. 


Scale Structure 


The factor analyses of the 22 × 22 scale
matrices resulted in the factor loadings pro- 
vided in Table 3. 


TABLE 2

BARTLETT’S TEST OF SIGNIFICANCE APPLIED
TO WHITE-NEGRO CONCEPT ANALYSIS

Eigenvalue Chi-square
Root
number White Negro df White Negro
1 3.36 427 9 134.33***) Da7e(eees 
2 1.70 1.09 8 51.80*** 16.96 
3 .98 OL 13.42 18.78** 
4 82 78 6 8.09 9.93 
5 afi 65. 5 5.84 5.65 
6 68 58 4 7.29 4.68 
7 61 on 4S 7.79 5.98 
8 foul sol ee? 7.38 10.48 
9 38 Sane L 4.78 6.52 

10 .24 2.0 mee 0) .00 .00 


a However, since this test is additive, this factor should
be interpreted in terms of the remaining chi-square rather than
that applicable to the individual factor. Thus, it is also sig-
nificant at the .01 level.

*** p < .001.
TABLE 3

FACTOR LOADINGS: SCALES

Factor loadings SMCs a
White Negro White Negro White White
Scale I I II II III IV White Negro

Ugly-Beautiful 1S aval 05 #25 15 O01 61 .60 
Masculine-Feminine A8 56 ih 38 26 5 AT 2S) 
Strong-Weak —.26 —.19 43 28 11 O1 Aol 20 
Relaxed-Tense —.54 —.953 —.16 08 18 ali! 38 34 
Low Class-High Class 70 .60 —.05 .10 19 — .04 56 Al 
Dull-Sharp 74 .66 —.15 —.05 .02 —.02 59 46 
Artificial-Real 62 A9 02 —.05 — .06 00 A2 PA 
Unfriendly-Friendly 74 63 oP —.14 —.19 .03 .05 49 
Warm-Cool — 40 — 22 —.14 10 07 male 22 09 
Sad-Happy 66 69 etn = 246 10 Slee ss 
Good-Bad —.71 —.61 20 09 — .03 .03 58 AS 
Static-Dynamic A5 A5 —.13 O4 09 O07 Beil E20 
Old-Young BOT aol alle! 15 07 05 38 Rol 
Orderly-Chaotic 33 00 13 BLS —.15 23 nS .20 
Emotional-Rational —.18 —.13 02 08 ai, —.25 a5 .06 
“For Me’’-“‘Not For Me” —.63 00 21 04 —.09 03 50 Al 
Active-Passive — 36 — 36 nls 24 20 —.20 26 .23 
Heavy-Light 34 By 30 18 ail) Sei) eo 18 
Hard-Soft DS 39 26 pis; —.02 — .06 Al Sill 
American-UnAmerican —.26 —.16 18 30 23 miey 21 BS 
Unpleasant-Pleasant .80 he —.07 —.09 .03 —.06 .68 56 
Complex-Simple —.10 =.07 wd 16 —.09 —.21 a2 .10 
Total SMC = hiss) fA Te 

Variance: White 73.91% 8.45% 4.63% 3.89% 90.88% 

Negro 72.94% 9.53% 82.47%





a Squared multiple-correlation coefficient.


The first factor, for both whites and Ne- 
groes, is in accord with Osgood’s findings 
where it is classified as “evaluative” (Osgood 
et al., 1957, pp. 31-75). The scales with high 
loadings along the dimension are: * Beautiful- 
Ugly, High Class-Low Class, Sharp-Dull, 
Friendly-Unfriendly, Happy-Sad, Good-Bad, 
“For Me”-“Not For Me,” Pleasant-Un-
pleasant. These 8 scales are common for both
whites and Negroes. In addition, Real-Arti- 
ficial is highly loaded for whites, with a some- 
what lower score for Negroes; Relaxed-Tense 
and Young-Old would also seem to relate to 
Factor I. 

Comparison of whites and Negroes is not as 
direct on succeeding factors as on Factor I. 
Factor II for whites is identified by such 


* Note that the scales following are presented in
terms of standardized polarity, namely, positive- 
negative. 


scales as Masculine-Feminine, Strong-Weak, 
Heavy-Light, and Hard-Soft and can be re- 
lated to Osgood’s “potency” variable. Factor 
III for whites is perhaps less clearly defined, 
with Masculine-Feminine, Active-Passive, and
American-UnAmerican appearing, but this 
dimension is likely to be some facet of the 
“activity” variable referred to by Osgood. A 
comparison of these latter 2 factors for white 
Ss with the remaining Negro Factor (II) re- 
veals that the same 6 scales load for Negro Ss. 
Thus, the second factor for Negroes is a com-
bination of Osgood’s potency and activity
variables which appeared as separate factors 
for the whites. Osgood refers to such a coales- 
cence of factors as a “dynamism factor” (see 
Osgood et al., 1957, p. 74). 

The one remaining variable, Factor IV for 
whites, is somewhat difficult to identify. Or- 
derly-Chaotic, Rational-Emotional, Simple- 
TABLE 4

FACTOR LOADINGS: CONCEPTS

Factor loadings SMCs a
White Negro White Negro Negro
Concept I I II II III White Negro
1 Al A3 10 03 .03 19 .20 
2 AO 49 023 .20 —.05 34 pol 
3 65 a — 32 — 32 18 .60 66 
4 .03 44 36 09 10 19 29 
5 65 63 —.05 at, —.18 48 A5 
6 34 58 30 05 —.01 24 oD 
7 nS A2 28 AS) 12 12 wail 
8 39 48 a5 18 — .02 25) 29 
9 .66 Bil — 30 — 33 —.19 .60 .66 
10 .60 iil .06 07 07 AL oS 
Total SMC = 3.42 4.03 
Variance: White 68.27% 18.01% 86.28% 
Negro 81.28% 8.42% 3.41% 93.11% 





a Squared multiple-correlation coefficient. 


Complex, Passive-Active, and Light-Heavy 
can be grouped on this dimension and, per- 
haps, it can be viewed as an “anxiety” varia- 
ble. However, since this factor accounts for 
only 3.9% of the variance, in relation to 
90.9% for all four factors, Factor IV’s signifi-
cance in relation to the entire semantic struc-
ture is difficult to assess.


Concept Structure 


Factor loadings for the concept structure 
are available in Table 4. Factor I for white Ss 
includes all of the concepts, with the exception 
of the Negro newspaper and the Negro radio 
station. Further, within this factor, all stimuli 
employing photographic representation (Con- 
cepts 3, 5, 6, 9, 10) are closely grouped, with 
the exception of the integrated advertisement 
(6). Negro and white model situations were 
not distinguished. Finally, all the concepts 
which loaded on this factor did so in relation 
to the concept “advertising.” 

The second factor for whites loaded highly 
for the Negro newspaper and the Negro radio 
station, together with the integrated advertise- 
ment. It also loaded negatively for the Negro 
singer shown on the cover of a national maga- 
zine and for the same singer shown on the 
cover of a national Negro magazine. 


An analysis of Factor I for the Negro Ss 
indicates that all 10 concepts are interrelated. 
Moreover, all the concepts using photographic 
material now group together, including the 
integrated advertisement. Again, Negro and 
white model situations were not distinguished. 

Thus, Negroes did not distinguish, as whites 
did, those concepts having a racial implica- 
tion. They tended to group all stimuli around 
the general advertising concept. Two addi- 
tional factors were statistically significant but 
could not be adequately interpreted. On Fac- 
tor II, the national magazines (1 general, 1 
Negro) were negatively loaded. The same Ne- 
gro singer was on the cover of both magazines. 

On Factor III the Negro singer on the gen- 
eral magazine was positive, while a negative 
loading occurred for the same singer appearing 
on the Negro magazine. The photographed 
cigarette advertisement with white models 
(Concept 5) likewise was negative while the 
same ad with Negro models was slightly posi- 
tive (.07). Accordingly, this was the only fac- 
tor (for either whites or Negroes) which dis- 
tinguished between the general and the Negro 
magazine (Concepts 3 and 9) as well as be- 
tween white and Negro models in an identical 
advertising situation (Concepts 5 and 10; 
incidentally, the integrated advertisement is 
about midway between the white and Negro
models). Also, the 2 local Negro media loaded 
positively on this factor. 


CONCLUSIONS 


Two factor analyses were employed. The 
first analyses indicated that Negroes and 
whites had similar scale structures. Although 
this finding is in no sense a startling one, it is 
important to provide this evidence which in- 
dicates the comparability of the scale instru- 
ment for Negroes and whites. Similar semantic 
attributes have been shown for groups indi- 
cating gross differences in the meanings of 
concepts, as was the case for normal controls 
and schizophrenic patients (Bopp, 1955, in 
Osgood et al., pp. 223-224), and as was also 
the case for the 1952 election study involving 
Taft, Eisenhower, and Stevenson samples. 
Also, cultural differences investigated by Ku- 
mata and Schramm showed that Korean and 
Japanese exchange students and American 
college students use the same major factors in 
their meaningful judgments. The present find- 
ing, indicating the comparability of the scale 
device demonstrated in the factor analyses of 
the scale matrices, thus seems in accord with 
previous findings for other groups character- 
ized by gross differences in the meanings of 
concepts. 

The second analyses, of the concept struc- 
tures, suggest that whites distinguished be- 
tween “typical” advertising stimuli and those 
with racial overtones; while Negroes, on the 
other hand, treated all stimuli similarly. More- 
over, both whites and Negroes distinguished 
photographed and nonphotographed concepts 
but whites refused to recognize an integrated 
advertisement within their photographic
grouping. In general, both whites and Negroes 
failed to distinguish between white and Negro 
models. However, the third factor for the Ne- 
groes did suggest such a distinction; but since 
it accounted for less than 4% in relation to a 
total explained variance of 93%, its signifi- 
cance for the total conceptual structure is 
difficult to determine. 

The analysis of the “conceptual” structure 
in addition to the semantic structure is im- 
portant because the semantic differential does 
involve two “stimuli”—that of the concept 
and that of the scale. The semantic structure 
in this study failed to reveal any significant 
differences between the white and Negro sam- 
ples; rather, findings indicated the compara- 
bility of the scale instrument for both groups 
and thereby justified its application relative 
to the conceptual differences between the two 
samples. 


This study examined the job satisfactions of 143 rehabilitation counselors from 
6 state agencies. A Job Satisfaction Inventory (JSI) incorporating satisfactions 
with 8 dimensions was used to investigate counselor job satisfactions and their 
relationships with immediate and intermediate performance criteria. The job 
satisfaction of male and female counselors was found to be essentially similar, 
although the relationships of satisfaction to performance criteria appeared to 
differ, depending to some extent on the sex of the counselor. For women 
counselors, greater satisfaction with pay and security appeared related to bigger 
case loads, more “12” closures, and fewer “13” closures. In the case of men, 
there were relatively few instances of significant relationships, and these might 
have occurred by chance. 


This study examines the job satisfaction of 
rehabilitation counselors in state general vo- 
cational rehabilitation agencies (DVR). Such 
a study was considered of interest for two 
reasons: (a) it would determine the interrela- 
tionships between various aspects of job satis- 
faction and the relationship of these to vari- 
ous job performance measures for rehabilita- 
tion counselors, and (b) it would contribute 
to our understanding of job satisfaction among 
counselors and have more general interest in 
that similar studies of professional workers 
have not been published. 

The job satisfactions of rehabilitation 
counselors have received very little system- 
atic study. One study of job satisfaction 
among vocational rehabilitation counselors, 
by DiMichael (1949), correlated satisfaction 
with scores on Kuder interest scales. 
The correlation between enjoying “contacting em- 
ployers to secure jobs” and the Kuder per- 
suasive scale was .28; between “handling 
clerical details” and the clerical scale, .32. In the 
only other related study, by Eddy (1960), 
some evidence was found supporting the no- 
tion that counselor satisfaction with the whole 
job is related to the amount of interest shown. 


1 This study was supported, in part, by the 
Vocational Rehabilitation Administration, United 
States Department of Health, Education, and Wel- 
fare. We are grateful for Leonard D. Goodstein’s 
comments and suggestions. The investigators grate- 
fully acknowledge the sustained interest and help 
from state administrators and staff in the cooperat- 
ing state agencies. 

Clearly, there is a lack of knowledge of the 
characteristics of job satisfaction for this 
group or other counselors and there is no 
information about the relationship between 
satisfaction and performance measures. 

This study has general interest in that it is 
one of the few which examine job satis- 
faction among a group of professional workers 
and relate the satisfaction variables to per- 
formance variables. In the reviews by 
Herzberg, Mausner, Peterson, and Capwell 
(1957) and Brayfield and Crockett (1955), the 
studies in most instances concern semiskilled, 
skilled, and clerical occupations. As these re- 
views also show, few studies have used multi- 
dimension satisfaction measures or multiple 
criteria. 


METHOD 


A Job Satisfaction Inventory (JSI) originally 
developed by Johnson (1955) for use with teachers 
was revised and administered to 143 counselors in 
six state Division of Vocational Rehabilitation 
(DVR) agencies. These agencies were located in 
Connecticut, Iowa, Minnesota, Missouri, North 
Carolina, and Oklahoma. After pretesting the inven- 
tory, the number of items was reduced to 70 by 
eliminating those which were not discriminating 
among counselors and the form of responding to 
items was changed.² Johnson’s inventory was se- 
lected since it appeared to cover most of the major 
dimensions of job satisfaction which other research 
had shown as important (Scott, Dawis, England, & 
Lofquist, 1958). In addition, his validation studies 


2 A copy of the revised Job Satisfaction Inventory 
or the Co-Worker Ratings form may be secured on 
request to the authors. 

using self-estimates of satisfaction as well as satis- 
faction ratings by closely acquainted co-workers as 
criteria appeared promising. The JSI covered the 
following areas: (a) physical and mental exertion, 
(b) relations with associates, (c) relations with em- 
ployer, (d) security, advancement, and finances, (e) 
interest in, liking for, and emotional involvement in 
the job, (f) job information, training, and status, 
(g) physical surroundings and work conditions, and 
(h) future, goals, and progress. 

On this same sample of rehabilitation counselors, 
data on the following criteria were also collected: 


1. Co-worker Ratings—Graphic ratings on six 
specific dimensions and an overall rating of per- 
formance by co-workers acquainted with the 
counselors. 

2. Supervisor Ratings—Supervisors used the same 
blank as the co-workers in making ratings. 

3. Present State Ratings—Includes the present 
rating schemes used in four of the six states and 
efficiency ratings by two states which did not 
quantify performance. Efficiency ratings were 
graphic ratings by at least two state adminis- 
trators on a form devised for this study. 

4. Case Velocity—Random samples of 10 cases 
from “12” and “13” closures submitted by coun- 
selors for 1960-1961 were checked to see how 
long they had been kept in Status “1.” A mean 
length of time score was derived in terms of 
months: time in Status 1 was taken as most 
representative since the counselor has the most 
control of case movement during this period when 
the client is accepted for services but has not 
developed a rehabilitation plan with the counselor. 

5. Average Case Load—The average number of 
people a counselor was working with when the 
year began plus all accepted by him during the 
year—for the years from 1960 to 1962. 

6. Average Closures—For the same 3-year pe- 
riod, 1960-1962, a mean number of cases termi- 
nated was computed. The types of closures in- 
cluded: 12 Closures—clients were given service and 
closed after entering employment; 13 Closures— 
clients were given service but were not judged 
rehabilitated; that is, did not enter employment; 
15 Closures—clients were accepted for services 
but were closed out before substantial services 
were given. 


Most of the above data were available from state 
administrative files. Co-Worker Ratings, Supervisor 
Ratings and Job Satisfaction responses were ob- 
tained by mailing the blanks to district offices within 
states, or to a state staff conference if it was more 
convenient. Precautions were taken to preserve the 
confidentiality of responses, and candidness of re- 
spondents, by coding all ballots randomly so that 
once an identifying slip of paper with the counselor’s 
name was removed only the investigators could 
determine who had completed which ballot. 

Reliability. The reliability of the total score from 
the JSI, using split-half procedures and the Spearman- 
Brown correction, was .88. For the various 
subsections within the inventory, corrected split- 
half reliability coefficients ranged from .47 to .89, 
with the mean, using the Fisher Z-transformation, 
being .80. 
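
The two computations named in this paragraph can be sketched as follows. This is only an illustration of the standard formulas, not the investigators' own procedure; the function names and the input value of .786 are hypothetical, the latter chosen merely so that the corrected figure rounds to the reported .88.

```python
import math


def spearman_brown(r, k=2.0):
    """Predicted reliability when a test is lengthened by a factor k.

    With k = 2 this is the usual correction applied to the correlation
    between two half-tests.
    """
    return k * r / (1.0 + (k - 1.0) * r)


def mean_r_fisher(rs):
    """Average a set of correlations through Fisher's z (atanh, mean, tanh)."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical half-test correlation (not reported in the text).
half_test_r = 0.786
print(round(spearman_brown(half_test_r), 2))

# Averaging only the two reported extremes, as a demonstration of the method.
print(round(mean_r_fisher([0.47, 0.89]), 2))
```

Averaging through z gives more weight to the larger coefficients than a simple arithmetic mean does, which is the usual reason for preferring it when coefficients vary widely.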

Using analysis of variance procedures (Ebel, 1951), 
and ratings from at least two administrators, a reli- 
ability coefficient of .85 was obtained on efficiency 
ratings. Co-Worker ratings, for average ratings, 
yielded a reliability coefficient of .76. Since super- 
visors rated different groups of counselors, the 
reliability of supervisor ratings was estimated from 
what the counselor ratings would be if treated indi- 
vidually rather than as average ratings. This yielded 
a coefficient of .43. 


RESULTS 


Job satisfaction studies in many different 
settings have suggested factors which appear 
to influence and enter into job satisfaction as 
a variable of work adjustment (Robinson & 
Conners, 1962, 1963; Scott et al., 
1958). Among others, whether or not a 
measure of overall job satisfaction or a 
measure of satisfaction with specific job- 
related factors is used appears to make a dif- 
ference. Job satisfaction also appears to differ 
depending on the sex of the respondent. In- 
formation on the role of these factors in the 
job satisfaction of rehabilitation counselors 
was our first objective. 

As a first step, the eight subsections or di- 
mensions of the JSI were intercorrelated to 
assess what independence existed among these 
rather specific job-related factors. Table A ³ 
shows these intercorrelations. Although many 
of the r’s were statistically significant and 
there was some indication of a general satis- 
faction variable, as our further analysis shows, 
it appears that satisfactions with different 
dimensions of the job were relatively inde- 
pendent of each other. 

A cluster analysis (Tryon, 1939) was per- 
formed on the correlation matrices of male 
and female counselors, respectively, to seek 
some information on how specific dimensions 


3 Tables A (JSI variable intercorrelations), B (clus- 
ter analysis), and C and D (correlations with per- 
formance criteria) have been deposited with the 
American Documentation Institute. Order Document 
No. 8433 from ADI Auxiliary Publications Project, 
Photoduplication Service, Library of Congress, Wash- 
ington, D. C. 20540. Remit in advance $1.25 for 
microfilm or $1.25 for photocopies and make checks 
payable to: Chief, Photoduplication Service, Library 
of Congress. 

might cluster together by sex. Table B pro- 
vides the results of this procedure. 

We found that satisfaction dimensions e, 
interest in, liking for, and emotional involve- 
ment in the job, and h, future, goals, and 
progress, formed one cluster for both men and 
women counselors. For both groups, the sec- 
ond cluster involved b, relations with associ- 
ates and c, relations with employer; however, 
it included g, physical surroundings and work 
conditions for men and f, job information, 
training and status for women. As the B- 
(Belonging) Weights reflected, none of the 
clusters were very pure or relatively isolated 
from other variables in the matrix. The 
within-average correlations of clusters were 
only about 1½ times as great as the average 
correlations of dimensions within a cluster to 
all other dimensions. Again, the ill-defined 
clusters reflect relative independence among 
the eight dimensions. The clusters which did 
appear were, overall, quite similar for men 
and women. 
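
The comparison of within-cluster and outside-cluster correlations described above can be sketched as a simple ratio. The correlation values below are hypothetical (Table B is not reproduced here), and the function is an illustrative reading of the comparison, not Tryon's full procedure or his B-weights.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def cluster_purity_ratio(corr, cluster):
    """Mean r among cluster members divided by mean r from members to all others."""
    outside = [name for name in corr if name not in cluster]
    within = mean(
        corr[a][b] for i, a in enumerate(cluster) for b in cluster[i + 1:]
    )
    between = mean(corr[a][b] for a in cluster for b in outside)
    return within / between


# Hypothetical correlations among four JSI dimensions (letters as in the text).
pairs = {
    ("e", "h"): 0.50,
    ("e", "b"): 0.40, ("e", "c"): 0.40,
    ("h", "b"): 0.40, ("h", "c"): 0.40,
    ("b", "c"): 0.55,
}
corr = {}
for (a, b), r in pairs.items():
    corr.setdefault(a, {})[b] = r  # store both directions so lookups are symmetric
    corr.setdefault(b, {})[a] = r

print(cluster_purity_ratio(corr, ["e", "h"]))
```

A ratio near 1 means the cluster members correlate with outside dimensions almost as strongly as with each other, which is the sense in which a cluster is "ill-defined."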

The similarity between men and women was 
also checked by performing a Type I analysis 
of variance (Lindquist, 1953) on the profiles 
of mean scores on the eight dimensions. The 
F of .721 indicated that differences in satis- 
factions between men and women were not 
statistically significant. 

The relationships of job satisfaction dimen- 
sion scores—for male and female counselors— 
with eight immediate and intermediate criteria 
of performance are presented in Tables C and 
D. 

For male counselors, satisfaction with the 
physical and mental exertion of the job cor- 
related significantly with three performance 
criteria: Present State Ratings, number of 
“15” closures, and Case Load Velocity. Those 
who were more dissatisfied tended to move 
cases more quickly in Status 1—while the 
more satisfied tended to move cases more 
slowly. The only other significant relation- 
ship was also negative, between satisfaction 
with future, goals, and progress and number 
of 13 closures; that is, the more satisfied 
tended to get fewer 13 closures, the more dis- 
satisfied more of these closures. 

For female counselors, the criterion of Size 
of Case Load had significant correlations with 
three of the satisfaction dimensions as well 
as total satisfaction scores. These included 
security, advancement and finances; job in- 
formation, training and status; future, goals 
and progress. Satisfaction scores on the phys- 
ical and mental exertion dimension also cor- 
related significantly with Present State Rat- 
ings, Supervisor Ratings, and number of 13 
closures. Women counselors satisfied with the 
physical and mental demands of their job 
tended to get fewer 13 closures while the more 
dissatisfied tended to get more. Satisfaction 
with security, advancement and finances also 
showed a significant relationship with number 
of 12 closures for female counselors; satis- 
faction with physical surroundings and work 
conditions in addition had a significant rela- 
tionship with Present State Ratings. 


DISCUSSION 


Our analysis indicates that rehabilitation 
counselors generally do not see job satisfac- 
tion in quite the same global fashion as skilled 
and unskilled blue- and white-collar workers 
(Carlson, Dawis, England, & Lofquist, 1962). 
Although two major clusters for both men and 
women appeared, these were not well defined, 
that is, were less “pure” measures than other 
studies of job satisfaction have found. This 
finding may be a function of the items within 
each scale; however, Johnson (1955), whose 
scale we used as a point of departure, grouped 
his items into scales after item analysis. 
Studies of job satisfaction with this group and 
perhaps other professional occupations, our 
data suggest, should incorporate a wider range 
of variables than for the typical business or 
industrial occupation. 

Despite the lack of sex differences on JSI 
subscores, the performance and satisfaction 
measure intercorrelations suggest satisfaction 
with working conditions may be a significant 
variable for the women counselors even though 
this condition does not hold for men. Since it 
does not cluster with other satisfaction sub- 
tests, its importance may be linked more to 
the perceptions of supervisors than to its per- 
vasive influence on the job satisfaction of 
women counselors. The lack of sex differences 
also lends support to the view of Hulin and Smith 
(1964) that sex per se is not the crucial 
factor which leads to either high or low satis- 
faction. 

The admonition of Katzell, Barrett, and 
Treadway (1961) and also of Kahn (1960) 
that, to date, we should not expect relation- 
ships between productivity and job satisfac- 
tion, would seem to hold for our male coun- 
selors. However, there is always the possibility 
that the lack of relationship is a function of 
inadequacies in the measures or the limita- 
tions of the sample studied. 

Since rehabilitation counselors frequently 
complain about the official emphasis on num- 
bers of 12 closures, it seemed reasonable to 
anticipate that job satisfaction would be 
lower when there was a high relationship be- 
tween present state ratings and closures and 
higher when this relationship was not signifi- 
cant. This assumes that such a relationship 
reflects an open emphasis upon numbers. Two 
of the six states participating in the study 
yielded correlations of .93 and .78 between 
these two variables while two others had 
correlations of .09 and .14. Analysis of vari- 
ance showed that the high- and low-correlation 
groups did not differ significantly on the JSI 
variables. Contrary to what one might expect, 
counselor satisfaction is not associated in any 
direct way with the views of state adminis- 
trative staff regarding numbers of closures. 

Our findings suggest that subsequent studies 
of rehabilitation counselors or similar groups 
should consider new aspects of job satisfac- 
tion, such as pride in work group which 
Kahn (1960) found to be associated with 
productivity, the relationship between the 
work environment and productivity as sug- 
gested by Katzell, Barrett, and Treadway 
(1961), or as these same authors suggest, 
the mediating influence of the worker charac- 
teristics. The recent formulation of Katzell 
(1964) also suggests the desirability of con- 
sidering the patterns of workers’ needs and 
values vis-à-vis the job satisfaction dimensions 
being assessed. This approach would suggest 
that the influence of counselor satisfaction 
with a certain aspect of his job would be 
associated with his expectations or needs for 
that satisfaction. To the extent that they are 
congruent, we would anticipate that he would 
be satisfied as a worker and would be more 
productive. 

Job satisfaction among rehabilitation coun- 
selors and other professional workers will 
continue to be of interest if only because of 
its import to the individuals entering such 
fields. There is, we think, increasing accep- 
tance of the idea that one of the major goals 
of organizations and institutions is to enable 
the members to achieve personal satisfactions 
as well as economic satisfactions from their 
work roles. 


PSYCHOLOGY OF DRIVERS IN TRAFFIC ACCIDENTS ¹ 


CAROLINE E. PRESTON AND STANLEY HARRIS 


Department of Psychiatry, University of Washington Medical School, Seattle, Washington 


50 automobile drivers whose driving involved them in accidents serious enough 
to require hospitalization were paired with 50 drivers without accident histories 
but matched according to sex, approximate age, race, and educational level. The 
Ss were compared on the basis of their driving experiences and performance on 
written tests. The accident victims differed from the comparison Ss in a higher 
incidence of previous traffic violations but were not distinguishable from the 
comparison Ss on any written tests. The accident Ss were similar to the “safe” 
drivers in describing themselves as much closer to “expert” than “very poor” 
on a driving performance continuum. In fixing the responsibility for the acci- 
dents and in estimating their driving competence at the time of the accidents, 
the accident Ss’ reports are at considerable variance with police reports. 


Traffic accidents have been described as 
relatively infrequent phenomena, determined 
by multiple, not single, factors and, in the 
main, determined by more than one person— 
all this in a constantly shifting environment. 
Traffic accidents for the entire country in 1960 
resulted in 38,200 persons killed, 1,400,000 
disabling injuries, and an estimated total cost 
of $6,500,000,000 (National Safety Council, 
1960). Accidents rank first in the cause of 
death in ages 1 to 24 and are second in ages 
25 to 64 with traffic accidents being far more 
common than any other type of accident. 
Traffic accident research has focused on traf- 
fic engineering, vehicle design, and physical 
and psychological driver characteristics (Mc- 
Farland, Moore, & Warren, 1955). Yet there 
is no consistent understanding of or agree- 
ment about the basic causes of traffic acci- 
dents. Attitudes, judgments, and other psycho- 
logical characteristics of drivers are conceded 
to be ultimately important in the epidemiology 
of traffic accidents (Brody, 1959; Conger, 
Gaskill, Glad, Hassel, Rainey, & Sawrey, 
1959; Forbes, 1953). Many investigators con- 
cerned with traffic research and driver educa- 
tion have long searched for a single, objective, 
easily administered instrument to differentiate 
the competent from the incompetent driver. 


1 This investigation was supported in part by a 
Public Health Service Undergraduate Training Grant, 
No. 2M-5939-C9 from the National Institute of 
Mental Health, Public Health Service; by the Foun- 
dation’s Fund for Research in Psychiatry; by the 
Stuht Psychiatric Fund; and by the O’Donnell Psy- 
chiatric Research Fund. 

There has been a limited success with such 
instruments in some studies but most written 
tests correlate poorly with one another and 
with driving performance (Allgaier, 1957; 
Conger et al., 1959). The problem is to prove 
that written test results predict driving per- 
formance which in turn differentiates the 
driver who will, from the driver who will not, 
have an accident in any particular traffic situa- 
tion. Such proof is elusive. 

The present study is an investigation into 
the attitudes and the behavior of drivers in 
recent traffic accidents. This study is not con- 
cerned with causes of traffic accidents, but 
rather with contributing or associated factors 
as assessed by the driver, police, and the in- 
vestigators. For the purpose of this study 
traffic accidents are defined as events in which 
a driver is involved in a motor vehicle colli- 
sion or loss of vehicle control resulting in both 
property damage and personal injury. 


METHOD 
Subjects 


The subjects (Ss) of this study are drivers in- 
volved in traffic accidents and admitted to four hos- 
pitals in Seattle, Washington during a 6-month pe- 
riod. All Ss were driving their vehicles at the time 
of the accident and were sufficiently injured to re- 
quire hospitalization. There are 50 such drivers in 
this study and they were contacted in order of their 
admission to one county and three private hospitals. 
Contact with the patients usually took place in the 
hospital as soon as medical clearance for the inter- 
views could be obtained. 

Each accident victim was paired with a control S 
of the same sex, approximate age, race, and educa- 
tional level. Criteria for the selection of the control 
Ss were that they had had no traffic accidents in the 
past 5 years, had never had a serious traffic accident, 
and had driven regularly within recent years. The 
control Ss were intended to represent relatively safe, 
accident-free drivers. The purpose of the investiga- 
tion was presented to groups of high school and 
college students, hospital employees, housing project 
residents, and to a large teachers’ meeting. The con- 
trol Ss were selected from among those who volun- 
teered and qualified as accident-free drivers, Ss com- 
parable to each of the accident drivers, but different 
from the accident driver in their accident records. 


Interviews 


An interview was conducted to explore the S’s 
driving history and experiences. A detailed descrip- 
tion of the present accident was obtained and care- 
ful inquiry was made into the accident Ss’ percep- 
tions of their driving behavior at the time of the 
accident and other probable factors contributory to 
the accident. 


Tests, Questionnaires, and Attitude Scales 


To determine personality and attitude character- 
istics which might distinguish the accident from the 
safe drivers all Ss were exposed to several objective 
test instruments. Two of these test instruments, the 
Rosenzweig Picture-Frustration Study (P-F Study) 
and the Siebrecht Attitude Scale, have been used 
in other driver investigations (McGuire, 1956; Sie- 
brecht, 1941). A driver rules quiz and a driver at- 
titude checklist were designed specifically for this 
study.² 

The Siebrecht Attitude Scale (1941) consists of 40 
statements about drivers and driving which the Ss 
are asked to rate along a five-point continuum from 
some who agree to some who disagree. The scale has 
been found valid and reliable as a measure of driving 
attitudes when tested in driver-education programs, 
but has been tested to only a limited extent in com- 
parison of accident and accident-free drivers. 

The P-F Study consists of 24 simple line drawings 
depicting situations which are designed to elicit 
verbal expression of anger or frustration (Rosen- 
zweig, Fleming, & Clarke, 1947). The Ss are asked 
to indicate what the picture figures would say or do 
in these situations. The Ss’ patterns of responses 
to these items are presumed to reflect response 
tendencies to: (a) blame oneself, (b) blame others, 
or (c) to minimize provocation to anger in the face 
of frustration. 


Other Sources of Information 


Because of the peculiar menace represented in the 
intoxicated person at the wheel of an automobile 





2 Copies of these instruments have been deposited 
with the American Documentation Institute. Order 
Document No. 8434 from ADI Auxiliary Publica- 
tions Project, Photoduplication Service, Library of 
Congress, Washington, D. C. 20540. Remit in ad- 
vance $1.75 for microfilm or $2.50 for photocopies 
and make checks payable to: Chief, Photoduplica- 
tion Service, Library of Congress. 


TABLE 1 

CURRENT MARITAL STATUS 

Status                      Accident group   Control group 
Married with children             20               27 
Married without children           4                2 
Divorced                           4                6 
Separated                          3                — 
Widowed                            3                4 
Single                            16               11 
Total                             50               50 

the Ss’ drinking habits were carefully explored. All 
Ss were classified as: (a) alcoholic, (b) possible 
alcoholic, (c) problem drinker, or (d) without a 
drinking problem.³ Observation of the Ss in the 
emergency room and the police record were sources 
of information for evidence that the drivers had 
been drinking shortly before their accidents. 

The accident Ss’ descriptions of the presenting 
accident were compared with reports in hospital 
records and with the Seattle Police Department’s 
records. The Ss’ statements concerning previous acci- 
dents and violations were also compared for relia- 
bility with the information recorded in the police 
files. 


RESULTS 
Driver Background 


The age range of the accident Ss was from 
15 to 80 with a predominance of young males 
and middle-aged females among the Ss. There 
were 18 women and 32 men in the accident 
group. If the male and female Ss are grouped 
according to ages above and below 35, there 
are more older women and younger men and 
this sex difference in ages of the accident Ss is 
significant (χ² = 4.76; significant at the .05 level of 
confidence). 

The accident drivers were paired with the 
“safe” drivers according to sex, approximate 
age, race, and education level but the two 
groups of Ss were also comparable in most 
other socioeconomic characteristics. Current 
marital status as a reflection of their similarity 
is tabulated in Table 1. 

In types of living arrangements and em- 
ployment the two groups are also closely 
comparable. Most of the single Ss were stu- 


3 Criteria for this classification were provided by 
Joan K. Jackson, a sociologist with extensive re- 
search experience in alcoholism. 

dents (high school or university) and lived 
with their parents. Widely varied, but com- 
parable occupations were also represented in 
both groups; examples: clerk, teacher, laborer, 
mechanic, policeman, café operator, United 
States marshal, secretary, machinist, watch- 
man, etc. 


Driving Experience 


The accident Ss and their controls were 
similar in that the majority in both groups 
were taught to drive by their parents (almost 
always the father) or by friends. Fourteen in 
the accident group as compared to 11 in the 
safe-driver group were self-taught. Five in 
each group had had driver-training courses. 
The Ss in both groups were similar in the age 
at which they learned to drive (mean age: 
accident group 19.1 years, control group 18.9 
years). The accident Ss had been driving 
from 8 to 40 years in range, the control group 
8 to 50 years in range. Very few drivers were 
able to accurately estimate miles driven each 
year but the accident and safe driver ap- 
peared to have equal yearly experience. 

The type and extent of automobile insur- 
ance coverage was the same in both groups. 
Ten accident and 8 control drivers had no 
insurance coverage in spite of Washington’s 
financial responsibility law that requires 
drivers to possess a liability insurance policy 
or guarantee payment if liable in any accident 
involving more than two hundred dollars dam- 
age or personal injury. 

Important differences between the accident 
and control drivers were as follows: (a) 11 of 
the accident drivers as compared to 2 of the 
control drivers admitted to failing driving- 
skill tests one or more times; (b) drivers’ 
licenses had been revoked at one time for 6 of 
the accident drivers but for none of the con- 
trol; (c) 29 accident drivers as compared to 
17 control drivers admitted citation for more 
than two traffic violations in their driving ex- 
perience; (d) 11 accident drivers as compared 
to 1 control driver had been cited for negli- 
gent, reckless, or drunk driving in their driv- 
ing histories. To the extent that police records 
were available, the number of violations re- 
ported by both groups is reliable. Thus the 
omens in careless or incompetent driving pat- 
terns for the now injured drivers had been 
gathering for some time. 
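
As an illustration only, the frequencies reported above can be put through a 2 × 2 chi-square computation. The text reports no significance tests for these comparisons, so the sketch below is not the investigators' analysis; it also treats the two groups of 50 as independent samples, which ignores the pairing of each accident driver with a control (a test for matched pairs would need the pair-level data, which are not given). The function and variable names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


GROUP_SIZE = 50  # drivers in each group, as stated in the text

# (accident drivers, control drivers) with the characteristic, from the text.
comparisons = {
    "failed a driving-skill test": (11, 2),
    "more than two traffic violations": (29, 17),
    "negligent, reckless, or drunk driving citation": (11, 1),
}

for label, (accident, control) in comparisons.items():
    chi2 = chi_square_2x2(
        accident, GROUP_SIZE - accident, control, GROUP_SIZE - control
    )
    print(f"{label}: chi-square = {chi2:.2f}")
```

With 1 degree of freedom, values above 3.84 exceed the conventional .05 criterion; with cell counts as small as 1 or 2, a corrected or exact test would ordinarily be preferred.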


Comparison by Test Questionnaires and 
Attitude Scales 


The objective instruments failed to demon- 
strate differences between the accident and 
safe drivers. Performance on the traffic rules 
quiz was similar for both groups. From this 
index of their knowledge of traffic rules and 
regulations the safe drivers are not better 
informed than the accident drivers about 
proper driving conduct. On a nine-point scale 
ranging from very poor to expert all of the Ss 
rated themselves much closer to the expert 
than to the poor driver area of the continuum. 
The mean scores for the two groups on this 
self-rating driving performance measure were 
almost identical. 

From an initial analysis of mean differences 
in the eight response categories on the P-F 
Study, the accident drivers appeared signifi- 
cantly different from the control Ss in giving 
fewer intrapunitive or self-blaming responses. 
P-F data had been gathered from another 
group of Ss who had been hospitalized follow- 
ing suicide attempts. When mean differences 
between the accident Ss, their controls, and 
the suicide Ss’ P-F responses were tested by 
the Duncan’s multiple-range test with Kram- 
er’s correction method for unequal numbers 
of Ss, no significant intergroup differences 
were revealed (see Preston, 1964 for a discus- 
sion of the implications of this evidence). 


ANALYSIS OF THE PRESENTING ACCIDENT 


Setting of the Accident 


The accidents sustained by the accident 
drivers occurred sporadically throughout the 
6-month period, with more occurring on Sat- 
urday, Sunday, and Monday than other days 
of the week. There was no concentration of 
accidents for any particular period of the day. 
General driving conditions may be described 
as follows: these accidents occurred in light 
traffic in 27 cases, in medium traffic in 13 
cases, and in heavy traffic in only 10 cases. 
Nine of the 50 accidents occurred on wet 
streets; all others occurred under optimal 
weather and street conditions. There seemed 
to be no particular road arrangement related 
to a preponderance of the accidents; the 
variety of conditions included intersections, 
major highways and avenues, gravel roads, 
multilaned or single lane, on curved or 
straight roads or streets, inside or outside the 
city limits. 

The types of accidents that predominate in 
this study are classified as “hit fixed objects” or 
“overturned on roadway.” These types oc- 
curred in a much higher proportion than 
appear in the statistics for all automobile 
accidents in Seattle. Twenty-two of the 50 
accidents were of this type and are considered 
more serious by crash-injury experts as po- 
tential sources of injury and fatality. 





Subjects’ Description of the Accident 


Each driver was asked, “What would you 
say was actually responsible for the acci- 
dent?” Fifteen of these Ss admitted responsi- 
bility for the accidents, describing themselves 
as “careless,” “preoccupied,” “tired,” “speed- 
ing,” etc., or as having made driving errors. 
Five acknowledged partial responsibility. 
Thirty blamed other drivers and conditions 
beyond their control or claimed inability to 
fix this responsibility. According to police
reports, however, 34 of these drivers were
responsible for their accidents; other drivers or
external factors were responsible for nine accidents,
and no responsibility was fixed for seven accidents.
Thus, there is a considerable discrepancy be- 
tween the Ss’ and the officials’ evaluations of 
the responsibility for these accidents. 

The drivers were also asked to estimate 
their driving competence, that is, skill, abil- 
ity, and alertness at the time of the accident. 
Sixteen of the 50 accident drivers admitted 
to less than usual driving efficiency; 2 claimed 
“not to know” although one of these was
reported to have been engaged in a drag race 
contest at the time of the accident and the 
other was unconscious from an assailant’s 
blow to her head. Thirty-two of these Ss 
claimed that their driving was “normal,” 
“usual,” “good,” “100%,” and “extra good.” 
Seventeen of these drivers who claimed to be 
driving efficiently were cited by the police for 
negligent driving in connection with their ac- 
cidents and one with failure to yield the right 
of way. 


Alcohol and the Accident Drivers


When all Ss were classified according to the 
alcoholism scale, statistically significant differences
between the accident and control
drivers did not emerge. The majority of Ss gave
no evidence of having drinking problems. 

The nature of the injuries, locations of the 
accidents (outside the city), retroactive am- 
nesia, and tendency of the Ss to minimize 
degrees of intoxication precluded a reliable as- 
sessment of the role of alcohol in the present- 
ing accident. Only rarely did the police com- 
ment in their reports on evidence of intoxi- 
cation. In spite of many sources of error in 
estimating the immediate influence of alcohol 
on these accident drivers this influence looms 
as important since 21 of these accident driv- 
ers admitted to alcoholic intake in various 
amounts from ½ to 8 hours prior to the accident.
None of the drivers admitted to having
taken tranquilizers, sedatives, narcotics, or 
dextroamphetamines prior to the accident
but the Ss were also not systematically 
assessed for possible ingestion of these drugs. 


Consequences of the Accidents 


The accidents studied resulted in many 
serious injuries and three passenger fatalities. 
Injuries most often involved the head and 
face but fractured extremities were also com- 
mon. The most frequent types of injuries were 
abrasions, contusions, and lacerations, but concussions
and fractures occurred often. Multiple
injuries, such as concussions with fractured 
extremities or pelvis, were the rule. The degree 
of injury varied from severe paraplegia at the
cervical level to an absence of any objective
evidence of injury or illness such as the “whiplash”
injury. The types of accidents sustained
by these Ss seemed to be quite common to 
traffic accidents generally. 

Property damage was extensive as a consequence
of these accidents. Twenty-eight of
the 50 drivers had their cars totally wrecked 
and 6 of the other drivers’ cars were totally 
wrecked also. 

The legal and financial consequences of the 
accidents were often severe. Many of the 
drivers faced court proceedings, study by the
police, and interrogation by insurance agents 

TABLE 2
CHARGES AGAINST SUBJECTS FROM
PRESENTING ACCIDENT

                                      Number

Reckless driving (a)                       3
Negligent driving (b)                     15
Failure to yield right of way (c)          3
No arrest                                 10
No record                                 18
Charge pending                             1

Total                                     50

(a) Section 45, (b) Section 46, (c) Sections 66 and 68, of the
Washington State Traffic Code Book.


and/or lawyers. Charges brought against the 
accident drivers for the accidents are shown 
in Table 2. 

The figures in Table 2 reveal that serious 
violations are strikingly preponderant. In 18 
cases there was no report in the Seattle Police 
Department records of actual citation but 
these Ss may still face citations outside the 
city’s jurisdiction. There is no way, of course, 
of knowing the ultimate magnitude and con- 
sequences of civil action brought against the 
accident drivers. 


The ability of human observers to make magnitude estimates of damage was
investigated under 3 instructional definitions of damage: (a) amount of volume 
reduction, or (b) amount of surface distortion or “complexity,” or (c) overall 
amount of damage. The stimuli were distorted metal containers photographed 
from 5 angles of view between 0° and 90°. 9 college students were assigned to 
each of the 15 experimental groups. Analyses of errors in rating indicated that 
at least 2 subjectively different underlying damage scales could be discriminated 
but that these were highly correlated. Shape of the original, intact object is an 
important factor determining the magnitude and direction of errors. Interrater
reliabilities of about .72 were obtained.


In assessing degree of damage or distortion, 
a complex, multidimensional set of stimuli 
provides the basis for estimation. It is not
known whether discriminably different di- 
mensions of damage exist, or, if they do exist, 
whether they lead to different scales of 
damage. The purpose of the present experi- 
ment was to examine numerical ratings of 
damage under three conditions of instruction. 

Two previous studies (Been, Braunstein & 
Piazza, 1964; Pearson, 1962) were concerned 
with estimates of volume reduction (VR) of 
damaged objects. Apart from its particular 
relevance to the field of aviation accident 
investigation, the VR judgment has the 
special virtue that a well-defined physical 
measure is available for comparison. In the 
present study, two additional aspects of damage
were defined: surface complexity (SC),
the degree to which the surface of an object
appears wrinkled, crumpled, or distorted; and
amount of damage (AD), a general, unmodified
concept determined by whatever connotations
the word “damage” may have to
a particular observer.

1 This study is part of a program of research supported
by United States Public Health Service Research
Grant AC-00003-04, from the Division of Accident
Prevention. Additional support for this study
was provided by the United States Army Transportation
Research Command under contracts DA 44-177-AMC-888(T)
and DA 44-177-AMC-116(T).


Damage ratings from photographs pose the
special problem of determining whether a
two-dimensional representation of the three-dimensional
object can convey as much information
about the object as direct viewing
of the object itself. Photographs from dif- 
ferent angles of view might be expected to 
yield different results in terms of magnitude 
or variability of the estimates. In order to 
test this possibility, single photographs from 
five different angles were used. 


METHOD 


Experimental design. The experiment was a 3 × 5
factorial design with two additional within-groups 
variables. Nine subjects (Ss) were assigned to each 
of the 15 separate experimental groups. The three 
levels of instructions required the Ss to rate single 
photographs of damaged objects with respect to 
VR, or AD, or SC. The photographs of the objects 
were taken at five angles of view: 0 degrees (perpendicular
to the longitudinal axis of the object),
30 degrees, 45 degrees, 60 degrees, or 90 degrees 
(parallel to the longitudinal axis). Two within- 
groups variables, type of object at 4 levels and 
photograph rank number at 10 levels, completed 
the design so that 5,400 separate scores provide the 
raw data for analysis. 
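
The score count follows directly from the design as described. A quick arithmetic check is sketched below; the variable names are illustrative and do not come from the original report.

```python
# Sketch of the design arithmetic; all names here are illustrative.
instructions = ["VR", "AD", "SC"]      # between-groups factor, 3 levels
angles = [0, 30, 45, 60, 90]           # between-groups factor, 5 levels (degrees)
subjects_per_group = 9
object_types = 4                       # within-groups factor
photographs_per_type = 10              # within-groups factor (rank number)

groups = len(instructions) * len(angles)
total_subjects = groups * subjects_per_group
scores_per_subject = object_types * photographs_per_type
total_scores = total_subjects * scores_per_subject

print(groups, total_subjects, scores_per_subject, total_scores)
```

The totals agree with the 15 groups and 5,400 scores stated in the text.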

Subjects. The Ss were 135 students from under- 
graduate courses in psychology at Arizona State 
University. The assignment of the Ss to the par- 
ticular experimental conditions upon their reporting 
to the laboratory was done without bias. 

Stimulus materials. A more complete description 
of the stimulus materials may be found elsewhere 
(Pearson, 1962). Briefly, however, the four types of 
objects were thin-walled metal containers. These 
were a cylinder (C) with a base 68 millimeters in 
diameter, a cylindroid (E) with an elliptical base 
of 38 × 76 millimeters, a square-based (S) container
63 millimeters square, and a rectangular-based
(R) container with intact dimensions of 52 × 79
millimeters. 
Ten containers of each type were subjected to
controlled stresses (compression, bending, or torsion) 
to produce a test series of diverse appearance. The 
measured volume reduction of the damaged objects 
ranged from approximately 15% to 85% for the 
individual objects of each type. 

Prior to their being photographed, all containers 
were sprayed with aluminum paint to present a 
uniform appearance. A flat white cardboard turn- 
table, upon which the container was placed, was 
used to rotate each stimulus object through the 
five angles at which it was photographed. 

Five test packets were assembled, each containing 
all 40 photographs taken from a given angle. Each 
test packet was composed of four envelopes and 
each envelope contained the 10 photographs of a 
particular type. The photographs were 3 × 3 inch
glossy prints. 

Procedure. The Ss were tested in groups of five 
or fewer. Upon arriving at the laboratory they were
seated at a table, then read the instructions by the
experimenter (E). The three sets of instructions
differed in that different perceptual aspects of the 
task were emphasized. The AD instructions simply 
referred to amount of damage; the SC instructions 
included the words “wrinkled, crumpled, distorted”;
and the VR instructions provided a demonstration 
using a paper drinking cup to show that, upon 
collapse, the volume of the intact object had been
decreased. 

The Ss first arranged the 10 photographs of a 
particular type of object into a rank-ordered series. 
Only one packet (one type) at a time was used. 
After all four packets were rank ordered, the S
again removed the photographs and assigned a 
numerical value from 0-99 to indicate his rating. 
The 24 different sequences of envelopes for the four 
container types were randomly ordered, and each S
within a particular group received a different 
sequence from any other member of that group. 
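
The 24 sequences are the 4! orderings of the four container types. The sketch below shows one way to give each of the nine Ss in a group a different sequence. The sampling method and the seed are assumptions made for illustration; the report says only that the sequences were randomly ordered.

```python
import random
from itertools import permutations

container_types = ["C", "E", "S", "R"]   # cylinder, cylindroid, square, rectangular

# Every possible order of the four envelopes: 4! = 24 sequences.
all_sequences = list(permutations(container_types))

def assign_sequences(n_subjects, seed=0):
    """Draw a distinct envelope sequence for each S in one group (hypothetical helper)."""
    rng = random.Random(seed)
    # Sampling without replacement guarantees that no two Ss share a sequence.
    return rng.sample(all_sequences, n_subjects)

group_sequences = assign_sequences(9)
for s_number, sequence in enumerate(group_sequences, start=1):
    print(s_number, "-".join(sequence))
```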


RESULTS 
Error Analyses 


For each S, deviation scores were obtained 
by subtracting the actual measured volume 
reduction from the ratings for each of the 
40 separate stimuli. A positive deviation 
represents an overestimation with respect to
volume reduction and a negative deviation, an
error of underestimation. This was done for
all subgroups even though the Ss receiving
the SC and AD instructions were not rating
volume reduction. If S responds to the stimuli

TABLE 2
ANALYSIS OF VARIANCE OF MEAN SQUARE AND CONSTANT ERRORS

                                      MS error              Constant error
Source                     df           MS          F          MS           F

Instructions (I)            2   18,809,527.00    10.01*   21,259.73      5.29*
Angles (A)                  4    5,934,471.50     3.16    [illegible]    0.33
I × A                       8    3,732,247.50     1.99     3,991.86      0.99
Type (T)                    3   15,989,932.00    31.57*   50,262.48    163.67*
T × I                       6    2,515,267.60     4.97*    1,561.70      5.09*
T × A                      12    4,286,458.30     8.46*      626.04      2.04
T × I × A                  24      433,505.65     0.86       439.43      1.43
Photograph (P)              9   39,455,843.00    65.34*   40,450.33    176.40*
P × I                      18    6,374,048.30    10.56*    3,041.87     13.27*
P × A                      36    6,115,328.40    10.13*    2,837.49     12.37*
P × I × A                  72      589,650.96     0.98       492.41      2.15*
P × T                      27   20,144,136.00    52.97*   15,343.69     87.29*
P × T × I                  54    3,097,428.00     8.15*    2,043.19     11.62*
P × T × A                 108    3,120,549.40     8.21*    1,910.44     10.87*
P × T × I × A             216      462,862.44     1.22       276.54      1.57*
Ss within groups          120    1,879,814.80              4,021.81
Ss within × T             360      506,499.21                307.09
Ss within × P           1,080      603,845.80                229.31
Ss within × T × P       3,240      380,263.55                175.78
Total                   5,399

* p < .01.



in a nonspecific way, then the constant and 
variable errors should not be significantly 
different as a function of instructions. 
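
The two error measures can be sketched as follows. The deviation score is defined as in the text (rating minus measured volume reduction). The root-mean-square form and the example numbers are assumptions made for illustration, since the report does not print its formulas.

```python
import math

def deviation_scores(ratings, measured_vr):
    """Rating minus measured percent volume reduction, one value per stimulus."""
    return [r - m for r, m in zip(ratings, measured_vr)]

def constant_error(deviations):
    """Mean deviation taken with regard to sign (negative = underestimation)."""
    return sum(deviations) / len(deviations)

def rms_error(deviations):
    """Root of the mean squared deviation; the sign of each error is ignored."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

# Hypothetical ratings (0-99 scale) and measured volume reductions for four stimuli.
ratings = [20, 35, 50, 60]
measured = [15, 40, 55, 85]

devs = deviation_scores(ratings, measured)
print(devs, constant_error(devs), round(rms_error(devs), 2))
```

Errors of opposite sign cancel in the constant error but not in the RMS error, which is why the two measures are analyzed separately.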

Mean-square errors. Table 1 presents the 
values of the root-mean-square (RMS) errors, 
by instruction, angle and type of object. The 
magnitudes are greater for the SC and AD 
instructions than for the VR; greater for the 
nonsymmetrical objects (ellipse and rectangle) 
than for the symmetrical objects (cylinder 
and square). An analysis of variance, sum- 
marized in Table 2, shows that the main 
effect of instructions is significant. The main 
effect of type of object is also significant. 
The interactions, T × I and T × A, are also
significant, indicating that from one type of 
object to another, differential effects of 
instruction and angle of view may be 
expected. 
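
The F ratios in Table 2 appear to be each effect's mean square divided by the mean square of the matching Ss-within error term. The sketch below checks four main-effect entries of the mean-square-error analysis, with values copied from Table 2. The pairing of effects with error terms is an inference from the design, not something the report states.

```python
# Error-term mean squares from Table 2 (mean-square-error analysis).
error_ms = {
    "Ss within groups": 1_879_814.80,
    "Ss within x T": 506_499.21,
    "Ss within x P": 603_845.80,
    "Ss within x T x P": 380_263.55,
}

# Effect mean squares and the error term assumed to test each one.
effects = {
    "Instructions (I)": (18_809_527.00, "Ss within groups"),
    "Type (T)": (15_989_932.00, "Ss within x T"),
    "Photograph (P)": (39_455_843.00, "Ss within x P"),
    "P x T": (20_144_136.00, "Ss within x T x P"),
}

def f_ratio(effect_name):
    """F = MS(effect) / MS(error term assumed for that effect)."""
    ms_effect, error_name = effects[effect_name]
    return ms_effect / error_ms[error_name]

for name in effects:
    print(name, round(f_ratio(name), 2))
```

The ratios reproduce the tabled F values of 10.01, 31.57, 65.34, and 52.97.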

The source of variance named photograph 
refers to the rank numbers of the individual 
stimuli within each type of object. These 
were ordered from low to high values of measured
volume reduction, and hence, the significant
effect of this variable shows that
there is a change in the magnitude of the
RMS errors as a function of what can be
thought of as level of damage. Figure 1 illustrates
the P × I interaction. It is clear that
RMS errors increase as level of damage in- 
creases and, in particular, the increase is 
accounted for by the SC and AD instructions. 
Constant errors. Table 1 also presents the 
means of the deviation scores taken with re- 
gard to sign. In general, the values for the 
SC and AD instructions are underestimated 
with respect to those for the VR condition. 
The nonsymmetrical objects are underesti- 
mated in comparison to the symmetrical ones. 
The results of an analysis of variance for 
these data are summarized in Table 2. In this 
analysis, as in the previous one for the RMS 
errors, the main effects of instructions, type 
and photographs are significant. The main 
effect of angle is not significant, nor are its 
interactions with instruction and type. 
Figure 2 shows the nature of the significant
P × I interaction effect. The constant
errors exhibit approximately the same varia- 
tion for the low-damage stimuli, but there 

[FIG. 1. Relationship of root-mean-square error to percentage of volume reduction. Axes: root mean square error; measured percent volume reduction.]

[FIG. 2. Relationship of constant error to percentage of volume reduction. Axes: mean constant error; measured percent volume reduction.]

is a very marked departure in the direction
of underestimation for the SC and AD groups 
at the high levels. 


Correlations among the Damage Scales 


For each of the experimental groups, the 
mean ratings of the nine Ss were computed 
for each of the 40 stimuli. The intercorrela- 
tions of the means for the different angles of 
view were then obtained for each instruction, 
as well as the correlations between mean 
ratings and measured volume reduction. 

Table 3 shows the intercorrelations for the 
means at the different angles of view. In a 
sense, these correlations are almost like reli- 
ability coefficients since each angle of view 
is a representation of the same stimulus 
object. In general, the 0 degrees and 90 de- 
grees means yield the lowest values for all 
three scales; the three intermediate angles are 
uniformly higher with means of .944, .917, 
and .867 for the SC, AD, and VR scales, 
respectively. The correlations between the 
pairs of mean scale values were: SC versus 
AD, .942; SC versus VR, .760; and AD 
versus VR, .841. 

Table 4 presents the latter correlations by 
instructional group and angle of view. It is 
obvious that the VR scale is a better pre- 
dictor of measured volume reduction than 
are the SC and AD scales. The coefficient of 


TABLE 3

INTERCORRELATIONS AMONG ANGLES OF VIEW
FOR THE SC, AD, AND VR SCALES

                                      Angle
Instruction   Angle       0       30       45       60       90

SC              0     1.000     .866     .864     .770     .451
               30              1.000     .955     .930     .651
               45                       1.000     .947     .685
               60                                1.000     .694
               90                                         1.000

AD              0     1.000     .938     .812     .819     .544
               30              1.000     .906     .913     .662
               45                       1.000     .933     .740
               60                                1.000   [illegible]
               90                                         1.000

VR              0     1.000     .937     .828     .791     .603
               30              1.000     .883     .880     .694
               45                       1.000     .889     .652
               60                                1.000     .684
               90                                         1.000

TABLE 4
CORRELATIONS OF SC, AD, AND VR SCALES
WITH MEASURED VOLUME REDUCTION

                     Instruction
Angle        SC        AD       VR        M

0          .591      .606     .744     .647
30         .606      .699     .812     .706
45         .644      .736     .856     .745
60         .645      .685     .886     .739
90     [illegible]   .614     .728     .623
M          .603      .668     .805





determination for the VR correlation of .805 
is .648 while the values are .364 and .446 for 
the SC and AD scales, respectively. Thus, the 
VR scale accounts for about one and one half 
times more of the variance in measured vol- 
ume reduction. Since the reliability of a 
measure sets an upper limit to the correlation 
that that measure would have with another 
variable, it is not surprising that the “valid- 
ity” coefficients at the 0 degrees and 90 de- 
grees angles are lower than those at the 
intermediate angles. 
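
The coefficients of determination quoted above are the squares of the mean correlations in Table 4. A short check is sketched below; the dictionary names are illustrative.

```python
# Mean correlations with measured volume reduction (bottom row of Table 4).
mean_r = {"SC": 0.603, "AD": 0.668, "VR": 0.805}

# Coefficient of determination: the proportion of variance accounted for.
r_squared = {scale: round(r * r, 3) for scale, r in mean_r.items()}

# How much more variance the VR scale accounts for than each of the others.
vr_ratio = {
    scale: round(r_squared["VR"] / r_squared[scale], 2)
    for scale in ("SC", "AD")
}

print(r_squared)
print(vr_ratio)
```

The squared values match the .364, .446, and .648 given in the text. The ratios of roughly 1.8 for SC and 1.5 for AD bracket the "about one and one half times" figure.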


DISCUSSION 


The rather detailed analyses of the data 
seemed necessary because of the uncertainty 
about the stimulus characteristics in this 
relatively unexplored area of perception. In 
fact, the conclusion that at least two different 
scales of damage exist is reached by pointing 
to differences in the fine structure of the data 
rather than to major variations that are a 
function of the instructional conditions. 

The magnitude of the differences among 
the SC, AD, and VR groups is quite small; 
the reliability coefficients for the separate 
scales are about the same. However, a number
of the interaction effects, particularly in
the analysis of the RMS errors, make it clear
that the VR instruction elicits different be- 
havior from that under the SC or the AD 
instructions. 

The geometric features of the original, 
intact objects have a substantial influence on 
the ratings. The cylinder is overestimated 
with respect to the actual VR measures, and 
in general produces the greater positive constant
errors for the SC and AD instructions.
The nonsymmetrical objects, ellipse and 
rectangle, are underestimated markedly for 
the SC and AD instructions, and generally 
are underestimated with respect to their sym- 
metrical counterparts. Moreover, data pre- 
sented elsewhere (Gregg & Been, 1964) indi- 
cate that the combination of nonsymmetrical 
object with extreme angle of view (0 degrees 
or 90 degrees) produces very low reliability 
coefficients under the VR instructions. 

The overall uniformity of the results might 
best be explained by the obvious fact that 
physical transformations in the objects are 
themselves correlated. It should not be sur- 
prising that the psychological scales are cor- 
related. From a theoretical point of view, 
the next most logical step would be to attempt 
to produce stimuli such that measurable at- 
tributes (volume reduction, number of inter- 
secting lines, area in shadow, number of dis- 
continuous edges, depth of depressions along 
edges, slope of angular departure from normal 
outline, and many others) are correlated in 
greater or lesser degree. To establish the rele- 
vant dimensional cues for damage perception 
appears to be a formidable task. 

From a practical standpoint, however,
equivalence of the damage scales can be as- 
sumed, although further evidence is needed 
for this assumption. For the present, any of 
the scales would seem to qualify as appropri- 
ate research tools since the reliabilities, except 
for extreme angles of view, are sufficiently 
high and essentially the same. 

LEE W. GREGG AND RICHARD T. BEEN


The methodological changes introduced in
this experiment produced reliability coefficients
using single photographs much greater
than those obtained in the study by Pearson 
(1962), even though two or more photo- 
graphs of the same object were available to 
the Ss in that study. In the present study, the 
Ss were required to rank order the photo- 
graphs of the damaged objects with all 10 
stimuli of a given type available to them at 
the same time. Numerical values were as- 
signed at a later point in the session after all 
stimuli had been viewed by the S. 

Compared with the reliability of .80 reported
for direct estimates of volume reduction
(Been et al., 1964), the present results
are encouraging. The average value for the
VR instruction is .684 (Gregg & Been, 1964).
If the extreme angles of view are excluded, 
the mean becomes .757. It would appear, 
therefore, that appropriate photographic tech- 
niques can make it possible to approximate 
direct observation with little loss in precision. 


SALARY AS A PREDICTOR OF SALARY:
A 20-YEAR STUDY 


MARSHALL H. BRENNER AND HOWARD C. LOCKWOOD
Lockheed-California Company, Burbank, California 


This study investigated the value of using salary after a few yrs. in an organi- 
zation as a predictor of, and therefore as an intermediate criterion for, salary 
at a later date. Salary data were collected for each yr. of the 20-yr. careers of 52 
aircraft engineers. The salaries were combined to yield yearly distributions, 1 
for beginning salary and 1 for each yr. of experience. The resulting distributions 
were intercorrelated and the following results obtained: (a) 92% of the inter- 
correlations were significant at the .01 level, (b) correlations between equidistant 
yrs. became larger as tenure increased, and (c) the variance of salaries increased 
with increasing tenure. It is concluded that salary early in a man’s career can
be used as an intermediate criterion.


In his discussion of the criterion problem, 
Thorndike (1949, pp. 120-124) differentiates 
between ultimate, intermediate, and immedi- 
ate criterion measures. He points out that the 
ultimate criterion is very seldom used in 
psychological research because it is either 
unobtainable or else it takes so long to 
mature that it loses much of its utility. 

For example, the ultimate criterion of suc- 
cess for the selection of managerial personnel 
in industry would involve their total con- 
tribution to the organization for the entire 
period they are in the organization. This 
criterion is subject to both of the problems 
mentioned above. 

Because the ultimate criterion is so rarely
usable, intermediate and immediate criterion 
measures are sought. The differentiation be- 
tween these two is one of degree, and where 
a criterion stops being immediate and be- 
comes intermediate is a matter of judgment. 
In the same way, there are differing degrees 
of intermediateness, that is, the criterion may 
be available at many stages in a man’s 
career. Salary is such a criterion. The salary
a man attains during his career with an 
organization is frequently utilized as a meas- 
ure of his value and/or contribution to that 
organization. And, inasmuch as it is a cumulative
measure that usually reflects past as
well as present accomplishment, it frequently
comes closer to being an objective measure
of the ultimate criterion than any other single
measure.

However, salary obtained at a point well
into a man’s career involves one of the two 
problems mentioned earlier; there is neces- 
sarily an appreciably long wait for the data 
to mature. If, however, salary at a later date 
in a man’s career is predictable from data 
gathered at an earlier period of time in his 
work life, the earlier data could be used 
as an intermediate criterion for personnel 
research purposes. 

The question that now arises is: What data 
might be used to predict the salary a man 
will receive at a date well along in his career? 

If we assume that (a) the relative ability 
of a group of men is consistent, (b) these
abilities are evaluated consistently by the 
men’s superiors, and (c) salary increases are 
related to these evaluations, then the men 
who receive favorable evaluations should 
regularly receive more and/or larger salary 
raises. With the exception then, of an initial 
period of adjustment to the organization, 
there should be a relatively high correlation 
between salaries of different years, and es- 
pecially for those years that are relatively 
close together. 

If these assumptions are true, and if salary 
position after a few years correlates highly 
with salary after many more years, the former 
could be used as a predictor of (or as an 
intermediate criterion for) the latter. 

Also, if the three above assumptions are 
true, the correlations between years should 
increase as tenure with the company increases.
In other words, the correlation between
salary after 15 years and salary after
16 years should be higher than the correla- 
tion between salary after 10 and 11 years, 
which in turn should be greater than the 
correlation between salaries after the fifth and 
sixth years, etc. And this should be true not 
only of adjacent years, but of separated years 
as well. For example, the correlation between 
the twelfth and fifteenth years should be 
greater than that between the eighth and 
eleventh years, etc. 
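
This prediction can be checked against Table 1. The sketch below uses four adjacent-year coefficients read from that table, with decimals restored. Which pairs to compare, and the helper function, are illustrative choices rather than the authors' procedure.

```python
# Adjacent-year correlations read from Table 1 (row label, next row label, r).
adjacent_r = [
    ("1.5 yrs", "2.5 yrs", 0.62),
    ("5.5 yrs", "6.5 yrs", 0.85),
    ("10.5 yrs", "11.5 yrs", 0.95),
    ("15.5 yrs", "16.5 yrs", 0.99),
]

def is_strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(earlier < later for earlier, later in zip(values, values[1:]))

coefficients = [r for _, _, r in adjacent_r]
print(is_strictly_increasing(coefficients))
```

For these four pairs the correlation between years one step apart rises with tenure, as the assumptions predict.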

It is, of course, not possible to determine 
the extent to which these three assumptions 
are actually true in any situation, but there 
is some evidence that they were true to an 
appreciable degree in the company in which 
this study was done. The immediate superior 
has always had the major responsibility for 
determining salary actions for his subordi- 
nates, and the company has, since these men 
were hired, continuously had a formal system 
for the evaluation of the performance of 
personnel by their immediate superior. 

This study attempted to find answers, then, 
to the following questions: 
1. Does salary position after a few years
with this company predict later salary to a 
highly significant degree? To a high enough 
degree to allow its use as an intermediate 
criterion? 

2. Do correlations between equally distant 
years increase as tenure with the company 
increases? 


3. Does the variability of salaries for the 
same group of men increase with tenure? 


METHOD


A listing was obtained of all the aircraft engineers 
with degrees who were hired in 1939, 1940, and 1941 
and who were still employed by the Lockheed- 
California Company, a large airframe manufacturing 
company, at the end of 1960, 1961, and 1962, 
respectively. Salaries (or, occasionally, wages) were 


1 There was no union among engineers at the 
time the subjects joined the company, but a voluntary
union was formed a few years later. Some of
the subjects did belong to this union for part of 
the time period covered; and it is possible that 
membership in the union had a stabilizing effect 
on some of the obtained relationships. However, 
since the bargaining agreement has not restricted 
the ability of the company to promote personnel 
on the basis of performance or to give merit raises 


TABLE 1

SALARY INTERCORRELATIONS FOR 52 DEGREED ENGINEERS

                  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23

Starting salary
 1. 0 mos            73  62  60  52  53  58  55  56  53  50  41  40  34  29  26  25  22  23  22  19  18  15
Salary after
 2. 6 mos        73      84  49  58  63  62  51  43  54  43  40  39  36  37  38  34  33  31  33  27  25  21
 3. 1½ yrs       62  84      62  65  78  75  62  57  68  58  60  57  53  50  51  48  46  43  45  39  36  33
 4. 2½ yrs       60  49  62      71  66  70  64  62  61  56  56  52  46  45  44  42  39  37  39  34  31  30
 5. 3½ yrs       52  58  65  71      82  82  73  67  70  66  67  71  59  61  60  56  54  53  55  50  47  47
 6. 4½ yrs       53  63  78  66  82      82  73  69  72  67  68  67  53  53  52  49  48  45  47  43  40  39
 7. 5½ yrs       58  62  75  70  82  82      85  82  81  76  82  82  71  69  68  64  61  59  60  53  49  47
 8. 6½ yrs       55  51  62  64  73  73  85     [?]  89  87  83  83  74  68  66  62  60  59  60  54  50  49
 9. 7½ yrs       56  43  57  62  67  69  82 [?]      86  90  88  86  76  72  71  68  65  64  64  59  55  54
10. 8½ yrs       53  54  68  61  70  72  81  89  86      88  88  84  78  72  68  64  62  61  63  58  53  50
11. 9½ yrs       50  43  58  56  66  67  76  87  90  88      90  89  83  80  78  75  72  71  72  69  65  65
12. 10½ yrs      41  40  60  56  67  68  82  83  88  88  90      95  90  86  84  81  79  77  78  73  68  67
13. 11½ yrs      40  39  57  52  71  67  82  83  86  84  89  95      94  92  90  87  86  85  85  80  74  73
14. 12½ yrs      34  36  53  46  59  53  71  74  76  78  83  90  94      96  95  92  90  89  88  84  79  77
15. 13½ yrs      29  37  50  45  61  53  69  68  72  72  80  86  92  96      98  97  95  94  94  90  86  85
16. 14½ yrs      26  38  51  44  60  52  68  66  71  68  78  84  90  95  98      98  97  96  95  92  88  87
17. 15½ yrs      25  34  48  42  56  49  64  62  68  64  75  81  87  92  97  98      99  98  97  95  91  90
18. 16½ yrs      22  33  46  39  54  48  61  60  65  62  72  79  86  90  95  97  99      99  98  96  92  91
19. 17½ yrs      23  31  43  37  53  45  59  59  64  61  71  77  85  89  94  96  98  99      99  97  94  93
20. 18½ yrs      22  33  45  39  55  47  60  60  64  63  72  78  85  88  94  95  97  98  99      98  95  93
21. 19½ yrs      19  27  39  34  50  43  53  54  59  58  69  73  80  84  90  92  95  96  97  98      99  98
22. 20½ yrs      18  25  36  31  47  40  49  50  55  53  65  68  74  79  86  88  91  92  94  95  99      99
23. 21½ yrs      15  21  33  30  47  39  47  49  54  50  65  67  73  77  85  87  90  91  93  93  98  99


obtained for each of the above men as of December 
31 of each year since hire. In a number of cases, 
data were found to be missing. The reasons for this 
included such things as: unavailable records, wartime 
military service, transfers to other divisions, etc. 
All of the men who had, at any time, left the 
employ of the company were eliminated from the 
study. Also, if salary for a relatively large number 
of years was unobtainable, the person was elimi- 
nated. The final data consisted of records of 52 men 
on whom adequate data were available. For 90% 
of this sample (47 of the 52), a minimum of 18 
years’ data was obtainable. 

In order to base the correlations on the largest 
possible number of cases, it was desirable to combine 
the data from all 3 years into one group. But 
because the mean starting salary differed from year 
to year, the data from separate years were not 
directly comparable. In order to obtain comparable 
measures, each salary was converted to a standard 
score, a measure that indicates that score’s position 
relative to the distribution in which it occurs. 
(The standard scores obtained in this study, then, 
represent the position of each salary relative to 
the other salaries received that year for those 
members of the sample who were hired in the 
same year.) 

In those cases where data could not be found, 
the mean standard score of 0.00 was inserted be- 
cause the computer program used to analyze the 
data does not allow for unequal n’s. This procedure 
would tend to lower the obtained correlations. The 
results, therefore, are likely to be underestimates
rather than overestimates of the true relationships. 
Also, because most of the missing data were for 
the earlier years of the study, the correlations for 
those years are most affected. 

These standard scores from the 3 separate years 
were then combined into one distribution for salary
at the end of the calendar year in which employ- 
ment began, and one for salary at the end of each 
succeeding calendar year. These are labeled in Table 1 
as starting salary, salary after ½ year, salary after
1½ years, etc. (These labels are somewhat inaccurate
because data were collected for all members of the 
sample as of December 31, regardless of their hire-in 
month. Most members of the sample were hired in 
June or July, but a few were employed in either 
the first or last few months of the year; the ½ year
is, in each case therefore, an average figure rather 
than the exact one.) These 23 distributions were 
intercorrelated. 


RESULTS 


An examination of Tables 1 and 2 reveals 
the following: 

The trend of correlations is extremely 
consistent, with the best prediction being in 
adjacent years and the relationship becoming 
poorer as the time span increases. 


for above-average performance, this would not appear 
to have a significant effect on the results and 
conclusions of the study. 


TABLE 2 

STANDARD DEVIATIONS OF THE SALARY DISTRIBUTIONS; 
SALARY GIVEN IN DOLLARS PER WEEK 

                         Year hired 
Years of         1939        1940        1941 
seniority      (N = 17)    (N = 20)    (N = 15) 
None             7.24        5.65        5.48 
½                8.39        6.11        3.63 
1½               9.02        6.10       10.22 
2½              11.46       13.63        [?] 
3½              16.15       10.95        6.75 
4½              13.08        9.03        9.80 
5½              18.48       11.81       11.60 
6½              18.05       11.84       14.81 
7½              15.96       13.48       18.70 
8½              17.78       13.13       20.25 
9½              18.27       30.70       22.55 
10½             24.33       17.98       26.96 
11½             26.16       19.94        [?] 
12½             31.10       21.48       34.43 
13½              [?]        26.05       33.08 
14½             41.79       27.00       30.46 
15½             43.18       27.69       41.53 
16½             44.49       30.58       44.00 
17½             49.37       34.47       47.70 
18½             54.03       37.65       47.41 
19½             53.45       47.49       51.05 
20½             53.53       56.33       54.46 
21½             57.72       58.40       60.36 


Of the total of 253 correlations in Table 1, 
243 (96%) are significant at the .05 level 
and 232 (92%) are significant at the .01 
level. Correlations of .27 and .35, respectively, 
are required for significance at these levels. 
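
The counts in this paragraph follow from intercorrelating 23 distributions. A brief arithmetic check, using only figures stated in the text (the variable names are illustrative):

```python
from math import comb

n_distributions = 23  # starting salary plus 22 later year-end salaries

# Every distribution is correlated with every other one exactly once.
n_pairs = comb(n_distributions, 2)

# Share of correlations reported as significant at each level.
pct_05 = round(100 * 243 / n_pairs)
pct_01 = round(100 * 232 / n_pairs)
```

Here `n_pairs` comes to 253, and the two percentages round to 96 and 92, agreeing with the figures given above.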

The correlations in Table 1 increase in 
size along the diagonal from the upper left 
to the lower right. This is true for all such 
diagonal rows. 

The variance of salaries increases with 
increasing years of experience. 


DISCUSSION 


With respect to the first question asked 
at the beginning of this study, the results 
obtained indicate that, for the relatively 
homogeneous sample employed in this study, 
salary at one period of time is a very signifi- 
cant predictor of salary at a later date, es- 
pecially when the group has been employed 
for a few years. And the correlations are 
large enough to justify the use of salary after 
a few years of employment as an intermediate 
criterion for salary after a greater number 
of years. 

Affirmative answers were obtained for the 
second as well as the third question posed 
at the beginning of the study; the correlations 
between equally distant years do increase 
with tenure, as does the variability of the 
salaries received. 

It is, of course, possible that this increasing 
variance affected these correlations and re- 
sulted in the higher correlations for the later 
years. However, the transformation to stand- 
ard scores, which has the effect of equating 
the standard deviations of all the separate 
distributions, was done to minimize any 
such effect. 

The results obtained lead to the conclusion 
that the behaviors and/or personal character- 
istics that are being rewarded are being 
rewarded consistently. That is, people who 
get the largest monetary rewards early in 
their career also tend to get the largest 
monetary rewards later. Some changes occur 
in relative monetary position, but these occur 
rather slowly and infrequently. 

Although it is rewarding to find a usable 
intermediate criterion, this conclusion also has 
a disturbing implication. It raises the question 
of why the prediction (or reliability) is so 
good. Is a man’s later salary dependent on 
his reputation? A reputation built on chance 
factors early in his career and on resulting 
early salary action? If so, this would mean 
that salary differences later in men’s careers 
do not reflect lasting characteristics and dif- 
ferences and are therefore relatively meaning- 
less. However, in terms of both common sense 
and present knowledge of individual differ- 
ences and the stability of interests and abili- 
ties, this does not seem to be a reasonable 
explanation. So there must be some abilities, 
interests, personality characteristics, or com- 
binations of these, that are responsible for 
some men being consistently evaluated as 
better than other of their peers. But if this 
is true, why have personnel psychologists not 
been able to measure these characteristics 
better? Validities have generally been much 
lower than the correlations obtained here, 
indicating that much reliable variance remains 
unexplained. 

One other implication would seem to arise 
as well. In situations where salary is a rele- 
vant criterion of performance, there might 
be value in using it as one evaluatory measure 
in promotional and placement decisions. For 
example, some measure of salary progress 
over the most recent 5-year period of experi- 
ence could be an excellent predictor of per- 
formance over the next 5 years. 

The results of this study cannot be too 
widely generalized because the sample was 
quite homogeneous. All of the subjects (Ss) 
were degreed engineers with long careers in 
this company, and all were hired at the same 
time and went through the expanding phase 
of the organization. Any conclusions drawn 
from this study should be applied with care 
to other situations, taking the above factors 
into account. 
SUCCESSFUL POLICEMEN AND FIREMEN APPLICANTS: 


THEN AND NOW 


RICHARD W. JOHNSON 
University of North Dakota 


Improved results in the selection of policemen and firemen applicants are dis- 
cussed from an historical point of view. A comparison is made between the 
successful candidates selected by Terman in 1916 and the successful candidates 
selected in a study conducted by Matarazzo et al. in 1959-1962. The acceptable 
applicant today is drawn from a better educated and more highly intelligent 
segment of society than was his counterpart nearly 50 years ago. He is also 
more carefully evaluated for personal and emotional adjustment problems. Con- 
siderable improvement appears to have been made in the selection of acceptable 
candidates for 2 highly critical occupations. 


The publication of a recent article in the 
Journal of Applied Psychology on the charac- 
teristics of successful policemen and firemen 
by Matarazzo, Allen, Saslow, and Wiens 
(1964) calls to mind the first research article 
ever to appear in the journal. In this early 
article, Terman (1917) described the use of 
“mental and pedagogical tests” in the selec- 
tion of policemen and firemen. The similarity 
of the two articles, separated by so long a pe- 
riod of time, provides a unique opportunity 
for evaluating the progress which has been 
made in the selection of qualified personnel 
by means of psychological tests. 

Terman’s study represents personnel selec- 
tion procedures at their finest just prior to 
World War I. In fact, he believed himself to 
be the first one to use psychological tests for 
the selection of policemen and firemen appli- 
cants. Matarazzo et al., on the other hand, 
believed their selection procedures to be fairly 
typical of procedures followed in major cities 
in the United States today. Any improvement 
in the results of personnel selection programs 
found in the Matarazzo study, as compared 
with the Terman study, seemingly, could be 
generalized to include many of the other mod- 
ern selection programs for policemen and fire- 
men throughout the United States. 

Upon the request of the city manager, Ter- 
man had his graduate students administer the 
Stanford Revision of the Binet-Simon Intelli- 
gence Scale, just shortly after its original pub- 
lication in 1916, and several achievement tests 
to all the applicants for policemen and fire- 
men positions in the city of San Jose, Cali- 
fornia. These tests were used, in addition to 
a medical examination, tests of physical 
strength and agility, and an interview, to se- 
lect successful candidates. 

In all, Terman tested 30 applicants for the 
two jobs. They ranged in age from 21 to 38 
with a median age of 30. The median amount 
of formal education for the applicants was 
either the sixth or the seventh grade (Terman 
did not provide the exact figures for this vari- 
able). Their median IQ on the Stanford-Binet 
(abbreviated version) was 84. 

Upon the basis of his normative studies, 
Terman recommended that an IQ of 80 be 
used as a cutoff point in selecting eligible 
candidates. The application of this step auto- 
matically eliminated 10 of the 30 candidates, 
including 4 who had been serving as “extras” 
upon the police force. The median IQ for the 
group of candidates who remained was 89, 
which is equivalent to a percentile rank of 17 
(Terman, 1916, p. 78, see Table 1). 

The median amount of formal schooling 
completed was also increased. Nine of the 
original 30 subjects in Terman’s sample had 
completed the eighth grade; 5 had not com- 
pleted the sixth grade. The remaining 16 sub- 
jects (Ss) completed either the sixth or the 
seventh grade. In the narrative portion of his 
study, Terman indicated that some of the 
applicants with little education were elimi- 
nated on the basis of their IQ score. Judging 
from this, it may be concluded that the me- 
dian amount of formal schooling for the 
acceptable applicant was probably at the 
seventh grade level. 


TABLE 1 

INTELLECTUAL ABILITY AND EDUCATIONAL ATTAINMENT 
OF SUCCESSFUL POLICEMEN AND FIREMEN 
APPLICANTS: 1916 VERSUS 1959-1962 

                                                          Matarazzo et 
                                      Terman: 1916        al.: 1959-62 
Variable                                (N = 20)            (N = 243) 
Intellectual ability 
  Median IQ                                89                  113 
  Median percentile                        17                   80 
  Verbal description                  Dull normal         Bright normal 
Educational attainment 
  Median years of formal education      7 (est.)                12 
  National average (est.)                 8.3ᵃ                12.3ᵇ 
  Verbal description                 Slightly below          Average 
                                        average 

ᵃ Based upon median amount of formal schooling completed 
by males, ages 50-54, in 1940 (U. S. Bureau of the Census, 1960, 
p. 214). Information upon the median amount of formal school- 
ing completed by the U. S. populace was not obtained prior to 
the 1940 census. 

ᵇ Based upon median amount of formal schooling completed 
by males, ages 25-29, in 1960 (U. S. Bureau of the Census, 1964, 
p. 1-405). 

Matarazzo et al. (1964) evaluated 243 
candidates who had passed a Civil Service 
written examination (including the Wonderlic 
Personnel Test), medical examination, and a 
departmental and/or Civil Service oral inter- 
view for a position as a policeman or fireman 
in Portland, Oregon. The characteristics of 
these successful applicants were more care- 
fully studied by means of an 8-hour psy- 
chological examination which included the 
Wechsler Adult Intelligence Scale (WAIS), 
several standardized personality and interest 
inventories, projective tests, and an interview. 

The authors found the successful candi- 
dates to be “superior young men.” A median 
WAIS IQ of 113, which represents the 80th 
percentile rank (Wechsler, 1958, p. 43), was 
recorded. The median amount of education 
completed was high school graduation. 

Perhaps the most striking difference be- 
tween the two studies is in terms of the intel- 
lectual ability of the eligible applicants. In 
1916, the typical applicant judged to be 
eligible for appointment as a policeman or 
fireman possessed an IQ in the “dull normal” 
range of intelligence compared with a norm 
group appropriate for his time. In 1959-1962, 
the successful applicant for these positions 
received an IQ score in the “bright normal” 
range of general intelligence as compared with 
an up-to-date norm group. This difference re- 
flects a marked improvement in selecting more 
able personnel for two highly significant oc- 
cupations from the viewpoint of society’s wel- 
fare. 

The average educational attainment of the 
successful firemen and policemen applicants 
is relatively increased. Table 1 shows that the 
successful applicant in Terman’s study was 
slightly below the national average in the 
number of years of formal schooling com- 
pleted. The successful applicant in Mataraz- 
zo’s study, on the other hand, was very close 
to the national average in the amount of for- 
mal education received. 

A very important difference between the 
two time periods has been the introduction of 
standardized procedures for personality as- 
sessment in a number of cities. Matarazzo’s 
study suggests that today’s fireman or police- 
man is an emotionally well-adjusted indi- 
vidual. Terman stressed the importance of 
moral integrity and personal qualities, but the 
evaluation of candidates for these qualities 
was done solely by means of an interview, 
apparently conducted by nonpsychologists. 

Terman also advocated the screening out 
of those with low social intelligence, or “‘so- 
cial feeblemindedness,” as he termed it. He 
did not, however, propose a test to evaluate 
this characteristic. Despite the desirability of 
evaluating candidates along this dimension, 
the procedures for so doing have not greatly 
improved since Terman’s time (Guilford, 
1959). 

In summary, the comparison of these two 
studies, covering a period of nearly half a 
century, suggests that considerable improve- 
ment has been made in the selection of ac- 
ceptable policemen and firemen candidates. 
ARE SVIB INTERESTS CORRELATED WITH DIFFERENTIAL 
ACADEMIC ACHIEVEMENT? ¹ 


RICHARD W. JOHNSON 
University of North Dakota 


Segel’s (1934) finding that SVIB scores correlated more highly with differential 
academic achievement than with absolute academic achievement has been often 
quoted, but little studied. The relationship between SVIB scores and ACT test 
scores for 1875 university freshman males was compared with the relationship 
between SVIB scores and the differences between pairs of ACT tests. The SVIB 
scale scores were more highly correlated with differential achievement than with 
absolute achievement when scholastic aptitude scores were held constant; how- 
ever, the relationship was slight. When only hypothesized relationships were 
considered, no difference was found. The interpretation of SVIB scores as re- 
flecting variations in either absolute academic achievement or differential aca- 
demic achievement should be highly guarded. 


A number of studies have shown that inter- 
est scores on the Strong Vocational Interest 
Blank are not highly correlated with academic 
achievement. Berdie has summarized the lit- 
erature on this topic as follows, 


The consistently low but sometimes significant rela- 
tionships found between SVIB scores and academic 
and training grades lead inevitably to the conclusion 
that there is a slight relationship between interest 
scores and grades, but that the size of this relation- 
ship is such that it serves little use in making predic- 
tions and accounts for little of the variance in 
academic success [Berdie, 1960, p. 42]. 


Nearly all of the studies in this area, 
however, have used an absolute measure of 
achievement as the criterion variable. A num- 
ber of years ago, Segel (1934) explored the 
merits of using a differential measure of aca- 
demic achievement. He correlated SVIB in- 
terest scores with the differences between 
pairs of scores on the Iowa High School Con- 
tent Examination (IHSCE). In so doing, he 
found a number of correlations in the .50s and 
.40s. These findings led him to conclude that 
SVIB interests have more power to show the 
differences between achievements than they do 
to show the achievements themselves. 

The difference score, as employed in the 
above situation, serves as a measure of intra- 
individual variability in achievement. Segel 
reasoned that if interests were significantly 
related to differential achievement, they 
could become important factors in aiding stu- 
dents to choose among different curricula. 


1 This article is based upon data collected by the 
author in fulfilling the dissertation requirements for 
the PhD degree at the University of Minnesota. 

Segel’s findings have attracted the attention 
of a number of experts in the field of interest 
measurement. Strong (1943, p. 527) noted 
that the use of an achievement difference 
score was more appropriate than a single 
achievement score as a criterion measure for 
the SVIB in that the SVIB scores are basically 
difference scores themselves. Later, he sug- 
gested Segel’s method as one of four ap- 
proaches to ‘“‘an ideal procedure for determin- 
ing the relationship between interest and 
achievements [Strong, 1955, p. 155].” 

Cronbach (1960) has described Segel’s 
work as “a finding of great potential impor- 
tance in classification and guidance [p. 428].” 
Super and Crites believe that his findings are 
“suggestive” (Super & Crites, 1961, p. 433). 

Despite its frequent citation in the litera- 
ture, very little research has been undertaken 
to confirm or extend Segel’s finding. Hewer 
(1957) obtained positive results in a study 
somewhat similar to Segel’s; however, her 
research was limited to the use of the SVIB 
Physician key and the relative academic 
achievement of a select sample of premedical 
freshmen. 


PURPOSE 


The present study investigates the use of 
an achievement difference score as a criterion 
in interest measurement. Essentially, it seeks 
to answer the question suggested by Segel’s 
study, “Are SVIB interest scores more highly 
correlated with differential academic achieve- 
ment than they are with absolute academic 
achievement?” Differential academic achieve- 
ment refers to the difference score found in 
subtracting the student’s achievement score 
in one academic area from his achievement 
score in a second area. Absolute academic 
achievement indicates the student’s achieve- 
ment in any single area without reference to 
his achievement in any other area. 


METHOD 
Sample 


All University of Minnesota freshmen males en- 
rolled in the fall quarter, 1960, for whom SVIB 
profiles, American College Testing (ACT) Program 
achievement scores, Minnesota Scholastic Aptitude 
Test (MSAT) scores, and high school percentile 
ranks (HSR) were available were included in the 
study. A sample of 1,875 students, 52.7% of all 
freshmen males, was obtained. 


Appraisal Instruments 


The SVIB was used as the measure of interest. 
The six scales (Physician, Engineer, Personnel Di- 
rector, Purchasing Agent, Life Insurance Salesman, 
and Lawyer) originally studied by Segel were in- 
cluded in the study. In addition, 19 other scales 
(Psychologist, Dentist, Mathematician, Physicist, 
Production Manager, Printer, Math-Physical Science 
Teacher, Social Science High School Teacher, Social 
Worker, Minister, Musician-Performer, CPA Owner, 
Accountant, Banker, Author-Journalist, President of 
Manufacturing Concern, Interest Maturity, Occupa- 
tional Level, and Masculinity-Femininity) were 
chosen for study. These scales were selected upon 
the basis of (a) low intercorrelations with the other 
interest scales, (b) logical relationships with the 
measures of academic achievement, and/or (c) po- 
tential usefulness as shown by the research findings. 

The ACT, which assesses achievement in the same 
four areas as the IHSCE, viz., English, Mathematics, 
Natural Science, and Social Studies, was chosen as 
the index of achievement. A total of six different 
achievement scores were obtained by subtracting 
each achievement score from each of the other three 
achievement scores. 
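
The six difference scores are the six possible pairings of the four ACT tests. A minimal sketch of their construction, with a hypothetical student and an illustrative function name (`difference_scores`); the direction of subtraction follows the column labels of Table 3:

```python
from itertools import combinations

# The four ACT tests, in the order that reproduces the pairings of Table 3.
ACT_TESTS = ["English", "Mathematics", "Social Studies", "Natural Science"]


def difference_scores(scores):
    """Return the six pairwise difference scores for one student."""
    return {
        f"{first} - {second}": scores[first] - scores[second]
        for first, second in combinations(ACT_TESTS, 2)
    }


# Hypothetical ACT scores for one student (not data from the study).
student = {
    "English": 22,
    "Mathematics": 27,
    "Social Studies": 20,
    "Natural Science": 25,
}

diffs = difference_scores(student)
```

A negative English - Mathematics difference, as for this hypothetical student, indicates relatively stronger quantitative achievement whatever the student's overall level.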


Procedure 


Scores on the 25 SVIB scales were correlated 
with scores on the 4 ACT tests and the 6 achievement 
difference scores by means of the Pearson product- 
moment correlation formula. (Although the SVIB 
scores do not represent exact interval scales, inspec- 
tion of several scatter diagrams between the interest 
and achievement scores showed that a linear rela- 
tionship was approximated.) 

In addition to the zero order correlation coef- 
ficients, the first order partial correlation coefficients 
with MSAT scores held constant were also computed 
between the SVIB scores and both the ACT scores 
and the ACT difference scores. The MSAT scores 
were held constant to help control for the effects of 
scholastic aptitude known to operate upon both 
variables under consideration (American College 
Testing Program, 1960; Strong, 1943). 
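
The first order partial correlation can be illustrated with the standard formula, in which variable 3 (here MSAT) is held constant. The function name `partial_r` is illustrative. The input coefficients are the two-decimal values printed for the Physician scale in Table 2, so the results can differ slightly from the published partials, which were presumably computed from unrounded values.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """First order partial correlation between 1 and 2 with 3 held constant."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Variable 1 = SVIB Physician scale, 2 = an ACT test, 3 = MSAT.
# r13 = .29 (Physician with MSAT); r23 = .65 (English) or .67 (Natural Science).
physician_english = partial_r(r12=0.19, r13=0.29, r23=0.65)
physician_natsci = partial_r(r12=0.34, r13=0.29, r23=0.67)
```

With these rounded inputs the English partial comes out close to zero and the Natural Science partial close to .2, in line with the .00 and .20 printed in Table 2.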


TABLE 1 

ZERO ORDER CORRELATION COEFFICIENTS BETWEEN SVIB SCALE SCORES AND IHSCE SUBTEST 
SCORES AND IHSCE DIFFERENCE SCORES (N = 100) 

                                                               SVIB Scale 
                                                               Life In-    Personnel 
                                          Engi-                surance     Manage-     Purchas- 
Achievement scale                         neering   Medicine   Salesman    ment        ing Agent 
IHSCE Subtest 
  English Literature                       -.10      -.11       -.04        .10         -.43 
  Mathematics                               .49       .28       -.27       -.15          .04 
  Science                                   .36       .29       -.29       -.17         -.26 
  History and Social Science               -.16      -.05        .08        .04         -.26 
IHSCE Difference Scoreᵃ 
  English Literature - Mathematics         -.55      -.36        [?]        .23         -.44 
  English Literature - Science             -.46      -.41        .20        .28         -.20 
  Mathematics - History and Social 
    Science                                 [?]       .29       -.31       -.17          .28 
  Science - History and Social Science      [?]       .34       -.37       -.21          ᵇ 

Note.—Adapted from Segel (1934), pp. 93, 96. 

ᵃ Segel’s final analysis was based upon the correlations among only five SVIB scales and only four achievement difference scores. 
He dropped the SVIB Lawyer scores and the IHSCE English minus Social Studies and Math minus Natural Science difference 
scores from consideration in that each of these scales produced generally insignificant correlations between interests and achievement. 

ᵇ Not available. 

TABLE 2 

CORRELATION COEFFICIENTS BETWEEN SVIB SCALE SCORES AND ACT TEST SCORES (N = 1,875) 

                                                  English          Math   Soc. Studies  Nat. Science 
Variable                               MSAT   r12ᵃ r12.3ᵇ    r12  r12.3    r12  r12.3    r12  r12.3 
Minnesota Scholastic Aptitude Test             .65           .60           .67           .67 
SVIB Scale 
Psychologist                            [?]    .22    [?]    .24    .05    .31    [?]    [?]    .15 
Physician                               .29    .19    .00    .26    .10    .20    .01    .34    .20 
Dentist                                 .09    .06    .00    [?]    .08   -.01   -.01    .14    [?] 
Mathematician                           .24    .19    [?]    .34    .24    .16    .00    .30    .19 
Physicist                               .18    .14    .03    [?]    .26    .10   -.03    .27    .21 
Engineer                                .16    .14    .05    [?]    .33    .09    .02    [?]    [?] 
Production Manager                     -.08   -.06    .01    [?]    .20   -.07   -.02    .03   -.11 
Printer                                -.02    .00    .02    .00    .02   -.08   -.08    .02   -.04 
Math-Physical Science Teacher           .09    .08    .03    [?]    .23    .08    .03    .18    .16 
Personnel Director                      .10    .06    .05    [?]    .06    .18    [?]    .09    .03 
Social Science Teacher                  .00    .01    .01   -.14   -.14    .10    .10   -.05   -.05 
Social Worker                           .18    [?]    .00   -.02   -.16    [?]    [?]    .10    .03 
Minister                                .16    [?]    .02    .00   -.12    .18    .10    .10   -.01 
Musician (Performer)                    .21    .16    .03    .06   -.08    .09   -.07    [?]    .02 
CPA                                     .16    [?]    .03    .16    .08    .18    .10    .08   -.04 
Accountant                             -.04   -.01    .02    .10    .10    .03    [?]    [?]    .00 
Purchasing Agent                       -.20   -.11    [?]    [?]    .07   -.16   -.04   -.19    [?] 
Banker                                 -.23   -.14    .02   -.15   -.02   -.18    [?]   -.27    [?] 
Life Insurance Salesman                -.11   -.09   -.03   -.27   -.26   -.05    .03   -.21   -.18 
Lawyer                                  .15    .09   -.01   -.05   -.17    [?]    .09    .04   -.08 
Author-Journalist                       [?]    [?]    .00   -.04   -.18    [?]    .03    .07    [?] 
President Manufacturing Concern        -.04   -.04   -.02    .00    .03    [?]   -.06   -.04    [?] 
Interest Maturity                       [?]    [?]    .01    .07   -.04    .23    .16    .14    .04 
Occupational Level                      [?]    .14    .01    [?]    [?]    [?]    [?]    .16    .03 
Masculinity-Femininity                 -.09    [?]    [?]    .15    .26   -.06    .00    .06    .16 

Note.—An r > .05 is significant at the .05 level of probability. An r > .06 is significant at the .01 level. 

ᵃ r12 signifies the zero order correlation coefficient between the SVIB scales and the ACT tests. 

ᵇ r12.3 signifies the first order partial correlation coefficient between the SVIB scales and the ACT tests with the MSAT scores 
held constant. 

ᶜ Bold-faced correlation coefficients represent SVIB scales and ACT tests which are hypothesized to be significantly related with 
each other. 


In many cases, the interest scores are not logically 
related with either one of the two types of achieve- 
ment scores. To help control for this fact, logical 
relationships between the SVIB scales and the ACT 
tests were specified (see bold-faced r’s in Table 2). 
These logical relationships were then extended to in- 
clude the ACT difference scores which contained 
contrasting areas of achievement (see bold-faced r’s 
in Table 3). 

The selection of contrasting achievement areas 
was based upon a distinction between “verbal” 
achievement (English and Social Studies tests) and 
“quantitative” achievement (Mathematics and Natu- 
ral Science tests). For example, scores on the SVIB 
Mathematician key may be logically assumed to be 
related to scores on the ACT Mathematics test, if 
indeed, interests and achievements are related. Fur- 
thermore, the student with an elevated score on the 
Mathematician key may be expected to score higher 
on the Mathematics test than on either the English 
or Social Studies tests, both contrasting areas of 
achievement. 


The median correlation coefficient between SVIB 
interests and absolute achievement scores was com- 
pared with the median correlation coefficient between 
SVIB interests and differential achievement scores 
for each of the following six sets of data: (a) Segel’s 
original data (although Segel discussed the differ- 
ences between the two types of correlations, he never 
made this statistical comparison); (b) the correla- 
tions between the SVIB and ACT scales comparable 
to the scales analyzed by Segel; (c) the correlations 
between all the SVIB and ACT scales analyzed in 
the study; (d) the partial correlations between the 
SVIB and ACT scales comparable to those analyzed 
by Segel with MSAT scores controlled; (e) the par- 
tial correlations between all the SVIB and ACT 
scales analyzed in the study with MSAT scores con- 
trolled; and (f) the partial correlations between the 
SVIB and ACT scales hypothesized to be significantly 
related with each other. The median test (two- 
tailed) was used to determine the significance of the 
difference between medians (Siegel, 1956). The signs 

TABLE 3 

CORRELATION COEFFICIENTS BETWEEN SVIB SCALE SCORES AND ACT DIFFERENCE SCORES 
(N = 1,875) 

                                                                       ACT Difference Score 
                                          English-      English-      English-         Math-         Math- Soc. Studies- 
                                              Math  Soc. Studies     Nat. Sci.  Soc. Studies     Nat. Sci.     Nat. Sci. 
Variable                               r12ᵃ r12.3ᵇ    r12  r12.3    r12  r12.3    r12  r12.3    r12  r12.3    r12  r12.3 
Minnesota Scholastic Aptitude Test     -.13          -.15          -.20           .00          -.06          -.07 
SVIB Scale 
Psychologist                            [?]    [?]    [?]    [?]    [?]    [?]    [?]    [?]    [?]    [?]    [?]   -.05 
Physician                              -.13   -.10   -.05    .00   -.22   -.18    .08    .08   -.08   -.07   -.21   -.20 
Dentist                                -.08   -.07    .08    .09   -.11   -.10    .14    .14   -.03   -.03   -.22   -.21 
Mathematician                          -.22   -.20    .00    .04   -.17    [?]    [?]    [?]    .05    .07   -.20   -.19 
Physicist                              -.24   -.22    .02    .05   -.20   -.17    .24    .24    .05    .06   -.25   -.25 
Engineer                               -.29   -.28    .03    .05   -.20   -.18    .30    .30    .10    .11   -.26   -.26 
Production Manager                     -.18   -.19    .02    .01   -.08   -.10    .18    .18    .09    .09   -.12   -.13 
Printer                                 .00   -.03    .09    .09   -.02   -.02    .07    [?]    [?]    [?]   -.12    [?] 
Math-Physical Science Teacher          -.20   -.19    [?]    [?]   -.14   -.13    [?]    [?]    .07    .08   -.14   -.13 
Personnel Director                      .04    [?]    [?]    [?]    [?]    [?]    [?]    [?]    [?]    [?]    [?]    [?] 
Social Science Teacher                  [?]    .17   -.11   -.11    .07    [?]    [?]    [?]    .10    .10    .20    .20 
Social Worker                           .13    .16   -.13   -.11   -.01    .03   -.24   -.24   -.14   -.13    .14    .16 
Minister                                .10    [?]   -.10   -.08   -.01    [?]    [?]    [?]    [?]    [?]    .09    .10 
Musician                                .07    .10    .05    .08   -.03    [?]    [?]    [?]    [?]    [?]    [?]    [?] 
CPA                                    -.07   -.05   -.09   -.07    .02    .05   -.01   -.01    .09    .10    [?]    [?] 
Accountant                             -.12   -.13   -.04   -.05    .04    .03    .08    .08    [?]    [?]    .08    .08 
Purchasing Agent                       -.03   -.06    .08    .05    .12    .09    .09    .09    .14    [?]    .06    .05 
Banker                                  .06    .03    .08    .05    .20    .16    .01    .01    [?]    [?]    .15    .14 
Life Insurance Salesman                 [?]    .22   -.03   -.05    .16    .14   -.24   -.24   -.08   -.08    [?]    [?] 
Lawyer                                  .13    .15   -.11   -.09    .03    .06   -.22   -.22    [?]   -.09    .16    .15 
Author-Journalist                       [?]    .18    .00    .03    .02    .06   -.14   -.14   -.13   -.12    .02    .03 
President of Manufacturing Concern     -.03   -.04    .04    .03    .02    .01    .06    .06    .04    .04   -.03   -.03 
Interest Maturity                       .01    [?]    [?]    [?]    [?]    [?]   -.15   -.15   -.07   -.06    .10    [?] 
Occupational Level                     -.01    .02   -.13   -.10    [?]    [?]    [?]    [?]    [?]    [?]    .08    .10 
Masculinity-Femininity                 -.23   -.25    .01    .00   -.13   -.15    .22    [?]    .10    .10   -.16   -.17 

Note.—An r > .05 is significant at the .05 level of probability. An r > .06 is significant at the .01 level. 

ᵃ r12 signifies the zero order correlation coefficient between the SVIB scale and the ACT difference scores. 

ᵇ r12.3 signifies the first order partial correlation coefficient between the SVIB scales and the ACT difference scores with the 
MSAT scores held constant. 

ᶜ Bold-faced correlation coefficients represent SVIB scales and ACT difference scores which are hypothesized to be significantly 
related to each other. 

TABLE 4 

COMPARISON OF MEDIAN CORRELATION COEFFICIENTS BETWEEN SVIB INTERESTS AND VARIOUS 
MEASURES OF ABSOLUTE VERSUS DIFFERENTIAL ACADEMIC ACHIEVEMENT 

                                                         Absolute              Differential 
                                                         academic              academic 
                                                         achievement           achievement 
Type of comparison                                     N of r’s  Median r    N of r’s  Median r      χ² 
Segel’s original data (N = 100)                           20       .17          20       .30        2.50 
Scales comparable to those analyzed by Segel 
  (N = 1,875)                                             20       [?]          20       [?]         .10 
All scales analyzed in study (N = 1,875)                 100       .11         150       .10        1.12 
Scales comparable to those analyzed by Segel with 
  MSAT scores held constant (N = 1,875)                   20       .06          20       .16        2.50 
All scales analyzed in study with MSAT scores held 
  constant (N = 1,875)                                   100       .05         150       .10       11.11* 
All hypothesized relationships (N = 1,875)                19       .10          38       .17        1.48 

* p < .001. 


of the coefficients were disregarded in obtaining the 
medians. 
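
The comparison described above can be sketched as a median test on the absolute values of two sets of coefficients. This is an illustration under stated assumptions, not the author's computation: it assumes the 2 x 2 chi-square with the continuity correction, counts values strictly above the combined median, and uses synthetic coefficients. The function name `median_test_chi2` is invented here.

```python
from statistics import median


def median_test_chi2(group_a, group_b):
    """Chi-square (1 df, continuity-corrected) for a two-group median test.

    Signs are disregarded, so the test compares the sizes of the coefficients.
    """
    a_vals = [abs(v) for v in group_a]
    b_vals = [abs(v) for v in group_b]
    combined = median(a_vals + b_vals)

    a = sum(v > combined for v in a_vals)  # group A, above combined median
    b = sum(v > combined for v in b_vals)  # group B, above combined median
    c = len(a_vals) - a                    # group A, at or below
    d = len(b_vals) - b                    # group B, at or below
    n = a + b + c + d

    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Synthetic example: 20 "absolute" and 20 "differential" coefficients.
absolute = [-0.10] * 13 + [0.40] * 7
differential = [0.10] * 7 + [-0.40] * 13

chi2 = median_test_chi2(absolute, differential)
```

For this synthetic 7-versus-13 split the statistic is 2.50, below the 3.84 needed for significance at the .05 level with one degree of freedom. The function would fail with a division by zero if every coefficient fell on the same side of the combined median.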


RESULTS 


The correlation coefficients between the 
SVIB interest scores and the IHSCE subtests 
and subtest difference scores as found by 
Segel are shown in Table 1. Inspection of 
this table indicated that SVIB interests were 
more highly related with achievement differ- 
ence scores than with the individual achieve- 
ment scores. The median 7 of the former was 
.30 compared to a median 7 of .17 for the 
latter. When the difference between these 
median 7’s was tested by the median test, 
however, the difference was insignificant (x° 
=a SOs pie Oo): 

This is a most surprising finding in view of 
the frequent citation of Segel’s study in the 
literature. Segel’s conclusion that interests 
are more highly correlated with differential 
achievement than with “direct achievement” 
should, at the most, be treated as a tentative 
hypothesis. 

The intercorrelations of the SVIB interest 
scores and the ACT achievement scores are 
shown in Table 2.² Both the zero order and 
first order partial correlation coefficients are 
reported. These correlations may be com- 
pared with the correlations between SVIB in- 
terests and the ACT achievement difference 
scores shown in Table 3. 


2 Tables showing the means and standard devia- 
tions of all variables, the intercorrelations of the 
SVIB scores, and the intercorrelations of the ACT 
scores are given in the author’s PhD dissertation 
(Johnson, 1961). 

The comparisons of the median r’s between 
SVIB interests and ACT absolute versus ACT 
differential scores for both the zero order and 
partial order correlations are given in Table 4. 

When scholastic aptitude was not held con- 
stant, there was no difference in the magnitude 
of the r’s found using the two methods. The
median correlations were all uniformly low, 
ranging from .10 to .17, indicating a minimal 
relationship between measured interests and 
both indices of academic achievement. 

The median correlation between the SVIB 
and the ACT difference scores was signifi- 
cantly greater than the median correlation be- 
tween the SVIB and the ACT test scores 
when scholastic aptitude was held constant 
(χ² = 11.11; p < .001). This difference was
not significant when the comparison of me- 
dians was restricted to the five SVIB scales 
studied by Segel, although the difference noted 
was in the same direction (χ² = 2.50; .20 >
p > .10).

When scholastic aptitude was controlled, 
the median r between interests and absolute
achievement dropped from .11 to .05. The 
lowered relationship reflects the degree to 
which the correlation between the two varia- 
bles is influenced by their mutual covariation 
with scholastic aptitude. 

The relationship between the SVIB scales
and the ACT difference scores, on the other 
hand, was little affected by controlling for 
scholastic aptitude test scores. The median 
correlation between the two variables of .10 
was not changed with the use of the partial 
correlation formula. The use of the difference 
score itself apparently provided a fairly ef- 
fective control upon the student’s ability to 
do well on scholastic tasks. 
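
The first order partial correlation referred to above can be written out as a short sketch. The formula is the standard one for removing a single control variable; the function name and the numerical values are illustrative assumptions and are not taken from the study's tables.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First order partial correlation of x and y with z held constant.

    x: interest score, y: achievement score, z: scholastic aptitude.
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Illustrative values only: when both variables covary with aptitude,
# the partial coefficient falls below the zero order coefficient.
print(partial_r(0.5, 0.5, 0.5))

# When neither variable is related to aptitude, nothing is removed.
print(partial_r(0.3, 0.0, 0.0))
```

A difference score whose two components share about the same relation to aptitude will itself correlate little with aptitude, which is consistent with the partial and zero order medians being the same for the difference scores.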

Although the hypothesized relationships between
interests and differential achievement
(median r = .17) tended to run higher than
the hypothesized relationships between interests
and absolute achievement (median r =
.10), the difference between the medians was
not significant (χ² = 1.48; p > .05).


DISCUSSION


For a group of students of equal ability 
their SVIB interests, as a general rule, will be 
more helpful in explaining the variance in 
relative achievement than in explaining the 
variance in absolute achievement. This result 
supports Segel’s original hypothesis. 

When only hypothesized relationships are 
considered, however, interests are not more 
highly correlated with achievement difference 
scores than with the achievement scores taken 
by themselves. By restricting the comparison 
to only logical relationships, many of the in- 
significant relationships between the SVIB 
scales and the ACT tests were eliminated. 
While some of the insignificant relationships 
between the interest scales and the ACT dif- 
ference scores were also eliminated, the effect 
was less noticeable. The interest scale can 
often be meaningfully compared with only one 
of the four ACT tests. In each of these situa- 
tions, however, it can be logically related with 
two, and perhaps three, of the six ACT dif- 
ference scores. 

Inspection of Tables 2 and 3 revealed that 
119 of the 150 (79%) partial correlations 
between interests and differential achievement 
were significant at the .05 level while only 
51 of the 100 (51%) partial correlations 
between interests and absolute achievement 
were significant at this same level. The interest
scales were related to the achievement
difference scores in significantly greater
proportion than to the individual ACT tests
(CR = 4.64, p < .01).
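
The critical ratio for the two proportions can be approximated with the sketch below. It assumes the conventional pooled standard error for the difference between two independent proportions; the article does not state which formula was used, and this version gives a value of roughly 4.7, close to but not identical with the reported 4.64. The function name is hypothetical.

```python
from math import sqrt


def critical_ratio(k1, n1, k2, n2):
    """Difference between two proportions divided by its pooled standard error."""
    p1 = k1 / n1
    p2 = k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# 119 of 150 differential correlations vs. 51 of 100 absolute correlations
# significant at the .05 level (counts taken from the text).
cr = critical_ratio(119, 150, 51, 100)
print(round(cr, 2))
```

The small discrepancy may reflect rounding of the percentages or a different standard error formula. The correlations being counted are also not independent of one another, so the ratio is best read as descriptive.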

Although a great number of the correlations 
between interests and the different measures 
of achievement were significant, principally 
because of the large number of Ss involved, 
very few of these indicated even a moderate 
degree of relationship. A correlation of .20 
was used as an arbitrary cutoff point in evalu- 
ating the importance of the relationship. 
Roughly 10% of the partial correlations be- 
tween interests and the ACT subtest scores 
and 13% of the partial correlations between 
interests and the ACT difference scores were 
potentially promising for counseling indi- 
vidual clients using this criterion. 

The SVIB scales which showed the greatest 
degree of correlation with achievement when 
scholastic aptitude was held constant were 
the various scientific occupational scales 
(Mathematician, Physicist, Engineer, Phy- 
sician, Production Manager, Math-Physical 
Science Teacher), Life Insurance Salesman, 
and Masculinity-Femininity. These same 
scales plus two of the social welfare occupa- 
tional scales (Social Worker, Social Science 
High School Teacher) and the Lawyer scale 
were also the scales most highly correlated 
with the differential achievement scores. 

The interest scales were most closely re- 
lated with the ACT Math and Natural Science 
tests and the Social Studies-Natural Science, 
Math-Social Studies, and English-Math dif- 
ference scores. The SVIB interests appear to 
be the most sensitive to variations in the 
above achievement areas. 

It should be emphasized, however, that the 
relationship between interests and absolute or 
differential achievement, even for the scales 
which showed the highest degree of correla- 
tion, was, at best, a moderate one. The cor- 
relations were not nearly as large as those 
obtained by Segel. Separate analysis by means 
of t tests (based upon the r to z transformation)
indicated that 10 of the 20 correlations
between interests and differential achievement 
determined in both studies were significantly 
greater (p < .05) in Segel’s study. None of 
the correlations was significantly greater in 
the present study. 
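
A comparison of this kind can be sketched with Fisher's r to z transformation. The sample sizes (100 for Segel, 1,875 for the present study) come from the text; the two correlation values are hypothetical, and treating the resulting ratio as a normal deviate is an assumption about how the tests were carried out.

```python
from math import atanh, sqrt


def compare_correlations(r1, n1, r2, n2):
    """Test statistic for the difference between two independent correlations."""
    z1 = atanh(r1)  # Fisher r to z transformation
    z2 = atanh(r2)
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se


# Hypothetical coefficients: .30 in a sample of 100 vs. .10 in a sample of 1,875.
stat = compare_correlations(0.30, 100, 0.10, 1875)
print(round(stat, 2))
```

Because the standard error is dominated by the smaller sample, a difference of this size is only near the conventional .05 criterion.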

Segel’s subjects may possibly have been 
more heterogeneous in terms of interests 
and/or achievements. As he did not give the
means and standard deviations for his sub- 
jects, however, it is impossible to determine 
the extent of individual differences among the 
group members. 

Segel’s study also employed a much smaller 
sample (N = 100) than the size of the sample
used in the present study. The correlations 
he obtained involved a considerably larger 
standard error term than the present correla- 
tions. The appraisal instruments themselves 
may also have changed sufficiently to lower 
the magnitude of the relationship. The 
IHSCE, for example, emphasized achievement 
in a specific content area, whereas the ACT 
measures the student’s educational develop- 
ment upon a somewhat broader scale. 

The relatively high degree of correlation
(median r = .59) among the ACT scores
makes differential analysis difficult (American
College Testing Program, 1960, p. 17). This
is a slightly higher degree of intercorrelation
than that found with the IHSCE subtests
(median r = .51); however, the overlap
among the different functions measured is
rather extensive in both cases.

Other measures of differential academic 
achievement, for example, grades in course 
work or ratings of achievement in more 
diverse academic areas, should be obtained 
in order to more thoroughly investigate the 
nature of the relationship between interests 
and differential achievement. 

Because of the cross-sectional nature of the 
study it is not possible to predict achievement 
scores from the interest scores. Future re- 
search in this area should allow for the col- 
lection of the interest scores sometime pre- 
vious to the achievement testing. Based upon 
the results of such research, the counselor 
could more effectively use the interest scores 
in helping the student plan an appropriate 
academic program. 

It may also be of some theoretical interest 
to reverse the above research design. If the 
relationship between interests and differential 
achievement were based upon achievement
scores obtained sometime previous to the 
interest measurement, it would be possible to 
better evaluate the extent to which interest 
development is predictable from one’s achieve- 
ment scores in different areas. 

Darley and Hagenah (1955), in their extensive
review of the literature, suggest that
interest development is probably more closely 
related to personality growth than it is to 
educational or occupational achievement. The 
relationship between differential achievement 
and interest development over an extended 
period of time, however, has not been investi- 
gated. 

Future research should help clarify the 
nature of the underlying relationship between 
SVIB interests and differential academic 
achievement. Berdie’s (1960) conclusion that 
SVIB scores “account for little of the variance
in academic success” still holds true.
Contingent upon further research, the test user
should be wary of interpreting SVIB interests 
as reflecting either absolute or relative past 
achievement. 


CONCLUSION 


Segel hypothesized that SVIB scores were 
more closely related with differential aca- 
demic achievement than with absolute aca- 
demic achievement. This hypothesis was not 
supported, however, when his data were re- 
analyzed. 

Scores upon a scholastic aptitude test were 
used to control for variation in intellectual 
ability known to correlate with both the SVIB 
and the ACT scores. When scholastic apti- 
tude was held constant by means of a par- 
tial correlation technique, the SVIB scores 
were found to be more highly correlated with 
the ACT difference scores than with the sepa- 
rate ACT scores. This finding supports Segel’s 
original hypothesis. 

When only hypothesized relationships were 
considered, the SVIB scales did not corre- 
late more highly with differential achievement 
than with absolute achievement. There were, 
however, relatively more situations in which 
the difference scores could be meaningfully 
related to an SVIB scale. 

Few of the correlations revealed even a 
moderate degree of relationship. The SVIB 
scales and the ACT difference scores which 
did show the greatest degree of intercorrelation,
that is, r > .20, were discussed in terms
of their potential promise for use in individual 
counseling. 




Additional research is required to further 
explore the nature of the relationship between 
measured interests and differential academic
achievement. The use of a more heterogeneous
measure of academic achievement was advised.
A longitudinal research design was
recommended.

Until further research is conducted, the
test user should show considerable caution in
interpreting SVIB scores as indicators of 
differential academic achievement. The en- 
thusiastic reception given Segel’s findings 
should be moderated. 


EFFECTS OF THREAT IN A PERFORMANCE
APPRAISAL INTERVIEW


EMANUEL KAY, HERBERT H. MEYER
General Electric Company, Ossining, New York
AND JOHN R. P. FRENCH, JR.
University of Michigan


Real-life appraisal interviews conducted by 92 manager-subordinate pairs were 
studied intensively. Reactions of subordinates were systematically obtained 
before and after their appraisal interviews and the proceedings in the actual 
interviews were carefully documented by trained observers. Measures of sub- 
sequent performance improvement realized as a result of the appraisal inter- 
views were taken 12 wks. later. The results indicated that a manager’s attempts 
to assist a subordinate by pointing up improvement needs were likely to be 
perceived by the subordinate as threatening to his self-esteem and to result in 
defensive behavior. The greater the threat, the less favorable the attitude to- 
ward the appraisal system and the less the subsequent constructive improve- 
ment in job performance realized. These reactions were strong to the extent 
that the subordinate had relatively low occupational self-esteem. Some
practical implications for appraisal practices are cited.


The formal, periodic appraisal of a subordinate
by his supervisor or manager and
the discussion of this appraisal with the subordinate
is a prevalent personnel activity in
industry. In recent years, criticisms of appraisals
as an effective means of feedback
have appeared (Likert, 1959; Maier, 1958;
McGregor, 1957). The criticisms basically
center about the threatening and negative role
which the manager is asked to play in the
feedback session in respect to the employee.
Despite the criticisms that have been leveled
at appraisal systems, it must be noted that
the supporting evidence for the criticisms is
inadequate as it consists largely of anecdotal
reports from managers and studies where
role-playing interviews were used.

The purpose of this study, therefore, was
to provide more relevant and objective evidence
regarding the effects of performance appraisal
discussions between a manager and
his subordinates. We were interested in determining
not only the degree to which improvements
in performance typically result
from such discussions, but also the effects of
appraisal interviews on the relationship between
the manager and subordinate.


THEORY AND HYPOTHESES 


While research dealing with the perform- 
ance appraisal interview per se is scarce, the 
research on the effects of threat to self-esteem 
and the defenses an individual uses to cope 
with such threats does seem to be relevant to 
the appraisal interview situation. Based on 
these studies, which have been reviewed by 
Wylie (1961), we would expect that an in- 
dividual would be threatened by criticisms 
from his boss in an appraisal interview; and, 
other research on this topic (French, 1964) 
would lead us to expect that an individual 
would cope with such threats either by em- 
ploying defensive reactions or by changing 
his self-concept—the former reaction would 
seem to be easier than the latter. 

Some of the theory and research relating to
the concept of self-esteem would also indicate
that an individual’s reaction to threat in a
performance appraisal interview might be affected
by the strength of his “occupational
self-esteem,” as defined by Daniel Miller
(1962). We would expect the man with high
occupational self-esteem to have enough self-confidence
to adjust to threats and thus to
react more constructively than the man with
low occupational self-esteem. The latter individual
is expected to react to threats with
defensiveness and withdrawal.

It also seems logical to expect that if 
criticism is seen as a threat to one’s ego, then 
praise should have the opposite effect—that is, 
it should have an ego-inflating influence. 
Therefore, we shall test the following hy- 
potheses in this study: 

Hypothesis I. To the extent that occupational
self-esteem is low, the greater the
threat to the subordinate in a performance
appraisal interview—(a) the greater the
amount of defensiveness that will be exhibited
by the subordinate, (b) the poorer will be
his subsequent job performance in terms of
goal achievement, (c) the less favorable will
become his relationship with his boss, and
(d) the less favorable will become his attitude
toward the appraisal system.

Hypothesis II. The more the subordinate 
is praised in an appraisal interview, the more 
he will exhibit reactions opposite to those 
listed in Hypothesis I—that is, he should be 
less defensive, perform better on the job sub- 
sequently, and so on. 


METHODS 
Appraisal Program Studied 


This study was carried out in an operating depart- 
ment of the General Electric Company which was 
specially selected because it seemed to have a very 
well-administered appraisal program. Records showed 
that approximately 90% of the professional- and 
administrative-level employees were appraised an- 
nually. In addition, an intensive program had been 
carried out for several years to train managers in 
the use of the appraisal system and techniques for 
conducting appraisal interviews. Appraisals focused 
on job performance results rather than personal 
characteristics of subordinates. Each exempt employee 
had a position guide which listed the responsibilities 
of his position and the results expected for each
responsibility. The appraisal was based on these
measures of results. Salary decisions under a “pay 
for performance” plan were also based on the ap- 
praisal and discussed with the subordinate in the 
same interview in which the appraisal of performance
was presented.


Study Design 


Regularly scheduled “for keeps” appraisal inter- 
views were observed in this study. Before and after 
his appraisal discussion with his manager, the sub- 
ordinate was interviewed and asked to complete 
questionnaires designed to provide certain control 
measures and some of the dependent measures. The 
observation of the appraisal discussion was made 
with the foreknowledge and consent of both the 
manager and subordinate by a third party trained to 
perform his function as unobtrusively and discreetly 
as possible. 

Subjects (Ss) for the study were obtained from 
volunteers among the persons who normally were 
scheduled to receive their appraisals during the time 
the study was in progress. A total of 92 persons 
volunteered out of a total of about 120 scheduled for 
appraisals during this period. All of the participants 
were in the management or exempt-salaried classifica- 
tion, and none had responsibilities for appraising 
the performance of other exempt-salaried persons. 
On the average, the Ss were 37 years old, had
11 years of company service and 2 years of college,
and earned a salary of about $9,400.

A comparison of the volunteers with those who 
refused to participate showed that the two groups 
did not differ significantly in position level, per- 
centage salary increase received, and overall ap- 
praisal ratings. 


Measures 


Independent variables. Threat and praise are the 
independent variables considered in this study. The 
measures for these variables and the data-collection 
procedures were as follows: 

1. Number of Threats—During the appraisal dis- 
cussion, a trained observer tabulated the number of 
statements of a critical nature made by the manager 
about the subordinate which seemed to represent 
threats to self-esteem. 

2. Number of Praises—This measure consisted of 
the observer’s tabulation of the number of complimentary
statements made by the manager about the
subordinate during the appraisal discussion. 

Dependent variables. The measures were as fol- 
lows: 

1. Defensiveness—Defensiveness was measured 
during the appraisal discussion by the observer who 
noted the number of defensive reactions given by the 
subordinate. 

2. Goal Achievement—The degree to which im- 
provement in performance had been made on the 
part of the subordinate in relation to items dis- 
cussed in the performance appraisal interview was 
coded as Goal Achievement. This measure was 
obtained in separate interviews with the manager 
and the subordinate 12-14 weeks after the appraisal
discussion. In these interviews, they were asked to 
estimate the amount of improvement, change, or 
accomplishment on (a) performance items for which 
criticisms were raised in the appraisal discussion and
which were not subsequently translated into specific
goals, and (b) criticisms which were translated into
specific performance goals during the subsequent
goal-planning session. A percentage estimate of
amount of improvement, change, or accomplishment
was elicited for each item separately. These estimates
then were summarized into two overall percentage
accomplishment scores, one for criticisms which
were not formulated as goals, and one for criticisms
which were formulated as goals. Such scores were
developed separately for estimates by the manager
and estimates by the subordinate and were used
separately in subsequent analyses. The correlation
between these two estimates was .73.

3. Man-Manager Relations—Man-manager relationships
were measured on the following specific
dimensions: (a) Trust and Supportiveness, (b) Mutual
Understanding, (c) Acceptance of Goals, (d)
Perception of Being Valued. Questionnaire items to
measure these dimensions were drawn from other
studies carried out by the University of Michigan
Institute for Social Research and from attitude survey
forms which had been developed previously for
use in the company. Changes in man-manager relationships
were measured by comparing responses of
subordinates to similar items before and after the
appraisal interview.

4. Attitudes Toward the Appraisal System—Items
measuring attitudes toward the appraisal system
were drawn from previous company attitude survey
forms and were administered in the before-and-after
interviews conducted by the researchers in order to
measure differences that might be attributed to the
appraisal discussion.

Occupational self-esteem as a conditioning variable.
This measure was obtained in the preappraisal interview
conducted by the researchers. Occupational self-esteem
was defined as the average evaluation a person
placed on the attributes of his occupational
self-identity. Operationally, it was measured by asking
the appraisee to describe attributes of his self
which he considered to be part of his occupational
role. For each attribute, the appraisee rated the
importance of the attribute to him, and the degree
to which he was personally satisfied with this attribute.
The mean of the cross-products of the importance
and satisfaction ratings was used as the
occupational self-esteem score.
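
The scoring rule just described can be illustrated with a short sketch. The rating scale is not given in the text, so the scale and the example ratings below are assumptions; the function name is hypothetical.

```python
def occupational_self_esteem(ratings):
    """Mean of importance x satisfaction cross-products over all attributes.

    `ratings` is a list of (importance, satisfaction) pairs, one pair for
    each attribute the appraisee named as part of his occupational role.
    """
    if not ratings:
        raise ValueError("at least one rated attribute is required")
    return sum(importance * satisfaction
               for importance, satisfaction in ratings) / len(ratings)


# Hypothetical appraisee with three attributes rated on an assumed 1-5 scale.
score = occupational_self_esteem([(5, 4), (3, 2), (4, 4)])
print(score)  # (20 + 6 + 16) / 3 = 14.0
```

Weighting satisfaction by importance means that dissatisfaction with an attribute the man considers central lowers the score more than dissatisfaction with a peripheral one.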

Control variables. Since the threat and praise
variables were not experimentally manipulated, several
control measures which presumably could affect
reactions to threat or praise were obtained before
the appraisal discussion. The measures were as follows:

1. Previous Man-Manager Relationship—The nature
of the previous relationship that existed between
each appraisee and his manager was expected to
affect his reaction to threat or praise in the appraisal
feedback discussion. A measure of this man-manager
relationship was obtained by a combination
of questions administered in the preappraisal interview
and covered the same areas as the dependent
measure of man-manager relationship.

2. Mobility Aspirations—A second control measure
obtained was the position level each S aspired to in 
his next 5 years with the company. 

3. Satisfaction with Progress—A third control 
measure obtained was the satisfaction each S had 
with his job progress to date. 

4. Past Experience with the Appraisal System—A 
fourth control measure was the subordinate’s evalua- 
tion of his past experience with the performance 
appraisal system. 


RESULTS 


A 2 × 2 analysis of variance design was
used to test the main effects of the independent 
and conditioning variables on the dependent 
variables. The interaction term was used to 
test conditioning effects. 

Hypothesis Ia. This hypothesis was concerned
with the relationship between threat
and defensiveness, and the interaction effects 
of occupational self-esteem on this relation- 
ship. The results for the test of this hypothesis 
are shown in Table 1. 

As can be seen from Table 1, the significant 
main effects confirm the hypothesis. As threat 
increased, defensiveness also increased. It can 
be seen that the Ss who had a high number of 
threats in their appraisal discussions reacted 
defensively an average of 11 times per ap- 
praisal discussion. Those who received a be- 
low-average number of threats, on the other 


TABLE 1

RELATIONSHIP BETWEEN NUMBER OF THREATS AND NUMBER OF DEFENSES AND THE
EFFECTS OF OCCUPATIONAL SELF-ESTEEM AS A CONDITIONING VARIABLE

                   Average number of defenses for
                   subordinate who received

Occupational       Low number      High number
self-esteemᵃ       of threats      of threats
                   N = 41          N = 46ᵇ

High               aio             12.3
Low                1.0             9.7
M                  veal            11.0

                   F               p
Columns            20.9            <.001
Rows               1.4             >.10
Interaction        mail            >.10


ᵃ High N = 44, and Low N = 43.

ᵇ All data were not always available for all subjects for various
tests of hypotheses; therefore slight variations in Ns will be
noted in the tables presented here.














TABLE 2

THE RELATIONSHIP BETWEEN NUMBER OF THREATS AND GOAL ACHIEVEMENT AND THE EFFECTS
OF OCCUPATIONAL SELF-ESTEEM AS A CONDITIONING VARIABLE

                              Estimates of goal achievement

                      Subordinates                        Managers

Occupational          Low number   High number            Low number   High number
self-esteem      N    of threats   of threats        N    of threats   of threats
                      N = 36       N = 42                 N = 36       N = 40

High             39   68           65                40   59           67
Low              39   78           53                36   77           58
M                     72           58                     68           62

                      F            p                      F            p
Columns               5.05         .05>p>.01              <1           >.10
Rows                  1            >.10                   <1           >.10
Interaction           3.12         .10>p>.05              5.02         .05>p>.01




hand, defended only about 2 times per ap- 
praisal discussion, on the average. A more de- 
tailed analysis of the data revealed that this 
very large difference between the two groups 
was accounted for by the fact that the ratio 
of defenses per threat increased as the num- 
ber of threats increased. If the manager 
pointed up a few areas of needed improve- 
ment early in the discussion, the subordinate 
was not as likely to be defensive as he was to 
similar threatening items as the number in- 
creased throughout the discussion. Accord- 
ingly, when the above- and below-average 
groups on number of threats were compared 
on ratio of defenses per threat, it was found 
that those who received an above-average 
number of threats in their appraisal discus- 
sions offered defenses to about two-thirds of 
the threats, whereas those who received a 
below-average number of threats reacted de- 
fensively to only one-third of the threats. 

In summary, it appears that the results re- 
garding the main effects of threat on defensive 
behavior strongly support the hypothesis. The 
expected interaction of occupational self- 
esteem on the relationship between threat and 
defense, however, was not observed. 

Hypothesis Ib. According to this hypothe- 
sis, greater amounts of threat were expected 
to result in less subsequent goal achievement 
by subordinates, and this relationship was 
expected to be strong to the extent that oc- 


cupational self-esteem was low. The results 
for this analysis are shown in Table 2. 

The results in Table 2 tend to confirm the 
hypothesis. In the case of subordinate esti- 
mates of goal achievement, the main effects 
are significant. The group with low number 
of threats reported significantly higher per- 
formance than the group with high number of 
threats. The interaction of occupational self- 
esteem with the main effect is significant at 
the 10% level of confidence. For the manager 
estimates of goal achievement, the main effects 
are not significant, but the interaction effect 
is significant at the .05 level. 

As a further and somewhat more controlled 
check on the effects of threat on goal achieve- 
ment, an additional “within-person” analysis 
was made by comparing goal-achievement 
scores on those items appraised most nega- 
tively by the manager to goal-achievement 
scores for all other items. In the interviews 
with the Ss immediately following their per- 
formance appraisal discussions, each man was 
asked to identify the performance item on 
which the manager had been most negative in 
the appraisal. Considering only those items 
which got translated into goals, both the man- 
ager and the man tended to perceive a lower 
level of achievement for these high-threat performance
goals than for all other goals (.10 >
p > .05).

In summary, it would appear that Hypothesis
Ib dealing with the effects of threat on
goal achievement has been confirmed.

Hypothesis Ic. This hypothesis was concerned
with the effects of threat on attitudes
toward the manager. Essentially, there were
no changes in attitudes toward the manager 
as a result of threat. It is significant to note, 
however, that persons who received a high 
number of threats reported significantly poorer 
attitudes toward the manager in their first 
interview with the experimenter, prior to their 
appraisal discussion with the manager. After 
the appraisal discussion, their attitudes were 
slightly more negative, but the drop was not 
significant. The low-threat group showed sig- 
nificantly more favorable attitudes toward the 
manager both before and after the appraisal 
discussion, with no change occurring in the 
process. Thus, it does not appear that the 
appraisal discussion affected attitudes toward 
the manager, although attitudes held prior 
to the interview were quite predictive of the 
amount of threat the subordinate would experience.
The interaction effect of occupational
self-esteem on the effect of threat on
attitudes toward the manager was in the
direction predicted, but not strong enough to
be statistically significant.

Hypothesis Id. This hypothesis considers
the effects of threat on attitudes toward the
appraisal system and the interaction effects of
occupational self-esteem. A check of the control
measure for this hypothesis showed that
attitudes toward the appraisal system measured
before the appraisal discussions did not
differ significantly for the groups that subsequently
experienced either high or low
threat. A comparison of the same attitudes for
these groups after the appraisal discussions is
shown in Table 3.

As can be seen from this table, the high-threat
group expressed significantly less favorable
attitudes toward the appraisal system
after the appraisal interviews than did the
low-threat group. The expected interaction
effect of occupational self-esteem was not observed.

Hypothesis II. This hypothesis examined
the effects of praise on the same dependent
variables considered for threat in Hypothesis
I. Praise was expected to have the opposite
effects of threat. Actually, the results show

TABLE 3


RELATIONSHIP BETWEEN NUMBER OF THREATS AND 
ATTITUDE TOWARD THE APPRAISAL SYSTEM AND THE 
EFFECTS OF OCCUPATIONAL SELF-ESTEEM AS A 
CONDITIONING VARIABLE 








                        Attitude scores for subordinates
                        who experienced

Occupational            Low number      High number
self-esteem        N    of threats      of threats
                        Nie ee 4)

High               42   3.15            2.85
Low                41   3.30            Des
M                       Sn              2.80

                        F               p
Column                  6.95            .05>p>.01
Row                     Al              >.10
Interaction             <1              >.10
Interaction <f >.10 











that praise had no significant effects whatso- 
ever on the variables considered in this study. 
It did not reduce defensiveness; it did not 
increase goal achievement; it had no dis- 
cernible effects on attitudes toward the man- 
ager and the appraisal system. 


DISCUSSION 


The statistically significant findings in this 
study were in almost every case consistent 
with the hypotheses where threat was the 
independent variable. Hypothesized relation- 
ships which did not attain an acceptable level 
of significance were also, in most cases, con- 
sistent with the hypotheses. Of particular prac- 
tical significance is the relationship found be- 
tween number of criticisms or threats in the 
appraisal discussion and subsequent goal 
achievement on the part of the subordinates. 
The findings do not, of course, prove that the 
larger number of threats caused the lower 
level of goal achievement observed. Other 
factors could also account for the differences 
found on the goal-achievement measure be- 
tween Ss who received above- or below-aver- 
age number of threats. An obvious possibility 
is that the variance obtained in the goal- 
achievement measure merely reflects relatively 
permanent differences in levels of ability for 
the participants in this study. We found, for
example, that persons who received a high
number of threats received significantly lower
ratings of performance from their managers.1
Thus, they might not be expected to score as 
high on the goal-achievement variable, ir- 
respective of their experience in the appraisal 
discussion. However, if this variable really 
measures improvement in performance, then 
the poor performers had more room for im- 
provement as well as possibly lower ability, 
and it would be hard to predict whether they 
should gain more or less. In addition, the 
fact that the Ss in the high-threat group re- 
ceived lower overall ratings would not explain 
the obtained interaction effects of occupa- 
tional self-esteem. Table 2 showed that the 
goal achievement of men in the high- and 
low-threat groups, who also had high occupa- 
tional self-esteem, did not differ significantly. 
For men with low occupational self-esteem, on 
the other hand, those in the low-threat group 
had significantly higher goal achievement than 
those under high threat. 

A second finding which would indicate that
threat in the appraisal discussion may have
accounted for a lower level of goal achievement
relates to a “within-person” analysis
which was carried out to test the effects of
criticism on various specific performance items
mentioned by the manager in the appraisal
interview. This analysis showed that goal
achievement for those performance items on
which the subordinate had experienced most
threat in the appraisal discussion (that is,
those on which there was most room for improvement)
tended to be lower than goal
achievement for other performance items
(.10 > p > .05).

Defense as an intervening variable. It is
possible that the negative relationship found
between threat and goal achievement could
be accounted for, in part at least, by the
intervening variable, number of defenses. The
influence of defensiveness on goal achievement
is difficult to determine since a high
correlation was found between number of threats
and number of defenses (r = .55). A significant
difference was found in subordinates’
estimates of goal achievement between groups
receiving above- and below-average numbers
of threats (see Table 2). Almost the identical
difference in goal achievement was found for
above- and below-average groups on number
of defenses.


1 Number of threats did not, however, correlate
with amount of salary increase granted, and under
the pay-for-performance salary plan, the amount of
salary increase granted an individual was supposed
to be a more important index of the quality of his
performance than was his overall rating.

It seems logical to assume that the sub- 
ordinate might perceive less improvement in 
performance in those cases where he had been 
quite defensive in his appraisal discussion, 
since a defense is essentially a denial of re- 
sponsibility for an item of poor performance 
which the manager cites. To see improvement 
in relation to such items would in effect be 
admitting that the manager was correct in 
his assumption that improvement was needed. 

Negligible effects of praise. One of the clear- 
est findings is the fact that the use of praise 
does not seem to accomplish very much for 
the manager. This may have been due to the 
fact that, for the most part, praise used by the 
manager in this context did not seem to be 
perceived by the subordinate as sincere praise. 
The data revealed some tendency for number 
of praises in the appraisal interview to be 
correlated with number of criticisms (r = .20).
This fact may reflect an attempt on the part 
of managers to sandwich each criticism be- 
tween a couple of items of praise, as is so 
often recommended in how-to-do-it manuals 
on performance appraisal. If this were the 
case, the subordinates probably began to rec- 
ognize the praise-criticism-praise pattern very 
quickly and therefore did not respond in an 
expected, constructive way to praise. Or, an- 
other way of putting it is to say that praise 
very quickly became a conditioned stimulus 
for criticism. 

Some practical implications of the study 
findings. This study appears to have some 
very practical implications for appraisal prac- 
tices. If a manager’s objective is to stimulate 
a subordinate to improve his job performance, 
apprising him of his shortcomings in a com- 
prehensive review of the past year’s perform- 
ance does not seem to be an effective way to 
accomplish this objective. In fact, this study 
indicated that the more areas of needed im- 
provement the manager called attention to in 
such a performance-review discussion, the
poorer the results he achieved in terms of
subsequent constructive goal achievement on 
the part of the subordinate. 

Perhaps the use of “constructive criticism” 
would be more effective (or less ineffective) if 
it were used more sparingly than most man- 
agers were observed to use it in this study. 
The positively accelerating ratio of defenses 
per threat observed as the number of threats 
increased would indicate that an “overload 
phenomenon” may operate in this type of 
situation. Or to put it another way, each in- 
dividual may have a tolerance level of the 
amount of criticism he can absorb, and as this 
level is approached or surpassed, it becomes 
increasingly difficult to accept responsibility 
for shortcomings cited. 

How can the manager assist a subordinate 
in improving his performance? Certainly 
knowledge of results is important in effecting 
improvement, and this is one of the primary 
purposes of a performance appraisal program. 
Studies of the learning process have indicated, 
however, that this knowledge is proportion- 
ately less effective as the time between per- 
formance and feedback lengthens. This fact 
would argue for more frequent feedback dis- 
cussions between manager and subordinate, 
and especially for holding discussions of 
needed improvement in particular perform- 
ance areas at the time specific shortcomings 
relating to such performance items are ob- 
served. Some managers have reported that 
the formal performance appraisal program 
tends to cause them to “save up” items where 
improvement is needed in order to have 
enough material for a comprehensive discus- 
sion of performance in the annual review. 
More frequent discussions would avoid the
concentration of criticisms with the resulting
“overload phenomenon” which seems to cause 
the subordinate to be more defensive than 
constructive. 

A second phase of this study concentrated 
on the development and testing of more con- 
structive methods for stimulating performance 
improvement (French, Kay, & Meyer, in 
press). Specifically, this dealt with the effects 
of subordinate participation in planning job 
goals. While the implications of the findings of
the present study are largely
negative (i.e., comprehensive performance ap- 
praisal discussions have a largely negative ef- 
fect), the implications of the second phase of 
the study are largely positive. The mutual 
planning by manager and subordinate of spe- 
cific work plans and goals was found to result 
in measurable achievements in job performance
of obvious practical significance.


Biographical data items were weighted and cross-validated for the identification
of creative research personnel. Significant linear, partial linear, multiple,
and multiple-partial correlations are presented between no previous experience
(NPE) and previous experience (PE) keys and a variety of creativity criteria.
A behavioral and perceptual image of the creative scientist is presented, together
with a discussion of the communality inherent in various criteria of creativity.


Previous research (Albright & Glennon, 
1961; Smith, Albright, Glennon, & Owens, 
1961) has demonstrated the utility of using 
biographical and personal history items for 
the identification of creative research personnel.
The present study reports upon the concurrent
cross-validity inherent in reweighting
certain of these items, as well as certain other 
items not previously validated for this pur- 
pose, for the identification of creativity among 
biological, medical, chemical, and pharma- 
ceutical research personnel. 


PROCEDURE 


Sample. The participants in this study were 132 
male research personnel in the employ of a major 
pharmaceutical research and manufacturing organi- 
zation.1 By job description they were directors and 
assistant directors of research, research supervisors, 
senior investigators, junior investigators, senior re- 
search assistants, research assistants, and technical 
support personnel for the functions listed. Educa- 
tionally, 4 held both the MD and PhD degrees, 
6 held the MD degree, 64 held the PhD degree, 16 
held the MS degree, 3 held the DVM degree, 2 held 
the LLB degree, 33 held the Bachelor’s degree, and 
4 had college training but no degree. By area of 
technical specialization (apart from the MD) they 
represented chemistry, pharmacology, biochemistry, 
biology, endocrinology, bacteriology, physiology, 
veterinary medicine, zoology, various substrata of 
these areas, and mathematics, law, and engineering. 
Group age ranged from 23 to 63 with a mean of 
38.8. Group experience with the organization ranged 
from 1 to 47 years, with a mean of 9.9.

At the time of the group administration of the
questionnaire, it was explained that the results of
the study would in no way affect the job status
of the subjects (Ss) and that individual results
would remain anonymous. As further assurance, the
Ss were given the option of not signing their names
to the questionnaire if, after completing it, they felt
so inclined. Only two Ss chose this option and they
were dropped from the sample as unidentifiable for
statistical purposes. In essence then, the entire
(minus 2 men and 5 females) professional staff at
work on the day of the administration served as a
sample, disallowing the spurious effects attributable
to “volunteer” samples. From this total sample, 80
persons were randomly selected for use as an
item-weighting sample. The remaining 52 persons
constituted a holdout group for concurrent
cross-validation.


1 The author would like to thank the Management
of G. D. Searle & Co. for permission to report this
work. Thanks are also extended to B. H. Pickrell
for his invaluable assistance in the conduct of the
statistical analyses.

Criteria. Two criteria of creativity were employed. 
Both were in the form of personnel evaluations. The 
first, or rank criterion, was stated to the evaluators 
as follows: 


Hypothetically— 

You have been selected to be Director of Re- 
search for another pharmaceutical company. G. 
D. Searle & Co. has agreed to let you take 
with you, to your new organization, certain 
persons now at the G. D. Searle & Co. Re- 
search Center. You have been instructed by 
your new management to bring with you those 
persons who will make the most significant, 
original, and lasting contributions to research. 
Try to focalize your evaluation on the three 
underlined characteristics above, disregarding the 
individual’s field of specialization. 


After reading this statement and a set of instructions 
setting forth ways of maintaining objectivity and 
relevance, the raters divided their ratees into five 
groups of approximately equal size and ranked 
them within these groups; grouping was employed 
so as to minimize the difficulty inherent in the
continuous ranking of large numbers of people.
Accordingly, Group 1 was made up of the most 
creative 20%, Group 2 the next most creative 20%, 
and so forth. These within-group rankings were then 
converted to single-rank criteria for all Ss ranked 
by a particular rater, and these continuous rankings
converted to linear scale scores by the methods
suggested by Hull (1922) and Garrett (1947), 
thereby permitting the combination of distributions 
of varying size and the averaging of individual 
criterion scores from those distributions. 
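
The rank-to-scale conversion described above can be sketched in a few lines. This is only an illustration, assuming the familiar percent-position formula, 100(R - .5)/N, followed by a normal-curve transformation; the function name and the 50/10 scale constants are assumptions made for the example and are not taken from the study.

```python
from statistics import NormalDist


def linearize_ranks(n_ranked, mean=50.0, sd=10.0):
    """Return a linear scale score for each rank 1..n_ranked (1 = most creative).

    Each rank R is expressed as a proportion position, (R - 0.5) / N, and then
    mapped to the normal deviate that cuts off that proportion of the curve.
    The mean and sd constants are illustrative assumptions.
    """
    scores = []
    for rank in range(1, n_ranked + 1):
        proportion_above = (rank - 0.5) / n_ranked   # share of the group above this rank's midpoint
        z = NormalDist().inv_cdf(1.0 - proportion_above)  # top ranks get the largest deviates
        scores.append(mean + sd * z)
    return scores


# Raters who ranked different numbers of people end up on one common scale,
# so a ratee's scores from several raters can be averaged.
rater_a = linearize_ranks(24)
rater_b = linearize_ranks(7)
```

Because every rater's distribution is forced onto the same scale, a ratee ranked by two or three raters can be given the simple mean of the resulting scores, which is the step the text describes next.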

Since Rater 1 evaluated all 132 people, his rank
criterion was taken as one rank variable and a
second rank variable was made up of ranks contributed
by the other 4 raters. These two linearized
criterion variables were correlated (r = .66, N = 132),
suggesting moderate criterion reliability. However,
since the large portion (N = 108) of the second rank
continuum was contributed by Rater 2, interrater
correlations were calculated to determine individual
degrees of rater agreement. With respect to Rater 1
(N = 132), Raters 2, 3, 4, and 5 correlate .70
(N = 108), .79 (N = 80), .53 (N = 24), and .41
(N = 7), respectively, suggesting that good rater
reliability existed insofar as the large majority of
ratees was concerned. This being the case, average
linearized rank scores were computed for each ratee
across raters. Accordingly, 86 individual criterion
scores were based on 3 linearized rankings, 41 were
based on 2, and 5 were single linearized rank scores
taken from the one complete (N = 132) continuum.
While linear scores were not necessary for the
computation of the individual item tetrachoric
coefficients, the scoring key validities and rater
reliabilities demand linear data.

The second criterion was the “Supervisor’s Evalua- 
tion of Research Personnel” (SERP; Buel, 1962), 
a forced-choice supervisory evaluation of creativity. 
While no biographical data item analysis against 
this criterion is reported here, SERP was included 
in an attempt to demonstrate its utility as a part 
of the research personnel evaluation program, obvi- 
ating the necessity for man to man or relative 
personnel evaluations, time consuming and group- 
bound as they are. 

Method. For purposes of differentiating varying 
degrees of research creativity, a 118 item biographi- 
cal personal history form was administered to the 
sample. The questionnaire contained 59 items drawn 
from previously reported work (Albright & Glennon, 
1961; Smith et al., 1961) in this area. These items 
had variously been validated against a creativity 
and/or research performance criterion, a patent cri- 
terion, and a physical scientists’ career aspiration 
criterion. The remaining items had also been vali- 
dated, but not for the identification of creativity. 
However, they were included because they provided 
measurement consistent with current theory on 
creative behavior. 

The items covered personal history and experience 
relevant to early life, schooling, previous work and 
professional experience, environmental preferences, 
personal activities, aspirations, etc. Some were true 
personal history items while others centered in 
expressions of attitude toward or perception of 
personal experiences. By definition then some were 
objective, others subjective, or evaluative. For 
item analysis purposes (N = 80) against the
linearized rank criterion, the .05 level of significance
was adopted. Item-criterion correlations were in
the form of tetrachoric estimates of the linear
correlation coefficient (Davidoff & Goheen, 1953),
tested for significance as suggested by Guilford and
Lyons (1942). Such significance testing takes account
of the spurious effects of nonmedian splits. While
the weighting sample was divided at the median
(40/40), some category response frequencies did
deviate on the order of p = .85, p’ = .50. However,
the vast majority of category response frequencies
approximated p = .50 to .65, p’ = .50.


TABLE 1

CONCURRENT VALIDITY (r’s AND R) OF NPE AND PE
KEYS FOR TWO CRITERIA, N = 52

                   Criteria
           Linearized
           rank            SERP
NPE        .46**           .14
PE         .55**           .29*
R(a)       .57**           .30*

a The equation for R is Xc = b1X1 + b2X2 + K or,
Xc = .484X1 + 2.171X2 + 6.003.

* p < .05.

** p < .001.
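
To show what an item-criterion estimate of this kind looks like, the sketch below computes the cosine-pi approximation to the tetrachoric coefficient from a fourfold table. This is a common shortcut and is offered only as an illustration; it is not claimed to be the procedure of Davidoff and Goheen (1953), and the cell counts are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    Fourfold table of counts:
                        response keyed   other response
        criterion high        a                b
        criterion low         c                d
    """
    concordant = a * d
    discordant = b * c
    if concordant == 0 and discordant == 0:
        raise ValueError("correlation is undefined for this table")
    if discordant == 0:
        return 1.0  # no discordant cases at all
    return math.cos(math.pi / (1.0 + math.sqrt(concordant / discordant)))


# Hypothetical item in a weighting sample split 40/40 at the criterion median.
item_r = tetrachoric_cos_pi(a=28, b=12, c=14, d=26)
```

When the two diagonals of the table balance (ad = bc) the estimate is zero, and it approaches unity as the discordant cells empty out.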

Fifty items were found to be valid at the chosen 
significance level (many were significant at the .01 
and .001 levels) with tetrachoric coefficients ranging 
from ±.34 to ±.87. Forty-one of these items were
amalgamated into a “no previous experience”
(NPE) scoring key; the remaining 9 items consti- 
tuted a “previous experience” (PE) scoring key, 
experience being defined as at least one job previous 
to their present affiliation. After England (1961), 
0, 1, and 2 weights were assigned to each item to
represent negative, zero, and positive intraitem 
differentiations, respectively. 
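
Scoring a questionnaire with a key of this kind amounts to summing the 0, 1, or 2 weight attached to each response chosen. The sketch below shows the idea; the item labels, response options, and weights are hypothetical, and giving an unanswered item the neutral weight of 1 is an assumption made for the example rather than a rule stated in the study.

```python
# Hypothetical key: item label -> {response option: weight}.
# 2 = option favored by the more creative group, 1 = no differentiation,
# 0 = option favored by the less creative group.
EXAMPLE_KEY = {
    "item_07": {"a": 2, "b": 1, "c": 0},
    "item_19": {"a": 0, "b": 2},
    "item_42": {"a": 1, "b": 1, "c": 2},
}


def score_questionnaire(responses, key):
    """Sum the key weights for the options a respondent chose.

    Responses to items that are not in the key are ignored. A keyed item that
    is unanswered, or answered with an unkeyed option, gets the neutral
    weight of 1 (an assumption of this sketch).
    """
    total = 0
    for item, weights in key.items():
        total += weights.get(responses.get(item), 1)
    return total


respondent = {"item_07": "a", "item_19": "b", "item_42": "a", "item_99": "c"}
key_score = score_questionnaire(respondent, EXAMPLE_KEY)
```

With separate NPE and PE keys, the same function would simply be called twice, once per key.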


RESULTS 


For purposes of demonstrating concurrent
validity, the 52 holdout questionnaires were
scored with the two keys. The linear and multiple
correlations of these two keys with the
two criteria are presented in Table 1. Since
age appeared to covary with the NPE and PE
scores and with the criteria, partial and
multiple-partial correlations (Peters & Van
Voorhis, 1940) were calculated, removing the
effects of age.2 Table 2 presents those correlations,
while Table 3 presents the correlations
necessary for the solution of the partial and
multiple-partial correlations presented in
Tables 1 and 2.


2 r (partial) between the linear and SERP criteria
is .42, while r (partial) between NPE and PE keys
is [illegible].


TABLE 2

CONCURRENT (PARTIAL) VALIDITIES (r’s AND R)
OF NPE AND PE KEYS FOR TWO CRITERIA
(CONTROLLED ON AGE, N = 52)

                   Criteria
           Linearized
           rank            SERP
NPE        .38**           .10
PE         .46***          .27*
R          [illegible]     .27*

* p < .05.

** p < .01.

*** p < .001.
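
The removal of age from a validity coefficient can be illustrated with the standard first-order partial correlation formula. The sketch below is a minimal example, not the authors' worksheet: the three input correlations are the values as they can be read from this copy of Tables 1 and 3, which is partly illegible, so they should be treated as assumptions.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Inputs as read from Tables 1 and 3 (assumed, see the note above).
r_pe_rank = 0.55   # PE key with the linearized rank criterion
r_pe_age = 0.62    # PE key with age
r_rank_age = 0.34  # linearized rank criterion with age

pe_rank_given_age = partial_r(r_pe_rank, r_pe_age, r_rank_age)
```

With these inputs the PE validity drops from .55 to roughly .46 once age is held constant, which illustrates how much of the relationship survives the correction.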


DISCUSSION 


With regard to the validity of the two keys 
developed, the coefficients suggest that sig- 
nificant concurrent validity exists in re the 
linear criterion. Whether it is necessary to 
control on age—whether, in this organization 
creativity ratings are a function of ratee age 
or a reflection of rater bias in favor of older 
personnel—is a question to be answered by 
research management. If such is the case, 
sufficient concurrent validity remains, after 
correction for age, to warrant the utilization 
of the keys developed. However, because of 
very low attrition, age may not be assumed 
to be a very significant causative factor; that 
is, creativity is not related to age as a func- 
tion of the fact that relatively noncreative 
younger personnel have left the organization 
and thereby left behind a residual of highly 
creative older personnel. Rather, it appears
that research management hired, in a relatively
short span of time, older men who
stayed with the organization and who are
more creative. Such an interpretation is consistent
with the fact that controlling on age
partials out little relevant variance.


TABLE 3

INTERCORRELATIONS BETWEEN NPE, PE, AND AGE,
AND LINEARIZED RANK AND SERP, N = 52

                                                Criteria
                   NPE     PE      Age          Rank    SERP
NPE                1.00    .62     [illegible]
PE                         1.00    .62
Age                                1.00         .34     [illegible]
Linearized rank                                 1.00    .43
SERP                                                    1.00

The question may arise as to why these 
two keys (41 and 9 items) are not treated 
as a single key of 50 items. For simple reasons
of applicability to inexperienced and experienced
persons, separation is necessary. Moreover,
totaling produces an r of .52, while
combining gives rise to an R of .57. Since the 
NPE and PE keys were developed on both 
experienced and inexperienced personnel, the 
question did arise as to the defensibility of 
deriving statistics on a group, part of which 
were not meant to be treated with those 
statistics (weighting the PE key on Ss who 
had no previous experience perhaps creates 
an artificial difference). Since removal of in- 
experienced personnel from the weighting 
sample would have reduced N for purposes 
of NPE and PE key development, this was 
not done. However, in the cross-validation 
group it was possible to divide the sample 
in terms of experience. Accordingly, the 42 
Ss with previous experience have a mean NPE 
score of 41.3; the 10 Ss with no previous 
experience have a mean NPE score of 40.0. 
Further, the 42 Ss with previous experience 
have a mean PE score of 9.9; 10 Ss with 
no previous experience have a mean PE score 
of 7.7. While neither difference is statistically 
significant, these findings would suggest that 
the inclusion of experienced and inexperi- 
enced personnel in both the weighting and 
cross-validation samples tends to suppress 
mean score differences on both keys. Had 
strictly defined groups been used in both 
situations, the differences should have been 
larger. However, for purposes of selecting 
personnel with no previous experience, only 
the NPE key is to be used; for purposes 
of selecting personnel with previous experi- 
ence, both keys are recommended. Experi- 
enced personnel can legitimately answer both 
NPE and PE types of questions.
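
The advantage of combining the two keys by multiple regression rather than totaling them can be illustrated with the two-predictor formula for R and with the regression equation given in the footnote to Table 1. This is a sketch under stated assumptions: the NPE-PE intercorrelation of .62 is read from a partly illegible Table 3, and treating X1 as the NPE score and X2 as the PE score is an assumption, since the footnote does not say which is which.

```python
import math


def multiple_r(r_c1, r_c2, r_12):
    """Multiple correlation of a criterion with two predictors."""
    r_squared = (r_c1**2 + r_c2**2 - 2 * r_c1 * r_c2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


def predicted_criterion(npe_score, pe_score):
    """Regression equation from the footnote to Table 1.

    X1 = NPE score and X2 = PE score is an assumption of this sketch.
    """
    return 0.484 * npe_score + 2.171 * pe_score + 6.003


# Validities from Table 1; the NPE-PE intercorrelation is assumed (see above).
R = multiple_r(r_c1=0.46, r_c2=0.55, r_12=0.62)

# Mean key scores reported in the text for the 42 experienced Ss.
example_prediction = predicted_criterion(npe_score=41.3, pe_score=9.9)
```

With these inputs R comes to about .57, the figure quoted in the text for the combined keys.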

Since it is quite possible that the original 
criterion rankings were based, in part, on the 
rater’s knowledge of the Ss’ patent and publication
behavior, the correlation between the
linear criterion and the PE key (with the
one patent and the one publication item
removed to eliminate spuriously high correlation)
was calculated. That correlation is .45
(p < .001), suggesting that the seven remaining
PE items possess considerable validity for 
this purpose. No such calculations were done 
with regard to the NPE key because, by 
definition, no previous job experience items 
were included. 

As suggested previously, SERP was in- 
cluded in this study in an effort to evaluate 
its usefulness as a research personnel evalua- 
tion device and criterion. The correlations 
suggest significant relationships between it 
and both PE and NPE and PE combined. 
That the validities of NPE and PE against 
SERP, as a criterion, are not as high as is 
the case with the linearized rank criterion 
is not surprising—NPE and PE items were 
analyzed against the linearized rank criterion 
and therefore, should most accurately predict 
it. Further study could item analyze NPE 
and PE against SERP as a criterion. It would 
then be hypothesized that these two correla- 
tions would increase substantially and that 
the corresponding correlations for the line- 
arized rank criterion would shrink. 

In some previous studies (Buel & Bachner, 
1961; Smith et al., 1961) patents and publi- 
cations have been used as criteria. While 
such was not the intention in the present 
study, it is interesting to note the correlation 
between number of patents and the linearized 
rank criterion (.65) and between number of 
publications and the same criterion (.68). 
Correlations of this magnitude suggest that 
the number of patents and publications bears 
a relationship to the ratings received by 
individual researchers. However, it is also 
interesting to note that significant validity 
can be achieved by using the NPE and PE 
keys (N = 52) to predict the patent and
publication behavior of research personnel.
The NPE key correlates .50 (p < .001) and .49
(p < .001) with patents and publications,
respectively, while the PE key (with the one
patent and the one publication item removed
to eliminate spuriously high correlation)
correlates .34 (p < .05) and .51 (p < .001) with these
two criteria, respectively. It is apparent then
that the two keys have validity for predicting
several meaningful criteria of creativity and
research performance.

Reviewing the valid items, an image of the 
creative biological and physiological scientist 
emerges which is corroborative of previously 
reported images (Buel, 1960, 1962; Smith 
et al., 1961). The more creative men tend to 
have a positive self-image, a need for personal 
independence in work and social environs, 
wide interests, a history of parental permis- 
siveness insofar as decision making is con- 
cerned, and a tendency to become over- 
involved (in terms of time available to per- 
form job-related activities). Further, they 
tend to react positively to challenge, seek 
unstructured work situations, and desire 
contemplative pursuits. Apparently, creative 
personnel in a variety of research areas 
(petroleum, organic, biological, and physio- 
logical) are describable and identifiable in 
similar terms.


COMPARATIVE ACCURACY OF RECOGNIZING
AMERICAN AND INTERNATIONAL ROAD SIGNS


RONALD E. WALKER, ROBERT C. NICOLAY, AND CHARLES R. STEARNS


Loyola University, Chicago 


This study investigated the hypothesis that symbol road signs (similar to the 
international signs) could be more accurately recognized than word road signs 
(typical of the American signs). The Ss used were 81 college undergraduates. 
The hypothesis was significantly supported under 2 conditions. Under 1 condi- 
tion, both the symbols and signs were black; in the other, the symbols were 
black and red. A further phase of the study demonstrated the ease with which 
the symbol signs were learned. A simple memory test conducted 24 hrs. after the 
learning indicated perfect recall of the symbol signs and their meaning. The 
potential significance of the results and research possibilities were discussed. 


There are two primary systems of road 
signs used in the world today, the American 
and the international.1 The basic difference
between these two systems is that the former 
uses primarily words and the latter uses 
symbols. Since the time when the main func- 
tion of road signs was to indicate directions 
(Eliot, 1960), both of the systems have be- 
come quite complex and very critical for the 
safety of pedestrians and motorists. The pur- 
pose of this study is to test the two methods 
empirically in an attempt to ascertain which 
is better in terms of ease and accuracy of 
recognition. 

In America the official standards of road 
signs are set forth in the Manual on Uniform 
Traffic Control which was first published by 
the Bureau of Public Roads in 1927. This 
manual offers traffic-control suggestions to 
each state. There are no specific laws that 
the federal government imposes on the states 
or local governments; however, nearly all of 
the states have accepted this manual and 
follow it with only minor exceptions. For 
example, while the official standard for the 
country is red and white stop signs, both red- 
white and yellow-black signs can be seen in 
the various states. The American standards 
have evolved as various tests have shown the 
advantage of one particular sign or concept 
in contrast to others. These tests have been 
in the following areas: visibility, color comparison,
shape, letter style and height,
recognition, and ease of interpretation.2


1 The international signs are often called European
signs.

The international road signs were originally 
adopted by the United Nations at Geneva, 
Switzerland, in 1949 and revised in 1953. 
While the American signs are more varied 
in shape, the international ones are simpler 
in design and eliminate language barriers. For 
example, instead of using “PEDESTRIAN CROSS- 
ING,” as is used in most American states, the 
international sign with the same meaning is 
a red triangle enclosing a white area with a 
figure of a man in black on the white background.
In addition, the international signs
are seemingly quite easy to interpret. In a 
recent study, Brainard, Campbell, and Elkin 
(1961) found that the ability of college 
students to learn the meaning of many of 
these signs approached 100% after a very
brief training period. 

The hypothesis in the present experiment
was that symbol signs (approximations of the 
international signs) can be recognized sig- 
nificantly more easily and accurately than 
word signs (approximations of the American 
signs). This hypothesis was advanced even 
though familiarity of American subjects (Ss) 
with their own street signs should be a posi- 
tive factor for the recognition of the Amer- 
ican word signs. In addition, a further con- 
cession was made to the American word signs 
in the first and main stage of the study, 
namely, all of the stimuli (both word signs
and symbols) were printed in black. This
procedure was employed to eliminate differential
color cues which in the present systems would
be expected to favor the recognition of the
chromatic international signs.


2 The authors thank Jack A. Hutter of Northwestern
University’s Traffic Institute for his very
helpful information about American road signs.

Relevant to the question of color cues, a 
secondary investigation involving a much 
smaller number of Ss was carried out. In this 
second stage, the hypothesis and all conditions 
remained the same with the exception that 
the symbol signs were red and black instead 
of black only. 


METHOD 
Subjects 


The Ss used in the first stage were 70 under- 
graduate students, 26 males and 44 females, enrolled 
in introductory psychology sections at Loyola Uni- 
versity. For the second stage 11 male introductory 
psychology students served as Ss. No Ss who were 
familiar with the international symbol system or who 
were color blind were used in either stage of the 
experiment.


Stimulus Material 


The word signs were: NO RIGHT TURN, NO LEFT
TURN, and DO NOT ENTER. Their symbol counterparts
were: [symbol: arrow to the right crossed by a diagonal],
[symbol: arrow to the left crossed by a diagonal], and
[symbol: horizontal dash]. The circles that typically
enclosed the international symbols were eliminated
as they were obvious cues for an S attempting
to discriminate between word signs and the symbols.
The words and symbols were centered in an 8 X 8
inch area on 10 X 10 inch Crescent illustration board
(cold press surface #100). The surface of the latter
is relatively smooth and flat in contrast to gloss
finish. The signs were printed in 2-inch Roman
letters; all signs and symbols were 1 inch wide. A
flat India black (#2422 Post Drawing ink) was used
for all stimuli in testing the primary hypothesis.
For the chromatic condition, only the diagonals and
the dash (the symbol for DO NOT ENTER) in the
international system were red. The ink used for
this condition was translucent scarlet (from Higgins 
American Ink Company). One completely blank 
card was employed to control for guessing. Thus, 
there was a total of seven stimuli including three 
word signs, three symbol signs, and one blank. 

All stimuli were presented against a white back- 
ground on a Dodge tachistoscope in a completely 
darkened room. Each stimulus was presented for 
.06 seconds with an interstimulus interval of 10 
seconds. There was a 30-second interval between 
trials. The intensity of illumination from the 
tachistoscope for the .06-second presentation time 
was a little less than four standard candles as 
measured by a Brockway photometer held 6 inches 
from the instrument.


Procedure and Instructions


The stimuli were presented to small groups ranging 
in size from 4 to 11 people. All Ss were seated from 
4 to 7 feet from the tachistoscope and arranged so 
they had an unobstructed view of the front of the 
apparatus. Each person was given three sheets of 
paper on each of which was a word sign, for 
example, NO LEFT TURN, and its symbol, for example,
[symbol: arrow to the left crossed by a diagonal].
The specific instructions for this phase of
the experiment were:


I have given you some drawings of road signs 
on these papers. Some are familiar to you, others 
are not. On each sheet, the words and the symbols 
mean the same thing, that is, the black arrow to 
the left with the black diagonal means no left 
turn, the black arrow to the right with the black 
diagonal means no right turn, and the black 
horizontal line means do not enter. Your task 
is to study these words and symbols so that you 
can recognize them when they are presented to 
you. 


After the Ss had studied the words and their cor- 
responding signs for 5 minutes, they were shown 
the actual set of stimuli that were to be used in 
the experiment. At this point, the experimenter (E)
said: 


Here are the stimuli which will actually be used 
in the study so that you may become accustomed 
to the size of the words and symbols. Are there 
any questions? 


After questions were answered, the sign-symbol ex- 
amples were collected. All Ss were then given a 
14-page booklet of paper and told: 


Please print your name, age, sex, and whether 
you are a licensed driver or not in the upper right- 
hand corner of the booklet given to you. The 
stimuli will be presented one at a time behind 
this mirror so that you can see them. Make sure 
you have a clear view of the mirror. I will an- 
nounce a number for each stimulus. You are to 
write down what you see in the 10 seconds before 
the next stimulus appears. There will be seven 
stimuli presented and then a rest of 30 seconds 
followed by seven more stimuli. For the sake of 
simplicity you may use abbreviations for what you 
see. If you see an arrow symbol, simply make an 
arrow in the same direction as it appeared on the 
card. If you see a dash, just make a dash. If you 
see no left turn or no right turn, just write NL 
or NR. For the do not enter write DN. If you 
see nothing when the light flashes, then write 
nothing on the page. It is essential that you turn
a page in the booklet every time a light appears 
in the apparatus so that when the experiment is 
over you will have written your last answer 
on Page 14 of the booklet. 

Since this whole study will take place in the
dark, I am now going to put out the lights and
we will begin in a few minutes after you have
had the opportunity to adapt to the darkness. Are
there any questions?


TABLE 1

TOTAL CORRECT STIMULUS IDENTIFICATION

                                  Trial 1            Trial 2
               N    Possible   Symbols   Words   Symbols   Words
Males          26      78         65       32       66       42
Females        44     132        107       70      121       86
All subjects   70     210        170      102      187      128


The Ss sat in the dark for 4 minutes and then
the experimenter announced that the stimuli would
begin in 1 minute. The E again warned the Ss of
the onset of the first stimulus 10 seconds before
it appeared. He announced the presentation of
each stimulus from one through seven and then
allowed a 30-second rest. Following the brief rest,
the E presented the last seven stimuli. Each set of
seven stimuli constituted a trial and consisted of
the three word signs, the three symbols, and a blank
card. The stimuli were presented in a random order
for both trials and for all groups.

The procedure and instructions for the second
stage, or chromatic condition, were essentially the
same as those used in the first stage of the experi-
ment.


Retention Test 

Twenty-seven of the Ss were presented with the
international symbols approximately 24 hours after
the experiment proper to determine whether they
had learned the meaning of the symbols. The in-
structions to them were:


Yesterday you participated in an experiment 
during which you studied these three symbols 
(showing the symbols) and their meaning as
traffic signs. Today, I would like for you to recall 
the meaning that you learned for these symbols. 
Simply write down what you remember in the 
spaces provided on these pieces of paper that I 
will give you. 


RESULTS 


A classification of the number of correct 
stimulus identifications by sex of Ss, by trials, 
and by type of stimuli used is presented in 
Table 1. Inspection of these data revealed
that on both trials the international symbols 
were correctly identified more frequently than 
were the American word signs. A nonpara-
metric statistical test, the Wilcoxon matched-
pairs signed-rank test, was applied to the data
to determine the level of statistical signifi-
cance of the observed differences between the
symbols and words. The results of this analy-
sis (see Table 2) indicated that for both
trials and both sexes the differences between
the symbol and word detection were all sig-
nificant.
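
The matched-pairs computation described above can be sketched as follows. This is a minimal illustration, not the authors' own procedure: the function name and the example scores are hypothetical, and the sketch returns only the effective N (pairs with a nonzero difference) and the statistic T (the smaller of the two signed rank sums), leaving the significance lookup to a table of critical values.

```python
def wilcoxon_signed_rank(symbol_scores, word_scores):
    """Return (N, T) for the Wilcoxon matched-pairs signed-rank test.

    N counts only pairs whose scores differ; T is the smaller of the
    positive and negative rank sums. Tied absolute differences share
    the average of the ranks they occupy.
    """
    # Subjects with no difference between conditions are dropped.
    diffs = [s - w for s, w in zip(symbol_scores, word_scores) if s != w]
    n = len(diffs)

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    positive_sum = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative_sum = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return n, min(positive_sum, negative_sum)


if __name__ == "__main__":
    # Hypothetical scores (0 to 3 correct per subject), not data from the study.
    symbols = [3, 2, 3, 1, 2, 3]
    words = [1, 2, 2, 2, 0, 1]
    print(wilcoxon_signed_rank(symbols, words))
```

With scores like these, the one subject who identified as many words as symbols drops out, which is why the N entering the test can be smaller than the sample size.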

The 11 males who were exposed to the red-
black symbols and black-word signs also cor-
rectly identified more of the former. The T
for Trial 1 was seven (N = 10, p < .05);
that for Trial 2 was zero (N = 6, p = .05).

Each of the 27 Ss who participated in the
test for retention of the meaning associated
with the three symbols used in this investiga-
tion had perfect recall. Thus the group’s
retention was 100% accurate.


TABLE 2

WILCOXON MATCHED-PAIRS SIGNED-RANK
TESTS OF SYMBOL AND WORD-SIGN
IDENTIFICATIONS

                  N^a      T^b
Males
  Trial 1         19       5.0*
  Trial 2         18      13.5*
Females
  Trial 1         32      60.0*
  Trial 2         28      18.0*
All subjects
  Trial 1         51      99.0*
  Trial 2         46      62.5*

^a Ns do not correspond to sample sizes, as the Wilcoxon test
does not utilize data from subjects with no differences in their
performances from condition to condition; for example, in this
experiment a subject who correctly identified as many word
signs as symbols would not contribute to the data to be analyzed.

^b In this study T was always the sum of the ranks for the
word signs, as the symbol signs were consistently identified
more frequently than were the word signs.

* p ≤ .01.

DISCUSSION


The results of both stages of this experi- 
ment strongly supported the hypothesis that 
symbols can be recognized significantly more 
accurately than word signs. These results 
were especially striking when the following 
factors are taken into consideration: (a) fa- 
miliarity was a positive factor in favor of the 
word signs, as all of the Ss had lived in 
America for at least 17 years and 77% of 
them were licensed drivers; (b) the particu- 
lar symbols chosen for the experiment have 
been demonstrated to be the most difficult in 
the international system for American Ss to 
learn and to interpret (Brainard et al., 1961); 
and (c) in stage one the symbol stimuli were 
all black instead of the usual red-white or 
red-white-black, so that color cues would not 
be a positive factor in discriminating the 
symbols from the typical American black- 
word signs. A plausible explanation for the
superiority of the symbol sign would be the
greater perceptual simplicity of the symbol.
The symbol is more visually integrated
whereas the letters of the word signs are more
fragmented.

It seems that the three basic criteria that 
should be used for any system of road signs 
are the speed and accuracy of recognition and 
the meaningfulness of the signs that consti- 
tute that system. This present investigation 
has given some indication that symbols can be 
recognized with significantly greater accuracy 
than word signs. In addition, the finding that 
a group of Ss can remember the meaning of 
the set of symbols with 100% accuracy after 
a 24-hour delay was quite impressive. The 
work of Brainard and his associates (1961) 
was supported to some extent by the latter 
finding and definitely emphasized the ease 
with which the meaning of the international 
symbols can be learned. Admittedly there are 
several limitations in generalizing from results 
obtained by studying rather select Ss func- 
tioning under rather prescribed conditions, 
but the results in both this study and in 
Brainard et al. do seem to indicate that fur- 
ther research contrasting symbol systems and 
word systems is warranted. Such future re-
search should undertake to include: more
signs in both systems, samples of other popu- 
lations, other color schemes, simulated day- 
and night-driving conditions, and speed/ 
accuracy relationships. Of particular interest 
would be the extension of this research into 
the broad field of industrial safety. Should 
similar results be found in relation to factory, 
machinery, and various work situations, the 
implications would be significant to manage- 
ment and workers. 

Even if future research were to bear out 
the desirability of changing the American 
road-sign system from words to symbols, 
several cogent reasons have been advanced 
against such a change—such as the cost of 
replacing the existing system and the cost of 
reeducating the driving public (Eliot, 1960). 
The aforementioned objections to the change 
of systems would have to be weighed against 
many sound reasons in its favor. First, signs 
that are easier to recognize and interpret 
would lead both directly and indirectly to 
safety on the road for pedestrian and motor- 
ist, as many unnecessary mistakes and con- 
fusions could be avoided. Also, if the symbol 
signs would require less time to identify, they 
would minimize the time the driver’s vision 
is diverted from the roadway. Secondly, a 
symbol system would circumvent a language 
problem for those who do not read English. 
This latter reason would be especially rele- 
vant for international visitors and members 
of certain foreign-speaking American subcul- 
tures. Conversely, a further outcome would 
be the familiarization of the American driving 
public with the international system. Thus, 
increasing numbers of American tourists would 
be better prepared for driving on foreign 
roads.

Set and content scores from 3 MMPI scales, Edwards SD scale, the Manifest
Anxiety scale (MA), and the Masculinity-Femininity (Mf) scale, were derived 
by an adaptation of the Helmstadter technique for obtaining separate (acquies- 
cence) set and content scores from personality scales. In a factor analysis of 
scores for 150 male college Ss on 54 variables, the MA-Set and SD-Set variables 
defined a common factor, but only the Mf-Set variable loaded the second, or 
acquiescence, factor. The inconsistency of these results indicated that the set 
formula was not consistently measuring, or reflecting, acquiescence, or any other 
construct, and furthermore suggested the need for caution in making acquies- 
cence interpretations based on the Helmstadter procedure. Some speculations 
were advanced to account for the disparate results of the set variables, such as 
the degree of true-false and SD-SUD keying in the “parent” scales. A systematic 
variation of such scale keying in future research may indicate what the set 
procedure is measuring and have potential implications for the clarification of
the nature of acquiescence in personality scales.


Concern for the existence, influence, and 
measurement of response sets or styles in 
personality scales continues to be reflected in 
personality research. Of the two major re- 
sponse sets usually identified in personality 
measures, the acquiescence response set, or 
the tendency to mark True to a personality 
item, has given indications of being elusive 
of specification. For example, Edwards and 
Diers (1963) observe that no one of the 
proposed measures of acquiescence in the 
Minnesota Multiphasic Personality Inventory 
(MMPI) is independent of social desirability 
influences while other evidence indicates a 
lack of convergent validity among various 
measures of acquiescence (Edwards, 1963; 
Foster & Grigg, 1963; Husek, 1961; McGee, 
1962; Schutz & Foster, 1963). Thus, the 
general problem in response-set research re- 
mains that of clarifying the nature of response 
acquiescence. 


1 The research in this paper was sponsored in part 
by the 6570th Personnel Research Laboratory, Aero- 
space Medical Division, under AFSC Project 7719 
(02). This investigation was conceived while the 
senior author held a Postdoctoral Research Fellow- 
ship from the National Institutes of Health, United 
States Public Health Service.

One approach to the assessment of acqui-
escence and its influence in scales has been
the Helmstadter (1957) procedure for ob- 
taining separate content and set scores from 
a particular measure. Although Helmstadter 
presented several methods for deriving the 
content and set aspects of a particular scale, 
the general procedure can be represented by 
the formulas below: 


$$\text{Content} = \frac{R_T}{N_T} + \frac{R_F}{N_F}$$

$$\text{Set} = \frac{R_T}{N_T} - \frac{R_F}{N_F}$$


$N_T$ and $N_F$ represent the number of items
keyed True and False, respectively, while $R_T$
refers to the number of true-keyed items an-
swered true and $R_F$ refers to the number of
false-keyed items answered false. Messick
(1961) presented a general statement on the
application of the Helmstadter technique to
personality and attitude scales and the con-
tent and set formulas from that article, sub-
sequently used in the present research, differ
from those above in (a) the subtraction of
unity in the content formula, and (b) the
inclusion of a denominator term, $1 - |C|$, in
the set formula where C refers to the content
score of a specific individual.
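
As an illustration of the two scoring variants just described, the following sketch computes content and set scores for one respondent. It assumes the reading of the formulas given in the text (proportion of true-keyed items answered true plus or minus the proportion of false-keyed items answered false, with Messick's subtraction of unity and division by 1 − |C|). The function names and the example counts are hypothetical.

```python
def helmstadter_scores(r_true, n_true, r_false, n_false):
    """Original content and set scores.

    r_true:  true-keyed items answered true
    n_true:  number of true-keyed items
    r_false: false-keyed items answered false
    n_false: number of false-keyed items
    """
    p_true = r_true / n_true
    p_false = r_false / n_false
    content = p_true + p_false
    set_score = p_true - p_false
    return content, set_score


def messick_scores(r_true, n_true, r_false, n_false):
    """Revised scores: content minus unity, set divided by 1 - |C|."""
    content, set_score = helmstadter_scores(r_true, n_true, r_false, n_false)
    content -= 1.0
    denominator = 1.0 - abs(content)
    if denominator == 0.0:
        # Every item answered in (or against) the keyed direction:
        # the set score is undefined in this sketch.
        return content, None
    return content, set_score / denominator


if __name__ == "__main__":
    # Hypothetical respondent on a balanced 40-item scale.
    print(helmstadter_scores(15, 20, 10, 20))
    print(messick_scores(15, 20, 10, 20))
```

The handling of the case |C| = 1 is a choice made for this sketch; the source does not say how such respondents were treated.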


SET AND CONTENT SCORES 


A number of studies have reported applica- 
tions of the Helmstadter procedure or of con- 
tent and set “scales” derived by that pro- 
cedure (Adams & Kirby, 1963; Clayton & 
Jackson, 1961; Frederiksen & Messick, 1959; 
Messick & Frederiksen, 1958; Wiggins, 1964). 
In deriving set scores from various scales, the 
assumption seems to have been made that 
these set scores were measuring the same 
thing, namely, response acquiescence. Re- 
cently, Adams & Kirby (1963) used the 
Helmstadter procedure, with original and re- 
versed items, to assess the degree of acquies- 
cence influence in the Edwards (1957) Social 
Desirability (SD) scale and the Taylor 
(1953) Manifest Anxiety (MA) scale. It 
seems appropriate to investigate whether vari- 
ous set variables are in fact measuring the 
same thing. If so, they could be expected to 
intercorrelate highly. Furthermore, if the set 
formula yields measures of acquiescence and 
the second largest factor in the MMPI re- 
flects acquiescence, then the various set vari- 
ables could be expected to load that factor. 
The present study investigates the relation- 
ship between set scores derived from three 
MMPI scales and the response-set factors 
frequently found in the MMPI, particularly 
the second, or acquiescence, factor. 


METHOD


The three scales for which separate content and 
set scores were derived, the SD scale, the MA scale, 
and the Masculinity-femininity (Mf) scale, repre- 
sent a cross-section of MMPI scales in their true- 
false and socially desirable (SD) and socially un-
desirable (SUD) keying. The SD scale contains a
large number of false-keyed items (77%), the MA
scale mainly true-keyed items (76%), while the Mf
scale shows considerable balance in true-false keying
(47% keyed true). In SD-SUD keying, all 39 items
of the SD scale are keyed for socially desirable re-
sponses and 49 out of the 50 MA items are keyed
in the SUD direction. The Mf scale again shows
considerable balance with 28 of 60 items keyed in
the socially desirable direction. In making determina-
tions of the SD-SUD keying of items, the social
desirability scale values (SDSV) derived by Messick
and Jackson (1961b) were used. All items beyond
the midpoint, 5.0, in the socially desirable direction
on the nine-point social desirability continuum were
considered to have socially desirable scale values.
Those items having SDSVs of less than 5.0 were
considered to have socially undesirable scale values.

Besides considerations of degrees of SD-SUD and
true-false keying, none of the three scales showed
an appreciable loading on the second, or acquies-
cence, factor in a prior study (Liberty, Lunneborg,
& Atkinson, 1964). This procedure seems likely to
preclude the possibility of a loading by a set variable
on the second factor as a consequence of the rela-
tionship between its parent scale and that factor.

MMPI profiles of 150 male college students, avail- 
able from a previous study (Liberty et al., 1964) 
were used in the present study. True and false items 
of the SD scale, the MA scale, and Mf scale were 
scored separately, along with the usually obtained 
full-scale scores. The revised Helmstadter technique 
formulas (Messick, 1961) were applied to obtain 
separate set and content scores for each subject (S) 
on each of the three scales.2 In addition to the three 
content and three set scores, 46 MMPI scales and 
2 non-MMPI scales, the Couch and Keniston (1960) 
Agreement Response Set (ARS) scale and the 
Crowne and Marlowe (1960) Social Desirability 
(M-C SD) scale, were included in the analysis.3
References to the MMPI scales are found in Dahl-
strom and Welsh (1960). The 54 × 54 correlation
matrix was submitted to a principal axes, unit di- 
agonals, factor analysis. Nine factors with latent 
roots greater than unity were extracted and rotated 
orthogonally to the varimax criterion using a com- 
puter program by Veldman (1963). The nine rotated 
factors, accounting for about 74% of the total vari- 
ance, are presented in Table 1. The first five factors, 
accounting for 54% of the total variance and 73% 
of the extracted variance, are of particular interest to 
this study. No attempt will be made here to discuss 
the remaining factors which are quite similar to those 
previously identified (cf. Liberty et al., 1964) in 
factor studies of the MMPI. 
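
The analysis steps named above (unities in the diagonal, retention of factors with latent roots greater than unity, orthogonal varimax rotation) can be sketched in NumPy as follows. This is an illustrative reconstruction under stated assumptions, not the Veldman (1963) program: the function names are hypothetical, the rotation uses a common SVD-based varimax iteration, and the demonstration data are synthetic.

```python
import numpy as np


def principal_axes_unit_diagonal(corr):
    """Factor a correlation matrix with unities in the diagonal.

    Keeps factors whose latent roots (eigenvalues) exceed 1.0 and
    returns (unrotated loadings, all eigenvalues in descending order).
    """
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    keep = eigenvalues > 1.0
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return loadings, eigenvalues


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation via repeated SVD updates."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        target = rotated ** 3 - rotated @ column_ss / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # product of orthogonal matrices
        new_objective = s.sum()
        if new_objective <= objective * (1 + tol):
            break
        objective = new_objective
    return loadings @ rotation


def demo_correlation_matrix(seed=0):
    """Synthetic 6-variable correlation matrix with two latent factors."""
    rng = np.random.default_rng(seed)
    factors = rng.normal(size=(150, 2))
    weights = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
    scores = factors @ weights.T + 0.5 * rng.normal(size=(150, 6))
    return np.corrcoef(scores, rowvar=False)


if __name__ == "__main__":
    corr = demo_correlation_matrix()
    unrotated, roots = principal_axes_unit_diagonal(corr)
    rotated = varimax(unrotated)
    retained = roots[roots > 1.0]
    print("factors retained:", rotated.shape[1])
    print("share of total variance:", retained.sum() / corr.shape[0])
    print("communalities:", (rotated ** 2).sum(axis=1))
```

Because the rotation is orthogonal, each variable's communality (the sum of its squared loadings across retained factors) is unchanged by it; only the distribution of loadings across factors shifts.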


2 The set scores derived by the two formulas, based
on slightly different rationales, correlated .73 (SD),
.91 (MA), and .95 (Mf). A factor analysis identical
to the one reported in this study used set variables
derived by the original Helmstadter set formula. The
results of this second factor analysis were virtually
identical to those reported here.

3 Scales included, for which abbreviations are given
in Table 1, are: 1-Achievement via independence, 2- 
Welsh Anxiety, 3-Academic achievement, 4-Manifest 
anxiety, 5-Fricke Response bias, 6-Caudality, 7-Con- 
trol, 8-Depression, 9-Dominance, 10-Dissimulation 
(rev.), 11-Dependency, 12-Ego overcontrol, 13-Ego 
strength, 14-Validity scale, 15-Female masochism, 
16-Hostility, 17-Hypochondriasis, 18-Conversion hys- 
teria, 19-Intellectual efficiency, 20-Impulsivity, 21- 
Correction, 22-Lie, 23-Leadership, 24-Hypomania, 
25-Masculinity-femininity, male, 26-Positive malin- 
gering, 27-Neuroticism, 28-Neurotic overcontrol, 29- 
Neurotic undercontrol, 30-Originality, 31-Paranoia, 
32-Psychopathic deviate, 33-Prejudice, 34-Psychas- 
thenia, 35-Pharisaic virtue, 36-Welsh repression, 37- 
Social responsibility, 38-Role playing, 39-Schizo- 
phrenia, 40-Social introversion, 41-Edwards Social 
Desirability, 42-Social participation, 43-Social status, 
44-Tolerance, 45-Hanley Test-taking defensiveness, 
46-Wiggins Social desirability, 47-Couch-Keniston 
Agreement response scale, 48-Marlowe-Crowne So- 
cial desirability. Variables 49-54 are specifically de- 
scribed in the text.

TABLE 1

ROTATED FACTOR LOADINGS

Variable I II III IV V VI VII VIII IX h²
1. Aa 2D oD alk —.13 —.12 .02 —.69 .09 00 73 
Dime —.78 — 43 el'5 —.10 —.01 —.07 07 00 10 85 
SrA G A4 ws —.22 —.08 09 —.09 — 48 .06 — .36 68 
4, MA —.79 —.27 24 —.09 —.01 —.02 16 —.09 AS 82 
5. B — 36 —.72 .06 nD) .00 08 pil —.15 it 72 
6. Ca —.78 —.21 19 — .08 —.11 —.07 ea — .03 ae .78 
7. Cn —.21 — 40 55) —.31 14 —.09 .09 —.15 14 68 
Qe ID — .68 27 09 —.21 —.05 —.18 i 05 eed nD 
9. Do .60 akS .02 — .04 07 .02 — 40 — .09 2S) 61 
10. Ds-r —.59 —.39 .20 —.08 .00 — .09 28 — .26 30 79 
11. Dy — 83 — 34 10 —.15 —.07 13 08 02 .03 86 
ee o — .06 13 —.15 18 ml —.13 —.20 — .03 —.15 .69 
IS VES .66 —.04 —.01 —.10 .06 a —.14 — .08 07 A9 
14. F — 48 — 13 14 —.05 —.08 05 il 03 .60 yl 
15. Fm —.81 — 10 —.07 —.01 06 —.09 --.07 ts — .03 cea 
16. Ho — 39 — 48 10 —.03 01 08 oil —.56 18 88 
VERAELS, —.59 — .06 18 —.15 —.25 —.31 aD — .03 .20 70 
18. Hy —17 27 04. — 23 —.21 — .66 05 36 24 82 
19. le As ALS) 10 .06 08 —.06 — .69 00 —.08 IS 
20. Im —.28 — 54 A2 —.01 08 —.16 [33 — 14 29 719 
Des O7 30 —.25 00 — .06 —.15 — 24 34 .02 83 
Dias 10 DA —.74 —.19 — .06 els 00 10 .02 .68


RESULTS AND DISCUSSION 


The first three factors in Table 1 are identi- 
fiable as social desirability, acquiescence, and 
lie or social desirability role-playing. The first 
factor is marked by the Edwards (1957) SD 
scale and the second factor by the Welsh 
(1956) R scale. The third factor has its high- 
est loading by the Wiggins (1959) Social de- 
sirability (Sd) scale with other high loadings 
by the Lie scale, the Cofer, Chance, and Jud- 
son (1949) Positive malingering (Mp) scale, 
and the M-C SD scale. Support for the identi- 
fication of these factors as response sets is 
found in a number of studies (Edwards, Diers, 
& Walker, 1962; Edwards & Heathers, 1962; 
Edwards & Walker, 1961; Finney, 1961;
Jackson & Messick, 1958, 1961, 1962; Liberty 
et al., 1964; Messick & Jackson, 1961a; Wig- 
gins, 1964). As expected, the SD-Content and 
MA-Content “scales” followed their corre- 
sponding full scales in loading the first, or 
social desirability, factor. 

The fourth factor is identifiable as mas- 
culinity-femininity from the loadings of the 
Mf scale and Mf-Content variable. The fifth 
factor is seen as a “set” factor with loadings 
by the SD-Set and MA-Set variables. How- 
ever, Mf-Set does not load this factor ap- 
preciably, showing instead a sizeable loading 
(—.72) on the second, or acquiescence, factor. 
Mf-Set correlates .05 with SD-Set and .07 
with MA-Set, while the latter variables cor-
relate .40.4 Obviously, the set variables from
the Helmstadter technique are not measuring
the same thing. The immediate conclusion
would seem to be that the Helmstadter-set
procedure does not consistently indicate a
uniform stylistic tendency and that there is a
need for caution in making acquiescence inter-
pretations based upon this procedure.

Although SD-Set and MA-Set are not re-
lated to the second factor, Mf-Set is associated
with that factor, and correlates (−.57) with
the R scale. In trying to account for these

4 The correlation between MA-Set and SD-Set is
undoubtedly due in part to the 19-item overlap in
these two scales. It remains to be seen whether cor-
relations between set variables will be this high
when item overlap is controlled. The correlations
with the Mf-Set variable are considerably lower and
item overlap is virtually nonexistent between the
Mf scale and MA (one item) and the SD scale (no
items).

TABLE 2

Mf “SCALES,” MEDIAN SDSVs, AND THEIR
CORRELATIONS WITH SCALES MARKING
THREE RESPONSE STYLE FACTORS

SD:I R:II Sd:III
Mf scale (5.37) tS =02 eas 
Mf-Set ey = 57 —.06 
Mf-True (5.23) ait3 7 — 36 S11 
Mf-False (5.60) 07 51 238 





results, the degree of item overlap between 
the 40-item R scale and the 60-item Mf scale 
was considered. Seven items are common to 
the two scales, four items of the Mf scale 
being keyed False in the direction of R scale 
keying and three items being keyed True 
which is opposite of R scale keying. Thus, it 
appears unlikely that item overlap accounts 
for this relationship. We note also that the
Mf-Set variable differs from the other two set 
variables in this study in being derived from 
a scale which contains a considerable degree 
of balance in SD-SUD and true-false keying. 
Subsequent research may indicate what the 
Helmstadter set formula measures when scales 
with varying combinations of keying are sys- 
tematically investigated. But the possibility 
exists that balanced keying may merely op- 
erate to reduce the apparent influence of re- 
sponse styles. When such scales are partitioned 
into true and false components (Jackson & 
Messick, 1962; Messick, 1962) or examined 
on a SD-SUD basis (Gocka, 1962), the scales 
may not prove to be independent of the influ- 
ence of stylistic tendencies. Perhaps, the Helm- 
stadter-set procedure operates similarly in dis- 
closing stylistic influence in the Mf scale. 
That this may be the case is seen in Table 2, 
which shows various components of the Mf 
scale and their correlations with scales defin- 
ing the three response set factors. Mf-Set cor- 
relates higher with the SD and R scales than 
does the standard Mf scale. 

Recently, Edwards and Diers (1963) pre- 
sented data suggesting that the number of 
True responses to a set of items with scale 
values falling at about 5.8 on the social de- 
sirability continuum would be relatively un- 
correlated with the SD scale since at that 
point on the continuum the probability of a
True response is about equal for high- and
low-scoring Ss on the SD scale. It has not
yet been demonstrated whether a scale com-
posed of items scaled at approximately 5.8 measures
acquiescence, but it may be worthwhile to
consider the median SDSVs of the Mf scale 
and the Mf-True and Mf-False subscales. 
Considering the item SDSV range in the Mf 
scale (1.11-7.50), the median, rather than the 
mean, SDSV may be a better estimate of the 
SD characteristics of items in these measures. 
From Table 2, the Mf-False subscale shows 
the lowest correlation with the SD scale and 
the highest correlation with the R scale. Its 
median SDSV (5.6) also approaches the 5.8 
point suggested by Edwards and Diers (1963) 
and, interestingly, the probability of a True 
response in the Mf-False subscale is .46, rea- 
sonably close to the expected .50 at 5.8 on 
the SD continuum. These results indicate sup- 
port for the Edwards and Diers finding and 
also that acquiescent tendencies may be maxi- 
mally operative at about 5.8 on the SD con- 
tinuum. 

Obviously, only speculations have been pre- 
sented in trying to account for the diverse re- 
sults of the Helmstadter set score technique 
with the three scales in this study. Additional 
research with other scales may determine 
what the set procedure measures under vary- 
ing conditions of scale keying. Such research 
may even potentially cast light not only on 
what the set formula measures but may con- 
tribute useful information toward the under- 
standing of the nature of response acquies- 
cence in personality scales. Prior to such work, 
however, the present results indicate that 
“acquiescence” is not consistently assessed 
by the Helmstadter set procedure and that 
interpretations of acquiescence influence in 
scales based on that procedure are at best 
premature.


LEARNING OF PROSE WRITTEN IN FOUR
GRAMMATICAL TRANSFORMATIONS 1

New Mexico State University


4 comparisons of pairs of grammatical transformations are reported. Active-verb 
transformations were found easier to learn than their nominalizations (p < .001), 
actives were easier to learn than their passives (p <.01), nonembedded sen- 
tences were easier to learn than their embedded counterparts (p < .05), and no 
significant difference was found between adjectivalizations and their counter- 
parts using adjectives. 10 different categories of active-verb sentences and their 
nominalizations were examined and by determining which categories of nomi- 
nalizations were responsible for deleterious effects, several rules for improving 
readability were reexpressed in terms of grammatical transformations. The data 
were also used to examine the extent to which complex sentences are recoded 


and stored in memory as kernels. 


Several experiments (Coleman, 1964a; Cole- 
man & Blumenfeld, 1963) using a variety of 
dependent variables have shown that some 
grammatical transformations of a sentence 
are more easily comprehended than others. 
Much of the time when a writer applies rules 
for improving readability (e.g., those of Flesch, 
1946), he is actually choosing one such trans- 
formation above another, and thus these rules 
could be stated more precisely for him in 
terms of grammatical transformations. The 
practicing writer can verify this by noting the 
number of times that he makes grammatical 
transformations as he revises a passage to 
make it more comprehensible. As an example, 
if he tries to make one of the nominalized 
sentences in Table 1 more readable by increas- 
ing the number of personal words, decreasing 
clause length, or by applying almost any of 
the rules for improving readability, he will 
discover that the change he frequently makes 
is a grammatical transformation that operates 
upon entire clauses and alters many such 
variables simultaneously. In short, when one 
considers the actual operations a writer per- 
forms to improve readability, there are com- 
pelling arguments for believing that gram- 
matical transformations are fundamental units 
and that describing some rules in terms of 
smaller units is more or less artificial. 

1 This research was supported by Grant GB-241
from the National Science Foundation.

Two previous experiments (Coleman, 1964a,
Experiments III and IV) drew a random
sample of nominalized sentences and trans-
formed them to simpler forms using active
verbs. Sentences were considered a sampling 
variable (Coleman, 1964b), and significance 
tests showed that the improvement in com- 
prehensibility could be generalized to a popu- 
lation of nominalized sentences. But even 
though nominalizations in general are less 
easily understood than certain other gram- 
matical transformations, it is unreasonable to 
expect each individual nominalization to be 
poor writing. 

Nominalizations certainly serve useful func- 
tions in language, and thus it would be helpful 
to examine different categories of nominaliza- 
tions and attempt to discover which cate- 
gories are responsible for deleterious effects. 
Other transformations should also be ex- 
amined in detail. Perhaps grammatical trans- 
formations that are difficult to understand 
can be grouped into categories which can be 
described in terms that will provide new in- 
sights into readability. 

Such data will not only be useful to those 
interested in improving readability; less di- 
rectly they will be useful in formulating a 
psychological theory of how sentences are 
understood. For instance, data from the fol- 
lowing experiments will be used to examine 
Miller’s (1962) suggestion that complex sen-
tences are stored in memory as their under- 
lying kernel sentences. 

The first experiment continues the exami- 
nation of nominalizations, and the second 
compares actives to passives. The third be-
gins the examination of adjectivalizations,
and the fourth compares embedded sentences
to nonembedded.


EXPERIMENT I: NOMINALIZATIONS 


Two previous experiments showed that ran- 
domly drawn samples of nominalizations were 
generally less comprehensible than their de- 
transformed versions using active verbs (Cole- 
man, 1964a, Experiments III and IV). The 
following experiment is a more fine-grained 
study of nominalizations, being actually 10 
separate experimental comparisons: It com- 
pares 10 different kinds of nominalized sen- 
tences to their detransformed versions using 
active verbs, and each kind of nominalization 
is represented by a sample of several sen- 
tences. 


Method 


Design. The study can be described most simply if 
it is considered as 10 separate experiments—each 
comparing one of the nominalizations of Table 1 
with its simplified grammatical transformation using
the active verb (each subject [S] participated in 5
of the 10 experiments). Each experiment used a 
Latin square to counterbalance differences in Ss and 
sentences. The essential point is that each sentence 
was written in both versions (nominalized and sim- 
plified), and each S read half his sentences in the 
nominalized and half in simplified style. Thus the
difference between transformations is a within-sentences
as well as a within-Ss effect.
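
The counterbalancing described in the Design paragraph can be sketched in code. This is only an illustration under assumed details: the source does not report how Ss and sentences were grouped, so the two-group split, the function name `assign_versions`, and the indexing of Ss and sentences are hypothetical.

```python
def assign_versions(n_subjects, n_sentences):
    """2 x 2 Latin square: subject group x sentence half -> version.

    Hypothetical scheme (not taken from the source): even-numbered Ss read
    the first half of the sentences nominalized ("N") and the second half
    simplified ("S"); odd-numbered Ss get the reverse assignment.
    """
    half = n_sentences // 2
    plan = {}
    for s in range(n_subjects):
        for item in range(n_sentences):
            first_half = item < half
            nominalized = first_half if s % 2 == 0 else not first_half
            plan[(s, item)] = "N" if nominalized else "S"
    return plan


# Print one row per S: the version in which each of 10 sentences is read.
plan = assign_versions(n_subjects=4, n_sentences=10)
for s in range(4):
    print(s, "".join(plan[(s, i)] for i in range(10)))
```

Under this scheme every S reads half his sentences in each style and every sentence is read equally often in each style, so the comparison between transformations is made within Ss and within sentences.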

Subjects. The Ss were 60 undergraduates enrolled 
in psychology classes at Sul Ross State College. 

Materials. Each of the 10 grammatical transformations
in Table 1 was represented by a sample of 10
different sentences. Although the sentences represent
passable English, it should be emphasized that the
experimenter (E) composed them himself. Actually
drawing a sample from a library would be better, of
course, but the cost was prohibitive. Each sentence
was typed with four words to a line on a separate
tape for memory-drum presentation.

Formulas and examples of nominalizations and 
their grammatical transformations using active verbs 
are given in Table 1. Essentially, a nominalization 
contains a noun derived from a verb, for example, 
operation, publication, judgment, etc. Lees (1960, 
especially Chapter 3) should be consulted for a more 
extensive description. 


TABLE 1


MEAN TRIALS TO CRITERION FOR NOMINALIZED SENTENCES AND THEIR DETRANSFORMATIONS


Nominalization (trials) | Detransformation: Active-verb version (trials)

1. Vtion of N is A | Conj. N2 V N, it is A
   A detailed knowledge of the lower Mississippi valley would be quite helpful. (4.8) | If you knew the lower Mississippi valley in detail, it would be quite helpful. (4.1)

2. N’s Vtion V2 N2 | Conj. N V, it V2 N2
   The policeman’s investigation of the incident involved hours of tracing down suspects. (4.7) | When the policeman investigated the incident, it involved hours of tracing down suspects. (3.6)

3. Vtion filler-verb V2tion | Conj. N V, N V2
   An investigation of the circumstances will require real concentration on the situation. (4.2) | If he investigates the circumstances, he must really concentrate on the situation. (3.3)

4. N’s Vtion is A | Conj. N V, it is A
   His discussion of the reason for the decision will be appreciated. (4.1) | If he discusses the reason for the decision, it will be appreciated. (illegible)

5. Vtion of N is V2tion of N2 | Conj. N3 V N, N3 V2 N2
   Their inclusion of this provision is admission of the importance of the system. (5.3) | When they included this provision, they admitted the importance of the system. (3.9)

6. N filler-verb Vtion (conj. N2 filler-verb V2tion) | N V (conj. N2 V2)
   He took a walk across town while they were engaged in the argument. (3.9) | He walked across town while they were arguing. (3.0)

7. N’s Vtion is (not) possible | N can (not) V
   The incumbent’s election is possible only if the entire reform party can be discredited. (6.0) | The incumbent can be elected only if the entire reform party can be discredited. (4.3)

8. N V prep A V2tion of N2 | N V conj. the A way N2 V2
   I was intimidated by the bold smile of the girl. (2.7) | I was intimidated by the bold way the girl smiled. (3.0)

9. N V N2’s V2tion prep N3 | N V conj. N2 V2 N3
   I noticed the older detective’s careful examination of the rock. (4.6) | I noticed the older detective carefully examine the rock. (3.7)

10. N’s Vtion is A | N V Aly
   It was apparent that the president’s welcome to John would be warm. (illegible) | It was apparent that the president would welcome John warmly. (3.7)

Note. N = noun, V = verb, tion = any nominalizing morpheme, A = adjective, Aly = adverb, is = any copula, prep = preposition, conj = conjunction. Formulas were simplified by omitting details such as modifiers, auxiliaries, tense, and the like; for example, the first formula only describes the words knowledge of Mississippi is helpful. Some phrases were omitted from the examples to reduce the size of the table.

Presentation. Each S was tested individually. He
was presented with five of the nominalizations of 
Table 1 and five simplified transformations in an
NSSN order (he saw only one transformation of
any pair of sentences). Since there were 200 different 
sentences—100 nominalizations and their 100 de- 
transformations—each sentence was learned by three 
Ss. Sentences were presented on a memory drum at a 
1-second rate, and since there were four words to a 
line, this means that they were presented at the rate 
of 240 words per minute. The S was asked to mem- 
orize the sentence, and immediately after seeing it, he 
was asked to repeat it. It was presented again until 
he could repeat it perfectly. The measure was trials 
to first perfect repetition. 
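
The two figures quoted in this paragraph (three Ss per sentence and 240 words per minute) follow from the counts given in the text. The short check below uses variable names of our own choosing, and it assumes, as the text implies, that the drum advanced one four-word line per second.

```python
subjects = 60
sentences_per_subject = 10    # five nominalized plus five simplified
distinct_sentences = 200      # 100 nominalizations and their 100 detransformations

# Each S learns 10 sentences, so 600 learnings are spread over 200 sentences.
learners_per_sentence = subjects * sentences_per_subject / distinct_sentences

words_per_line = 4
lines_per_second = 1          # assumed: one line exposed per second
words_per_minute = words_per_line * lines_per_second * 60

print(learners_per_sentence, words_per_minute)
```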


Results 


Mean trials to one perfect repetition of a 
sentence are given in Table 1. For six pairs 
of transformations in Table 1, the detrans- 
formed active-verb versions were easier to 
learn (significant at the .05 level or better by
a Wilcoxon matched-pairs test, e.g., for the 
least reliable of these differences—that of the 
second pair—T was 98 for N of 25). There 
were no significant differences for four pairs 
—the first and last three pairs. Some new in- 
sights into readability may be provided by 
dividing these nominalizations into ones that 
are relatively difficult to learn versus those 
that are not. 

In the first five pairs in Table 1, the nomi- 
nalizations are one-clause sentences, which 
when detransformed, result in a sentence con- 
taining two coordinate clauses. In four of 
these pairs, the detransformed two-clause ver- 
sion was significantly easier to learn, and the 
difference for the other pair (the first one in 
Table 1) missed significance by only a nar- 
row margin. Thus it seems reasonable to as- 
sume that a person can learn a set of content 
morphemes packaged into two clauses more 
easily than he can learn the identical set 
packaged as a single clause. This assumption 
is in line with suggestions that a writer can 
improve readability by shortening sentences; 
for as a matter of fact, a careful reading of 
most such suggestions (e.g., Flesch, 1946, p. 
32; 1949, p. 129) will show that the sugges- 
tions are more concerned with clause length 
than with sentence length. 

The next two nominalizations in Table 1,
the sixth and seventh, have detransformations
containing fewer words (e.g., He took
a walk across town → He walked across town
and The election of John is possible → John
can be elected), and it is hardly surprising to 
find a short list of words easier to memorize 
than a longer one. 

The last three nominalizations were not sig- 
nificantly more difficult to learn than their 
detransformations. Two of these detransfor- 
mations changed a prepositional phrase to a 
subordinate clause and one (the last one) did 
little more than rearrange the words. A dis- 
couragingly consistent failure to find reliable 
improvements when the detransformations do 
not shorten clause or sentence length has 
dampened the author’s enthusiasm for the 
notion that was originally responsible for 
studying grammatical transformations. This 
notion has been expressed in several forms, 
one being that of Malinowski (1923), and it 
hypothesizes that words are stored in psycho- 
logical categories, specifically for instance, 
words that denote “actions” are stored in the 
mind as verbs so that words such as opera- 
tion, publication, judgment, etc. are inher- 
ently more difficult to process than operate, 
publish, judge. Similarly, words that describe 
a “quality of a thing”—beauty, wisdom, ar- 
rogance—are stored as adjectives and would 
be more difficult to process than beautiful, 
wise, arrogant. Even if such psychological 
categories exist, using a word in other than 
its psychological category seems to have only 
negligible effects. (Note also the failure to 
find significant differences in Experiment III.) 

The examples in Table 1 may suggest more 
precise interpretation of several rules for im- 
proving readability such as using short sen- 
tences, many verbs, many “personal words,” 
short words, and familiar words. It has al- 
ready been remarked that the suggestion to 
use short sentences might be better phrased 
as a suggestion to use short clauses, and the 
first five examples in Table 1 show that de- 
transforming nominalizations is an effective 
way to shorten clauses. When a writer de- 
transforms a nominalization, he automatically 
increases the number of verbs in the passage. 
The verb usually requires a subject, and this 
increases the number of personal words (pro- 
nouns and names). And finally the verb form 
of a word is usually shorter and more commonly
used than its nominalized form.
In fact, it is probable that many rules of
readability were successful because they led 
writers to prefer easily comprehended, active- 
verb transformations. When writers applied 
the rules in ways that did not influence the 
transformational structure, they probably had 
little effect. For example, if a writer is trying 
to compose sentences with many “personal 
words,” he will tend to prefer active-verb 
transformations so he can use personal words 
as subjects and objects. This will improve 
readability. On the other hand, when his sen- 
tences already consist of active-verb clauses, 
he will probably not get additional improve- 
ments by inserting additional personal words. 
Similarly, if he is trying to write short sen- 
tences, he will tend to prefer active-verb 
clauses because they are short and can be 
broken into short sentences. This too will im- 
prove readability. But when his sentences al- 
ready consist of active-verb clauses, he will 
not get additional improvements by inserting 
periods to break these short clauses into short 
sentences. 


EXPERIMENT II: ACTIVES VERSUS PASSIVES 


A previous experiment showed that a long 
passage was more easily comprehended after 
three transformations were applied to it, one 
of the three being detransforming passive sen- 
tences to actives (Coleman, 1964a, Experi- 
ment I). In addition Anderson (1963) has 
compared the short-term retention of actives 
and passives but he found no significant dif- 
ferences, at least not when his measure was 
discrete content words. The following experi- 
ment is essentially a replication of Anderson’s, 
but by capitalizing upon his findings it man- 
ages a slightly more sensitive design and 
method of presentation. Anderson also ana- 
lyzed his data in terms of sequential con- 
straint, and this analysis indicated that ac- 
tives would be better retained than passives 
if retention were scored in longer units. In the 
following experiment retention will be scored 
in discrete content words,² in three-word sets
used correctly, and in complete sentences
perfectly retained.

² More accurately, they were scored in content
morphemes because any derived or inflected form of
a word was counted correct. As a first approximation,
the reader can interpret content morphemes by
contrasting them to function morphemes. Function
morphemes are the words that are not capitalized in
titles (prepositions, conjunctions, etc.) plus inflectional
and derivational affixes such as -ed, -s, -tion,
-ness, etc. Content morphemes are the roots of nouns,
verbs, adjectives, and adverbs. Hockett (1958, p.
264) has a more extensive discussion under the terms
contentives and functors.


Method


Design. A Latin square was used to balance differences
in Ss and sentences so that the difference between
transformations was a within-sentences as well as a
within-Ss effect. That is, each sentence was written
in both transformations and each S read half his
sentences in active and half in passive form.

Sampling variables. The Ss were 40 psychology
students fulfilling a course requirement.

The sentences also represent a sampling variable.
That is, the conclusions must be generalized beyond
the specific sentences used in the experiment to a
population of similar sentences. Therefore the sentences
used must represent a meaningful population,
and significance tests must be performed that will
allow the conclusions to be generalized to that population
(Coleman, 1964b).³ Ninety-six sentences were
randomly drawn from the Sul Ross library, 48 actives
and 48 passives. Index cards representing 96
books were drawn, the books were opened to page 25,
and the first passive sentence (or the first active sentence
containing a transitive verb) was chosen.
Modifiers were dropped and each sentence was written
in both transformations, making 192 sentences,
96 actives and their 96 passive transformations. A
2 X 2-inch slide was prepared of each sentence.

³ A similar observation is clearly pertinent to the
conclusions of Experiment I. Conclusions restricted
to the specific 200 sentences used in that experiment
would not be of much interest. As a matter of fact,
conclusions of Experiment I can be generalized to a
population of nominalizations. For instance, the conclusion
that detransformations resulting in two-clause
sentences are learned more easily than their
one-clause nominalizations can be generalized. There
were 50 matched pairs of such sentences (those representing
the first five pairs of formulas in Table 1).
Thirty-two of these pairs were learned in fewer
trials in the detransformed, two-clause version, 12
were learned in fewer trials in the nominalized version,
and there were six ties. A ratio of 32 to 12 is
significant beyond the .005 level by a binomial test,
and thus this conclusion can be generalized to a
population of such pairs of sentences. However, the
sentences of Experiment I (since they were composed
by the experimenter) do not represent as
satisfactory a population as those of Experiment II
and therefore the paper was not burdened with involved
discussions of more or less unfamiliar tests,

Presentation. The sentences were presented in sets
of six by a slide projector, each sentence being
flashed for 4 seconds. Each set of six contained three
actives and three passives. There are 20 ways that a
set of three actives and three passives can be permuted.
Each S saw 16 sets, and these sets contained
all possible permutations except AAAPPP, PPPAAA,
APAPAP, and PAPAPA. Pertinent instructions were:


You will see the sentences in sets of six. As soon as 
the projector goes off, write down all you can re- 
member. Write the sentences in any order but be 
sure to write everything you remember, even dis- 
crete words. A prize of $5.00 will be awarded for 
the highest score. 


After presenting each set, S was given 90 seconds to 
write down what he had remembered. This time was 
apparently adequate because S almost always stopped 
writing long before it was up. Immediately after re- 
sponding to one set, another set was presented. 
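
The count of presentation orders given under Presentation (20 possible arrangements of three actives and three passives, of which 16 were used) can be verified by enumeration. The sketch below is only a check on that arithmetic; the variable names are ours, and it says nothing about the order in which the 16 sets were actually shown.

```python
from itertools import permutations

# Distinct orderings of three actives (A) and three passives (P).
orders = sorted({"".join(p) for p in permutations("AAAPPP")})

# The four orderings the text says were not used.
excluded = {"AAAPPP", "PPPAAA", "APAPAP", "PAPAPA"}
used = [order for order in orders if order not in excluded]

print(len(orders), len(used))
```

The 20 is the number of ways to choose which three of the six serial positions hold the actives.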
Scoring. The written responses were scored in three 
ways: (a) for total content words written in any 
order, (b) for three-word sets of content words cor- 
rectly used (e.g., if Bill hit the car was presented 
and S wrote Bill was hit by a car, he would get 
credit for three single content words but not for a 
three-word set because he used Bill and car incor- 
rectly), and (c) complete sentences perfectly re- 
tained. Any derived or inflected form of a content 
word was counted correct but synonyms were not.

Fig. 1. Mean correct responses per subject per sentence
for the three main content words arranged
according to their semantic function in the sentence.
(Note that responses for the same content morphemes
are plotted in both active and passive sides
of each bar.)


Results


Actives were better retained than passives 
for all scoring systems. Mean units correct per 
S per sentence were: (a) 1.66 to 1.53 for 
number of correct content words, (b) .50 to
.44 for three-word sets, and (c) .33 to .26 for 
complete sentences perfectly retained (actives 
always given first). The differences are sig- 
nificant for all scoring systems for both popu- 
lations, for example, the smallest proportional 
difference for the smallest sample—for dis- 
crete words for the S sample—is significant 
beyond .01 by a Wilcoxon matched-pairs test 
(T was 214 for N of 40). These conclusions 
disagree with those of Anderson (1963) who 
found no significant difference, at least not 
when he scored discrete words. His difference, 
however, was in the same direction as those of 
the present experiment. Also note that the 
proportional difference between transforma- 
tions becomes greater as responses are scored 
in larger units. A comparison of retention of 
approximations to English (Coleman, 1963) 
found a similar trend when responses were 
scored in progressively longer sequences. 
Taken together, the two experiments suggest 
that scoring in discrete words is an insensi- 
tive measure of the retention of connected 
discourse. 

It is also worthwhile to study which par- 
ticular words were remembered best in each 
transformation. Miller (1962) has suggested 
that all grammatical transformations of a sentence
are stored in memory as kernels⁴ with
a sort of mental subscript denoting the trans- 
formation. The strongest statement of this 
notion would imply that no matter which 
transformation of a sentence is presented to 
S, he would remember its various words to 
the same relative degree. That is, if he re- 
members a group of passives by detransform- 
ing them to “kernels” and storing the kernels 
(or actives) in memory, then he should re- 
member essentially the same words as a 
matched S presented with the actives. This 
does not seem to be the case. 

⁴ In this case, “kernels” can be interpreted as active,
declarative sentences.

Figure 1 plots the results for individual
words so that the same words are represented
in both the active and passive sides of each
bar. In the sentences Jack hit the car and The
car was hit by Jack, for example, correct responses
for Jack are plotted in both active
and passive sides of the bar labeled “actor.” 
If passives are remembered by detransforming 
them to actives and storing the actives in 
memory, then the profile of the passive bars 
would closely resemble that of the actives. Ap- 
parently it does not. Correlation (tau) be- 
tween the active and passive profile is only 
.33 for this figure. 
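
Because each profile has only three points, tau can take only the values 1, .33, -.33, and -1; a tau of .33 therefore means that one of the three pairs of words is ordered differently in the two profiles. The sketch below illustrates this with a plain implementation of Kendall's tau. The function name and the scores are hypothetical and are not the values plotted in Figure 1.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical mean scores for three content words (illustration only).
active = [0.60, 0.50, 0.55]
passive_same_order = [0.50, 0.40, 0.45]    # same rank order as active
passive_one_reversal = [0.50, 0.45, 0.55]  # one of three pairs reversed

print(kendall_tau(active, passive_same_order))
print(round(kendall_tau(active, passive_one_reversal), 2))
```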

Actually, the profiles for actives and pas- 
sives resemble each other far more closely 
when the results are plotted according to the 
word’s position in the sentence as in Figure 2. 
Tau between active and passive profiles is 
1.00 for this figure. Regardless of which of the 
two transformations is presented to the S, he 
seems to remember the first content word 
best, the last one next best, and the middle 
one poorest of all. The finding that position 
is a better predictor than semantic function 
(i.e., function in the kernel) can be general- 
ized to both populations. A test for the popu- 
lation of Ss can be performed as follows: plot 
figures such as Figures 1 and 2 for each of 
the 40 Ss. Compute taus between active and 
passive profiles for both of his figures. Tau 
according to position was higher for 27 Ss, 
tau according to semantic function was higher 
for 9 Ss, and taus were identical for 4 Ss. A 
ratio of 27 to 9 is significant beyond .005 by 
the binomial test, and thus the conclusion 
can be generalized to the population of Ss at 
that level. If analogous taus are computed for 
the 96 sentences, tau according to position is 
higher for 59 sentences and tau according to 
semantic function is higher for 27 sentences. 
Since 59 to 27 is significant beyond .001, the 
conclusion can be generalized to the popula- 
tion of sentences at that level of confidence. 
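
The binomial tests in this paragraph can be reproduced as exact sign tests. The sketch below rests on two assumptions that the text does not state: that ties are dropped before testing and that the test is one-sided. The function name is ours.

```python
from math import comb


def sign_test_one_sided(wins, losses):
    """P(X >= wins) for X ~ Binomial(wins + losses, 0.5); ties are dropped."""
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Counts from the text: cases where tau by position was higher versus
# cases where tau by semantic function was higher.
for label, wins, losses in [("Ss", 27, 9), ("sentences", 59, 27)]:
    print(label, round(sign_test_one_sided(wins, losses), 5))
```

Under these assumptions both probabilities fall below the levels quoted in the text (.005 for Ss and .001 for sentences).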

Anderson (1963), who used a presentation 
closely resembling that of the present experi- 
ment, also found position to be an excellent 
predictor of retention. But rather than en- 
tirely abandoning Miller’s notion in favor of 
the serial position effect, it might be wiser to 
weaken it and say that there is a “tendency” 
to store complex sentences as kernels. Neither 
the present experiment nor Anderson’s meas- 
ured responses that closely resemble everyday 
verbal behavior: both experiments used short-term
retention of unconnected sentences and
instructions that emphasized word-for-word
memorization. If the responses we measured
had more closely resembled everyday verbal
behavior, say long-term retention of paragraphs,
our results might have closely resembled
those predicted by Miller’s notion.

Fig. 2. Mean correct responses per subject per sentence
for the three main content words arranged
according to their position in the sentence.

Actually, both Anderson’s experiment and 
the present one did reveal a tendency to re- 
code into kernels. They both found that sig- 
nificantly more passives were retained as ac- 
tives (or kernels) than vice versa. In the 
present experiment, 56 passives were retained 
as actives versus only 33 actives retained as 
passives (significant beyond .01 by binomial 
tests for both populations). 

One of the principal reasons a writer uses 
the passive voice is to emphasize the “object
of the action” (the direct object in the ac- 
tive) by placing it first in the sentence. Fig- 
ure 1 suggests that this is not effective, and 
that the object of the action might be better 
retained as the direct object of the active ver- 
sion. The difference is not significant, how- 
ever, and in any case a writer can further em- 
phasize the object of the action by omitting 
the actor, that is, by omitting the prepositional
phrase. As a matter of fact, in their
published versions 39 of the 48 passives sampled
from the library for the present experiment
had omitted the prepositional phrase,
though it was supplied by the E in the sentences
presented to the S.


EXPERIMENT III: ADJECTIVALIZATIONS 


A previous experiment showed that a long 
passage was more easily comprehended after 
it had been simplified by applying three trans- 
formations: transforming adjectivalizations to 
adjectives, nominalizations to verbs, and 
passives to actives (Coleman, 1964a, Experi- 
ment I). There has been no separate compari- 
son of the comprehensibility of adjectivaliza- 
tions and their grammatical transformations 
using adjectives, however, and the following 
experiment investigates a sample of adjec- 
tivalizations randomly selected from a library. 


Method 


Design. A Latin square was used to balance differences
in Ss and sentences so that the difference between
transformations was a within-sentences as well as a
within-Ss effect. That is, each sentence was written 
in both transformations (adjectivalization and sim- 
ple), and each S read half his sentences in adjec- 
tivalized and half in simplified form. 

Sampling variables. The Ss were 34 psychology 
students fulfilling a course requirement. 

The sentences also represented a sampling vari- 
able. Using the method of Experiment II, 40 adjec- 
tivalizations were drawn at random from the Sul 
Ross Library. Each was shortened to approximately 
12 words by dropping modifiers (range was 8 to 14 
words). Four sentences were dropped because they 
seemed to have a different meaning in the detransformed
version (e.g., Alginates are used in ice cream
to give consistency. → Alginates are used in ice cream
to make it consistent.) The remaining 36 adjectivalizations
were detransformed to a simplified version
using an adjective, and thus each sentence existed in 
two forms (adjectivalized and simplified) making 72 
sentences in all. Examples of the two versions are 
given in Table 2, and Lees (1960, especially Chapter 
3) may be consulted for a more extensive descrip- 
tion of adjectivalizations. Table 2 shows that de- 
transforming an adjectivalization frequently gives a 
more or less awkward sentence. A 2 X 2-inch slide 
was prepared for each sentence. 

Presentation. The sentences were presented in pairs 
by a slide projector, each sentence being flashed for 
5 seconds. Each pair of sentences was presented 
twice and the S was instructed: 


You will see each pair of sentences twice. Write 
down all you remember the first time—even dis- 
crete words—but write lightly. After you see the 
sentences the second time you will be allowed to 
correct what you have written. Be sure to write 
every word you remember. A prize of $5.00 will 
be awarded for the highest score. 


After presenting each pair, S was given 60 seconds 
to write down (and later to correct) what he had 
written. This time was apparently adequate because 
the S almost always stopped writing long before it 
was up. Immediately after seeing one pair twice, an- 
other pair was presented, the 36 sentences for a 
group of Ss being presented in an AA-SS-SS-AA 
order. 

Scoring. The written responses (i.e., the responses 
the S had corrected) were scored for number of cor- 
rect content words, counting pronouns as content 
words and counting any derived or inflected form 
of a word as correct. Only content words that appeared
in both the adjectivalized and simplified versions
of a sentence were counted. For example, complex,
milk, fat, comprehend, formula, and know were
the words scored for the following pair: The complexity
of milk fat may be comprehended when the
formula is known. → When we know the formula
we may comprehend how complex milk fat is. Responses
were also scored in correct pairs of successive
content words, correct triples, and correct entire
sentences. For example, the S who wrote complexity
of milk fat . . . formula known had five
singles, three pairs (complexity of milk, milk fat, and
formula known), and one triple.


TABLE 2


EXAMPLES OF ADJECTIVALIZATIONS AND THEIR DETRANSFORMATIONS


Adjectivalization | Detransformation: Adjective version

1. The urgency of immediate demands is allowed to usurp attention. | Immediate demands are so urgent that they are allowed to usurp attention.

2. The complexity of milk fat may be comprehended when the formula is known. | We may comprehend how complex milk fat is when we know the formula.

3. We eschewed any discussion of rightness or wrongness of a particular sexual habit. | We eschewed any discussion of whether a particular sexual habit was right or wrong.

4. The distribution of power is constitutional in nature. | The distribution of who is powerful is constitutional in nature.

5. A profile of standard scores indicates his capacity to produce. | A profile of standard scores indicates how capable he is to produce.

6. It will not be necessary to apologize in advance for simplicity. | It will not be necessary to apologize in advance for being simple.


Results 


Mean correct words per S per sentence for 
adjectivalizations and simplified versions 
were: (a) 4.08-3.86 for number of correct 
content words, (b) 2.81-2.58 for pairs of 
words, (c) 1.83-1.70 for correct triples, and 
(d) .18-.19 for correct complete sentences 
(adjectivalizations always given first). None 
of these differences are statistically significant, 
and most are opposite in direction to what
one would expect from the study of nominalizations.
Since the sample of sentences is fairly
representative of adjectivalizations in general,
it is reasonable to conclude that indiscriminately
detransforming any and all adjectivalizations
will have little effect in making prose
easier to comprehend. The examples of Table 2,
some of which are quite awkward, suggest
that many such detransformations have
negligible or deleterious effects. A more fine-grained
study, however, might reveal advantages
in detransforming certain categories of
adjectivalizations, for example, those in which 
the detransformed version was shorter than 
the adjectivalization. 

There was considerable variability in the 
relative number of words in the adjectivalized 
and simplified versions of the sentences. In 
15 pairs of transformations, the simplified 
version was the longer, in 11 pairs the ad- 
jectivalized was the longer, and in 10 pairs 
they were of equal length. As a matter of fact, 
length was an excellent predictor of retention. 
For the 26 pairs of sentences that differed in 
length, a mean of 4.21 discrete words per S 
per sentence were retained for the shorter 
transformations versus 3.74 for the longer, 
more words being correctly retained for the 
shorter version in 20 pairs and the opposite 
being true for the remaining 6 pairs. 


EXPERIMENT IV 


Any writer who ever rearranged an embedded
sentence to a less embedded version
(e.g., Jack whom Mary loves hates Ann to
Mary loves Jack who hates Ann) must have 
sensed intuitively that it is difficult for read- 
ers to understand the syntactic structure of 
embedded sentences. Recently it has become 
apparent that there are fundamental reasons 
for such intuitions about the reader’s diffi- 
culty. Workers in the fields of mechanical 
translation, for instance, have learned that 
attaching a syntactic description to embedded 
sentences is beyond the capacity of simple 
machines such as finite-state automata and 
requires considerably more powerful ma- 
chines. A series of articles by Chomsky 
(e.g., 1963) argues that the capacity for embedding
in natural languages is the very characteristic
that prevents their adequate description
by a finite-state grammar.

Miller (1962) has briefly mentioned a 
psychological experiment investigating em- 
beddedness. He described qualitatively the 
intonation patterns of Ss asked to repeat 
highly embedded sentences such as, “The race 
that the car that the people whom the obvi- 
ously not very well-dressed man called sold 
won was held last summer.”

The following experiment reports a study 
of less highly embedded sentences such as 
might actually be found in English prose, and 
it uses cloze procedure to obtain a quantita- 
tive description of results. 


Method 


Design. As in the three above experiments, a Latin 
square was used to make the effect of most interest, 
the difference between transformations, a within- 
sentences and within-Ss effect. 


TABLE 3


EXAMPLES OF EMBEDDED AND NONEMBEDDED SENTENCES


Embedded | Nonembedded

1. The rat that the cat killed ate the malt. | The cat killed the rat that ate the malt.

2. The man who can sell it is Bill. | Bill is the man who can sell it.

3. I gave the boy who lives here the ball. | I gave the ball to the boy who lives here.

4. She gave the ring that I found to Jim. | She gave Jim the ring that I found.

Note. Phrases have been omitted from the examples.

Sampling variables. The Ss were 40 Sul Ross
undergraduates fulfilling a course requirement in 
introductory psychology. 

As shown by the examples of Table 3, 4 different 
kinds of embedded sentences were used, each kind 
being represented by 5 sentences whose mean length 
was 12.50 words. Thus there were 20 embedded 
sentences and their 20 nonembedded transforma- 
tions. Ten sentences were typed on a page in an 
ENNE order. It should be emphasized that these 
sentences were constructed by the E, not sampled 
from published literature. 

Tests. Five cloze tests were prepared for each of 
these pages by deleting every fifth word. One test 
deleted every fifth word beginning with the first 
word, a second deleted every fifth word beginning 
with the second, a third beginning with the third, 
and so on. Thus every word was deleted and a 
cloze score, the percentage of Ss who filled it in 
correctly, was obtained for each one. 
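
The construction of the five tests can be sketched in a few lines of Python. This is only an illustration, not the materials actually used: the function name `make_cloze_tests`, the blank marker, and the choice of sample sentence (borrowed from Table 3) are assumptions made for the example, and words are simply split on whitespace.

```python
def make_cloze_tests(text, step=5, blank="_____"):
    """Return `step` cloze versions of `text`.

    Version k (0-based) deletes every `step`-th word starting with word k,
    so that across all versions each word is deleted exactly once.
    """
    words = text.split()
    tests = []
    for start in range(step):
        # Map position -> deleted word for this version.
        deleted = {i: w for i, w in enumerate(words) if i % step == start}
        passage = " ".join(blank if i in deleted else w
                           for i, w in enumerate(words))
        tests.append((passage, deleted))
    return tests


sentence = "The rat that the cat killed ate the malt."
cloze_tests = make_cloze_tests(sentence)
for passage, deleted in cloze_tests:
    print(passage, "| deleted:", list(deleted.values()))
```

Because the five starting positions cover every residue modulo five, pooling the five versions yields one cloze score per word, which is the property the design relies on.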

Presentation. It should be clear from the experi- 
mental design that each S read only two of the four 
pages of complete sentences, that is, he read a 
sentence in its embedded or nonembedded version— 
not both. Similarly he took only one of the five 
cloze tests prepared for this sentence. Within these 
limitations, the pages of complete sentences and the 
cloze tests were stapled into 40 booklets, each 
booklet containing two of the pages of complete 
sentences and each page being followed by one of its 
cloze tests. The first page of the booklet contained 
the following instructions: 


In the booklet there are tests that are passages 
with some of the words deleted. You are to try to 
fill in the words that were deleted. Always guess. 
A prize of $5.00 will be given the person who gets 
the most words correct. 


The Ss were tested in a group. After reading the 
instructions and filling in a practice cloze test, they 
were given 1 minute (timed by a stop watch) to 
read a page of complete sentences. They were then 
told to stop, turn the page, and begin filling in that 
page’s cloze test. They were given 7 minutes to fill 
in the test. They then read the other page of com- 
plete sentences and took its cloze test. 

Scoring. Responses were scored for number of 
words correctly inserted. Only the exact word was 
counted as correct except that any derived or in- 
flected form of a word was counted. Synonyms 
were not counted. 


Results 


There were exactly 250 words in the em- 
bedded versions and 250 in the nonembedded. 
Since each of these words was deleted in one 
of the cloze tests and since four Ss attempted 
to fill in each of these deletions, there were 
1,000 responses for embedded sentences and 
1,000 for nonembedded. A total of 561 words 
were inserted correctly for embedded versions 
and 604 for nonembedded versions. This difference
is significant for the population of Ss 
beyond the .03 level by a one-tailed Wilcoxon 
matched-pairs test (T was 199 for an N of 
35). It is also significant for the sentence 
population beyond .05 (T was 54 for an N of 
20). 
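
The response counts follow directly from the design, and the arithmetic can be checked with a short sketch. The figures are those given in the Method and Results sections (the function-word and content-word counts are the ones reported in the next paragraph); the variable names are illustrative, and the calculation assumes the booklets were balanced so that every cloze test was taken equally often.

```python
# Design figures taken from the Method and Results sections.
n_subjects = 40
pages_per_subject = 2      # each S read two of the four pages
n_pages = 4
tests_per_page = 5         # five cloze versions of every page
words_per_version = 250    # words in the embedded (or nonembedded) sentences

# Assumption: balanced booklets, so each cloze test was taken equally often.
subjects_per_test = n_subjects * pages_per_subject // (n_pages * tests_per_page)
responses_per_version = words_per_version * subjects_per_test

correct = {
    "embedded": {"function": 336, "content": 225},
    "nonembedded": {"function": 370, "content": 234},
}
for version, counts in correct.items():
    total = sum(counts.values())
    print(version, total, f"{100 * total / responses_per_version:.1f}%")
```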

An experiment (Coleman & Blumenfeld, 
1963) that used cloze tests to compare nom- 
inalizations to their detransformations found 
that most of the difference was in content 
words, but this was not true in the present 
experiment. On the contrary, most of the 
difference was in the function words, a total 
of 336 function words being inserted cor- 
rectly in embedded sentences and 370 in non- 
embedded (p < .05 for the population of Ss 
by a one-tailed Wilcoxon matched-pairs test, 
T = 211). Total content words inserted cor- 
rectly were 225 in embedded sentences and 
234 in nonembedded, and this difference is not 
significant. 


DISCUSSION 


The most straightforward way to discuss 
the experiments is to say that they show that 
some grammatical transformations of a sen- 
tence are more easily comprehended than 
others; and thus, other things being equal, a 
writer would be wise to choose the more easily 
comprehended transformation. When exam- 
ined from this viewpoint, the implications of 
the data are obvious enough that little discus- 
sion is necessary; it is only necessary to em- 
phasize that other things are frequently not 
equal. 

A second, somewhat more speculative way 
to discuss the experiments involves rearrang- 
ing the matched pairs of sentences into other 
predictor variables. 

Length of clause. In 52 of the detransfor- 
mations, one long clause was changed into 
two short coordinate clauses, for example, A 
knowledge of the Mississippi would be helpful.
→ If you knew the Mississippi, it would 
be helpful. In 33 of the pairs, the transformation
having two clauses was better retained, 
in 13 the opposite was true, and in 6 there 
was no difference (33 to 13 is significant be- 
yond .005). Apparently a person can process 
content morphemes packaged into two clauses 
more easily than he can process the identical 
morphemes packaged into a single clause. 
Thus it seems that the advice to prefer short 
sentences might be better rephrased as a rule 
to prefer short clauses. If the clauses in a 
writer’s composition are short, he will prob- 
ably not improve readability much by empha- 
sizing the boundaries between them with pe- 
riods and capitals. 

Some contributions to an explanation of 
how sentences are understood might be made 
if length of clause were related to such pro- 
posed measures of syntactic complexity as 
dimensionality and diameter (Gammon, 1963) 
and depth (Yngve, 1961), and if they in turn 
were related to right-recursive, left-recursive, 
and self-embedding constructions. But since 
these relations will probably be neither simple 
nor direct, their discussion might better follow 
experiments specifically designed to study 
these predictors. 

Number of words. There were different 
numbers of words in 160 of the matched pairs 
of transformations, for example, They finally 
came to an agreement on the price. → They 
finally agreed on the price. In 91 of these 
pairs, the shorter transformation was better 
retained, in 65 the opposite was true, and in 
17 there was no difference (a ratio of 91 to 
65 is significant beyond .05). Thus as might 
be expected, it seems advisable for a writer to 
prefer the shorter of two transformations 
when other things are equal. 
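
The text does not say which test produced the significance levels quoted for the 33-to-13 and 91-to-65 ratios. One plausible reading, offered here purely as an assumption, is a sign test on the pairs that showed a difference, with ties dropped. The sketch below computes the exact one-tailed binomial probability under that assumption; the function name is illustrative.

```python
from math import comb


def sign_test_one_tailed(wins, losses):
    """Exact one-tailed binomial probability of at least max(wins, losses)
    successes in wins + losses fair coin flips (ties excluded beforehand)."""
    n = wins + losses
    k = max(wins, losses)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


p_clauses = sign_test_one_tailed(33, 13)   # two-clause vs. one-clause pairs
p_length = sign_test_one_tailed(91, 65)    # shorter vs. longer pairs
print(f"clause length: p = {p_clauses:.4f}")
print(f"number of words: p = {p_length:.4f}")
```

Under this assumed test both probabilities fall below the levels quoted in the text (.005 and .05, respectively).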

Transformational complexity. Miller (1962) 
suggested that complex sentences are stored in 
memory as kernels. The results of the active-passive
comparison were examined for evidence
for this notion, and the evidence suggested
that this was probably not true—at 
least not in an absolute sense and under all 
conditions. On the other hand, a tendency to 
recode passives into actives was reported. 
Actually, an extensive study of transformational
complexity is premature at present because
the existing transformational grammars—with
their kernel and hierarchy of 
transformations—were constructed by grammarians
who concerned themselves with logical
elegance rather than psychological reality.
It is far from certain that the most elegant
grammar will best reflect psychological 
processes. More investigations of how children
learn their language (e.g., Braine, 1963) 
must be made before the kernel of sentences 
and the hierarchy of transformations can be 
described in psychological terms. 


SOCIAL DESIRABILITY SCALE VALUES OF 
PERSONAL CONCEPTS 1 


DANIEL B. CRUSE 


University of Miami 


The distribution of scale values of 1,647 items scaled for social desirability was 
presented. The bimodality of the distribution of scale values for the items was 
noted and interpreted as showing that social desirability judgments of personal 
concepts are infrequently judged as neutral and tend to be either undesirable or 
desirable. Presentation of these items as a personality test showed the typical 
high correlation between social desirability scale values and frequency of
endorsement. 


This study reports on the social desirability 
scale characteristics of a population of person- 
ality items based upon a well defined universe. 
Edwards’ (1953, 1957) research on person- 
ality assessment has shown that it is possible 
to predict the probability of a “true” response 
given a personality-type item and a scale 
value based upon social desirability judgments 
of the item. 

It is also of interest to investigate popula- 
tions of personality-type items or personal 
concepts scaled for social desirability. One 
such population would be the lists of traits 
by Allport and Odbert (1936). Another list, 
and the one used in this study, is the Uni- 
verse of Personal Concepts developed by 
Hilden (1954). Judgments of an item popu- 
lation provide social desirability scale values 
for the personality concepts involved, while 
an analysis of the distribution of scale values 
indicates the judges' opinions about personality
traits or personality concepts in general. 


METHOD 


The items used in the social desirability judgment 
task were constructed by Hilden (1958). These items 
were based on the Thorndike and Lorge (1944) list 
of 30,000 words. Words were randomly selected from 
the Thorndike-Barnhart Handy Pocket Dictionary 
(1951). If a word was at or below the sixth-grade 
level of difficulty and if a personal concept statement 
could be formed on the basis of definitions in the 
Thorndike Century Senior Dictionary (1941), the 
word was included in the sample (Hilden, 1954). 
Only words readily leading to a formulation of think- 
ing, feeling, behavior, etc., were used in the sample. 


1 This study was supported in part by Research 
Grants M-4507 and M-5988 from the National In- 
stitute of Mental Health, United States Public Health 
Service. 

The criteria used in item construction assure a population
of items with a specified base as well as items 
which should be fairly representative of statements 
referring to personality characteristics. 

Hilden’s Universe of Personal Concepts consists of 
1,575 items. In addition to these items, 72 items from 
a previous study were used (Cruse, 1963), making a 
total of 1,647 items. These 72 items were randomly 
mixed with the Hilden set. The items in Hilden’s 
Concept list are stated in the first person singular, for 
example, “I yield completely to my feelings.” The 
form of the items was changed to an indefinite one, 
“To yield completely to your feelings,” in order to 
minimize a personal reference and increase the like- 
lihood of general judgments as requested in the in- 
structions. The 1,647 items were judged by college 
students in undergraduate courses. The items were 
judged by 43 male and 52 female students. The sub- 
jects (Ss) were requested to judge the items on a 
nine-point social desirability scale. High scale values 
indicate statements judged as extremely socially de- 
sirable, scale values in the middle of the continuum 
indicate statements judged neutral, and statements 
with low scale values indicate statements judged as 
extremely socially undesirable. The instructions and 
format of the judging task were comparable to those 
given by Edwards (1957). The items were scaled by 
the method of equal appearing intervals.2 
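
In the method of equal appearing intervals, an item's scale value is conventionally taken as the median of the judges' category placements, interpolated within the interval that contains it. The sketch below assumes that convention and a nine-point scale with unit-width intervals centered on the category numbers; the function name and the six ratings are invented for illustration and are not data from this study.

```python
from collections import Counter


def scale_value(ratings, width=1.0):
    """Interpolated median of category ratings (e.g., categories 1..9).

    S = l + ((0.5 - p_below) / p_within) * width, where l is the lower
    limit of the category in which the cumulative proportion reaches .50.
    """
    n = len(ratings)
    counts = Counter(ratings)
    p_below = 0.0
    for category in sorted(counts):
        p_within = counts[category] / n
        if p_below + p_within >= 0.5:
            lower_limit = category - width / 2
            return lower_limit + (0.5 - p_below) / p_within * width
        p_below += p_within
    raise ValueError("ratings must not be empty")


hypothetical_ratings = [7, 7, 8, 8, 8, 9]   # six invented judgments of one item
print(round(scale_value(hypothetical_ratings), 2))
```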


RESULTS AND DISCUSSION 


A product-moment correlation of .97 was 
found between scale values of the 72 items 
previously scaled and the scale values de- 
rived for these same items in the new sample. 
The large size of this correlation indicates that 


2 A 44-page list of the 1,647 items and median 
scale values of the judged items has been deposited 
with the American Documentation Institute. Order 
Document No. 8435 from ADI Auxiliary Publications 
Project, Photoduplication Service, Library of Con- 
gress, Washington, D. C. 20540. Remit in advance 
$2.50 for microfilm or $6.25 for photocopies and 
make checks payable to: Chief, Photoduplication 
Service, Library of Congress. 


FIG. 1. The frequency distribution of 1,647 items scaled for social desirability by the method of equal
appearing intervals. 


the social desirability scale values in the 1,647 
set were not appreciably changed when the 
task required judgments of a large number of 
items. 

Figure 1 presents the frequency distribution
of the scale values of the 1,647 items. It 
is apparent that this population of items is not 
evenly distributed over the social desirability 
continuum and is distinctly bimodal in shape. 

Hilden’s (1958) construction of the items 
was made independently of any consideration 
of social desirability factors. The construction 
of the item pool was specifically designed to 
provide items representative of personal concepts
and personality-type items based upon 
words at or below the sixth-grade level. The 
independence in construction from social desirability
continua and the attempt for representativeness
lend support to the notion that 
the scale-value distributions are not peculiar 
to the method of construction but show attitudes
towards personal concepts. If the social 
desirability scale values may be taken as representative
of attitudes towards personality 
concepts, one may conclude that social desirability
judgments of personal concepts are infrequently
judged as neutral and tend to be 
either undesirable or desirable. 

The bimodality of judgments does not mean 
that no personal concepts are neutral, but that 
their relative frequency in a general popula- 
tion of personal concept items is small. One 
of the initial reasons for scaling such a large 
pool of items was to secure items in the neu- 
tral range. Figure 1 indicates that items in 
the neutral range are indeed relatively infre- 
quent and that securing any large number of 
neutral items may require specially worded 
items. The relatively small number of items 
in the neutral range may, in part, account for 
difficulties in constructing scales with a low 
correlation with Edwards’ 39-item social de- 
sirability scale (Edwards & Diers, 1963). 

The 1,647 items were administered to an 
independent group of Ss in the form of a per- 
sonality inventory. In the inventory the items 
were presented as questions, for example, “Do 
you yield completely to your feelings?” and 
the S was requested to respond “Yes” or 
“No.” A product-moment correlation of .90 
was found between frequency of endorsement 
and social desirability scale values. The high 
correlation between social desirability scale 
values and frequency of endorsement supports 
previous work on social desirability (Edwards, 
1957) and shows that the social desirability 
variable accounts for a large proportion of the 
variance in personality-type items. 


EFFECTS OF INTERMITTENT ILLUMINATION 
ON PERCEPTUAL-MOTOR PERFORMANCE 1 


STEFAN SLAK 2 AND JOSEF BROZEK 


Lehigh University 


The experiment was designed to determine whether under conditions of inter- 
mittent illumination there is a significant impairment in performance as meas- 
ured by perceptual-motor tasks. Performance on 5 such tasks under 5 condi- 
tions of flickering light was compared with performance under steady light. 
Time and error scores were considered. No gross detrimental effects of
intermittent illumination were detected. 


The effects of flicker have been studied 
in several contexts. Fluorescent flicker 
does not have a detectable effect on visual 
efficiency (Zaccaria & Bitterman, 1952). 
Low-frequency flicker seems to lower reading 
efficiency (Gerathewohl & Taylor, 1953). 
Prolonged exposure to flicker does not affect 
adversely the performance in a simple arithmetic
test (Alexander & Chiles, 1959). Johnson
(1963) studied the effects of flicker on 
performance in tapping, repeating digits, addition,
and serial subtraction, and did not uncover
any impairment of performance. His 
tasks did not require vision or perceptual-motor
coordination. Bach, Sperry, and Ray 
(1957) studied performance under flicker in 
upping, locomotion, rotary pursuit, and rifle 
firing. In general, the results were not conclusive
but a marked impairment of performance
was observed under flicker in rotary pursuit,
a task that requires perceptual-motor coordination.
In their experiment the average 
cycle illumination intensity of the flickering 
light was substantially lower than illumination 
intensity of the steady light. 

The present study was designed to measure 
performance under conditions of intermittent 
illumination in a variety of perceptual-motor 
tasks, with a better control of illumination 
intensity and a larger range of flicker frequencies
than used in previous studies. 


METHOD 


Conditions and apparatus. The experiment took 
place in a dark room. The light beam from a slide 
projector, equipped with a 300-watt lamp, passed 


1 The study was supported by the National Institutes
of Health Grant MH 07179. 
2 Now at Wayne State University. 


through a four-sector rotating disk attached to the 
shaft of a variable-speed electric motor. This setup 
yielded flickering light with sine waves and pulse-to-
cycle fraction of 1/2. The light was reflected from an 
aluminum foil surface (22 × 33 inches) onto the 
working desk (24 × 36 inches). A neutral density 
filter attached to the projector reduced the intensity 
of light to 50% for the control condition. The peak 
light-flash illumination intensity of the desk was 4 
footcandles. The average cycle illumination intensity 
was 2 footcandles for all flicker frequencies. The 
illumination intensity of the steady light was also 
2 footcandles. The reflection factor of the desk was 
approximately 0.1. The desk was illuminated only by 
the light passing through the rotating disk. A black 
rectangle cardboard suspended at the level of the 
subject’s (S’s) eyes shielded the Ss from the glaring 
surface of the aluminum foil. Five flicker frequencies 
were used: 1, 3, 9, 24, and 40 cycles per second. 

Tasks. All five tasks required vision and some degree
of perceptual-motor coordination: inserting keys 
in a grooved pegboard, mirror tracing, and card 
sorting. The cards were sorted using numbers (from 
1 to 15), figures (15 combinations of geometrical 
symbols), and colors (10 different hues with identi- 
cal brightness and saturation) as criteria. The total 
number of cards for sorting numbers was 135 (9 × 
15), for figures 150 (10 × 15), and for colors 100 
(10 × 10). Time and error scores were recorded in 
all tasks except for the grooved pegboard where only 
time score could be considered. The error score in mirror 
tracing was the length in centimeters of the S’s trace 
lying more than 1 millimeter from the line. 
In card sorting the error was the number of incor- 
rectly sorted cards. Time was expressed in seconds. 

Subjects. 44 normal undergraduate students served 
as Ss. A screening procedure consisting of an inter- 
view and EKG served to eliminate latent epileptics 
in whom seizures may be induced by certain fre- 
quencies of intermittent light. No Ss were eliminated. 

Procedure. The Ss were divided into six groups: 
the control group with 14 Ss and five experimental 
groups with 6 Ss each. In a pretest, all Ss performed 
all five tasks under steady light. Two weeks later, 
control Ss were again tested under steady light. The 
experimental Ss performed the tasks under one of 
the 5 flicker frequencies. Pretest scores were subtracted
from the test scores in order to reduce the 
factor of individual variability. These difference 
scores (test minus pretest) were used in the statistical
analysis which was carried out separately for 
each task and for time and error scores. 


TABLE 1 

MEAN DIFFERENCE SCORES FOR ALL TASKS AND ALL CONDITIONS WITH MEAN SQUARES 
("WITHIN-GROUP") AND SMALLEST SIGNIFICANT DIFFERENCES 

                     Steady
                     light       1 cps       3 cps
Time
  Inserting keys      -2.64       1217        0.00
  Mirror tracing     -88.07     -42.00     -101.67
  Sorting numbers    -19.64     -15.33        8.50
  Sorting figures    -80.50     -74.83      -65.00
  Sorting colors     -18.07      13.00      -12.00
Error
  Mirror tracing      -5.57       4.17*       4.00
  Sorting numbers     -0.14      -0.17       -0.67
  Sorting figures     -1.64      -1.00       -0.33
  Sorting colors      -3.00      -2.50       -2.67

* p < 0.05. 


RESULTS 


Means of difference scores for different 
tasks are presented in Table 1. Dunnett’s test 
for comparisons with a control was used 
(Dunnett, 1955). 

Should performance under flicker be impaired,
the mean scores in Table 1 obtained 
under conditions of intermittent light would 
be higher than those obtained under steady 
illumination. 

The smallest significant differences (at the 0.05 
level for a two-tailed test) between performance
under flickering and steady illumination 
are indicated in the last column of the table. 

Inspection of the table tells us that, taken 
singly, two differences are significant at the 0.05 
level. The error score in mirror tracing was 
higher under flicker of 1 and 24 cycles per 
second than it was under steady illumination. 
However, two significant differences out of 45 
are just about what could be expected by 
chance alone if a significance level of 0.05 is 
used. 
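
The chance argument can be made concrete with a short calculation. As a simplifying assumption made only for this sketch, the 45 comparisons (9 scores at each of 5 flicker frequencies) are treated as independent tests, each with a .05 probability of a spurious significant result; in the actual design the comparisons within a task share one control group, so this is an approximation rather than the analysis the authors carried out.

```python
from math import comb

n_comparisons = 45   # 9 scores (5 time, 4 error) x 5 flicker frequencies
alpha = 0.05

# Expected number of "significant" differences if no true effect exists.
expected_by_chance = n_comparisons * alpha


def prob_at_least(k, n, p):
    """Binomial probability of k or more 'significant' results out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_two_or_more = prob_at_least(2, n_comparisons, alpha)
print(f"expected significant by chance: {expected_by_chance:.2f}")
print(f"P(at least 2 of 45): {p_two_or_more:.2f}")
```

On these assumptions slightly more than two significant differences are expected by chance, and observing two or more is more likely than not.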


TABLE 1 (continued) 

                                                               Smallest
                                                               significant
                      9 cps     24 cps     40 cps       MSwg   difference
Time
  Inserting keys       2.00      -4.00      -2.17      26.02       6.67
  Mirror tracing     -10.67     -68.83     -21.83   5,895.53     100.41
  Sorting numbers     11.50     -18.50     -21.50     881.59      35.30
  Sorting figures    -53.50     -60.50    -107.83   2,405.06      64.13
  Sorting colors       4.50      -5.83      -5.17   1,616.48      52.58
Error
  Mirror tracing       Onli       4.83*      3.50      53.98       9.62
  Sorting numbers      0.33      -0.17      -0.33       0.75       1.13
  Sorting figures     -0.50      -0.83      -4.00       8.37       3.78
  Sorting colors       2.00       1.00      -3.17      19.04       5.63


DISCUSSION AND CONCLUSIONS 


The results do not provide evidence of significant
deterioration in working efficiency 
under the given conditions of intermittent 
illumination. In the Tulane studies (Bach, 
Sperry, & Ray, 1957) a marked impairment 
of performance under flicker of 9 cycles per 
second was obtained in rotary pursuit. It 
should be noted that in their experiment the 
ambient steady illumination was 20 foot- 
candles (control) whereas the average cycle 
illumination intensity of the flickering light 
was 0.1 footcandle. It is known that illumina- 
tion intensity does have an effect on perform- 
ance and the difference in illumination inten- 
sity probably accounts for the radical drop in 
performance observed under flicker. 

The main purpose of this study was to as- 
sure reasonable generality of conclusions and 
applicability to practical situations. The gen- 
erality, of course, is limited by the experi- 
mental conditions (laboratory simulation of 
working environment, low-level illumination, 
and regular periodicity of light flashes) but 
is increased by the use of a variety of perceptual-motor
tasks. 

The conclusion was reached that intermit- 
tent illumination, such as used in the present 
experiment, does not cause a gross reduction 
in perceptual-motor performance. 

Yet, flicker is often characterized as irritat- 
ing, annoying, or disturbing (Bach, Sperry, & 
Ray, 1957; Johnson, 1963). Why does it not 
affect working efficiency? One reason may be 
stated in the form of a hypothesis which may 
serve as a basis for further investigation: 
Flicker-induced disturbances are more likely 
to occur in the state of passivity. Voluntary 
perceptual-motor activity counteracts such 
disturbances so that performance remains unaffected
by the otherwise annoying character 
of intermittent light. 


DOUBLE-SPLIT CROSS-VALIDATION: 

AN EXTENSION OF MOSIER’S DESIGN, TWO UNDESIRABLE 
ALTERNATIVES, AND SOME ENIGMATIC RESULTS 1 


WARREN T. NORMAN 


University of Michigan 


3 possible extensions of Mosier’s double cross-validation (DCV) design are 
considered for the multiple-criterion case. 1 extension, double-split cross-valida- 
tion (DSCV), is defended as preferable on grounds of unbiasedness, relative 
efficiency of estimation, and procedural simplicity. An empirical test of the 
rationale, employing self-report inventory predictors and peer-rating criterion 
measures for 5 personality variables, is reported. Results support the argu- 
ments in favor of the DSCV design but highlight the need for the development 
of a multiple-criterion extension of the theory and techniques of multiple 
linear regression analysis for use in studies employing this design. 


In his 1951 paper on “Problems and De- 
signs of Cross-Validation,” Mosier presented 
an extension of the classic cross-validation 
procedure which had been discussed earlier 
by Kurtz (1948) and by Cureton (1950). 
Mosier’s extension, the so-called double cross- 
validation (DCV) design, yields unbiased 
estimates of validity for each of two sepa- 
rately developed, empirically fitted functions, 
whether these be regression equations, dis- 
criminant functions, empirical scoring keys, or 
some other combination of a set of observables 
weighted so as to maximize the fit to a given 
set of external criterion data. The design 
requires an initial partition of the available 
cases, on each of whom both test and criterion 
data are available, into a pair of samples, 
usually of equal size. It differs from the or- 
dinary cross-validation design in that predictor 
functions are developed separately on each of 
these samples instead of on just one. Each 
function is then independently validated on 
the other sample, that is to say, on those cases 
not used in its development. 
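
The logic of the design can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the data are synthetic, the fitted "function" is a unit-weighted empirical key built by selecting items whose item-criterion correlation exceeds an arbitrary threshold, and the function names are hypothetical. It is not the procedure or the data of any study cited here.

```python
import random


def pearson_r(x, y):
    """Product-moment correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def make_synthetic_cases(n_cases=456, n_items=40, n_valid=10, seed=1):
    """Invented data: only the first `n_valid` binary items reflect the trait."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        trait = rng.gauss(0, 1)
        items = [int((trait if j < n_valid else 0.0) + rng.gauss(0, 1.5) > 0)
                 for j in range(n_items)]
        cases.append((items, trait + rng.gauss(0, 1)))  # (responses, criterion)
    return cases


def build_key(sample, threshold=0.15):
    """Empirical key: items whose validity in `sample` exceeds `threshold`."""
    criterion = [c for _, c in sample]
    return [j for j in range(len(sample[0][0]))
            if pearson_r([items[j] for items, _ in sample], criterion) > threshold]


def validity(key, sample):
    """Correlation between unit-weighted key scores and the criterion."""
    scores = [sum(items[j] for j in key) for items, _ in sample]
    return pearson_r(scores, [c for _, c in sample])


def double_cross_validate(cases):
    """Fit a key on each half and validate it on the other half."""
    half = len(cases) // 2
    sample_a, sample_b = cases[:half], cases[half:]
    key_a, key_b = build_key(sample_a), build_key(sample_b)
    return {
        "key_a": key_a,
        "key_b": key_b,
        "a_fitted_on_a": validity(key_a, sample_a),     # capitalizes on chance
        "a_validated_on_b": validity(key_a, sample_b),  # cross-validated
        "b_validated_on_a": validity(key_b, sample_a),  # cross-validated
    }


results = double_cross_validate(make_synthetic_cases())
for name, value in results.items():
    print(name, value)
```

Only the two cross-validated correlations are estimates of validity in the sense discussed above; the correlation of a key with the criterion in its own development sample is shown for contrast.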

There are two problems inherent in Mosier’s 


1 The analyses reported here were carried out under 
a Faculty Research Grant from the Rackham School 
of Graduate Studies of the University of Michigan 
with computer time provided by the University 
Computing Center. The paper was prepared under 
a Faculty Research Fellowship from the Rackham 
School during the summer of 1963. Data reported 
were collected under a project sponsored by the 
Personnel Laboratory, Aeronautical Systems Divi- 
sion, Air Force Systems Command, Lackland Air 
Force Base, Texas. 


design which he recognized but felt unable to 
resolve to his own satisfaction. First, the DCV 
design treats the two parts of the problem, 
function development and validation, as 
equally “data-demanding.” That is to say, 
because of the symmetry of the design, as 
many cases are ordinarily available to com- 
pute the unbiased estimates of validity for 
each of the functions as are employed initially 
to fit the regression lines, or to locate the 
discriminant hyperplanes, or to select items, or 
whatever. However, if there are very many 
predictor variables involved or if the indi- 
vidual predictors are quite unreliable or both 
(as is typically the case, for example, in em- 
pirical key construction applications), then 
relatively more of the data is required and 
should be used in the first step of the process 
than in the second to achieve a uniform overall
level of precision.2 

The second difficulty is in a sense an em- 
barrassment of riches. The DCV design gen- 
erates two functions and two validities instead 


2 The author takes such a uniform overall level of 
precision to be optimal assuming both aspects of the 
problem are of equal importance. There may be 
cases where the researcher’s utility for precision of 
the key or function developed exceeds that for the 
estimate of its validity. But it seems entirely un- 
likely one would ever want to have it the other way 
around—“you need no razor blade to cut warm 
butter” as Herbert Feigl has aptly remarked. However,
this latter case is just what occurs with the 
DCV design because of the relatively larger standard 
errors of regression, discriminant, and especially 
itemetric estimators. 

of just one of each. To the extent that there 
are differences between the functions and/or 
between the validity estimates, either a choice 
must be made between them or some means 
of combining the data must be employed. But 
there is often no basis for choice in the first 
instance and the latter alternative leaves one 
with no independent data available to validate 
the resultant function formed. 

In what follows there will be no solution 
proposed for this second problem. It remains, 
if anything, in a more aggravated form than 
Mosier left it (but not so aggravated as it 
could be under one possible extension of Mo- 
sier’s design that will be presented). On the 
other hand, there will be an attempt to deal 
with the first problem in at least a gross 
manner. However, the primary intent of this 
paper is to present an extension of Mosier’s 
design which I choose to call double-split 
cross-validation (DSCV) and which appears 
to me to be useful for attacking certain 
multivariate generalizations of the problem 
Mosier was concerned with. In the course of 
this presentation two possible alternatives to 
the DSCV design will also be described, but 
they will be criticized for either logical or 
practical defects they possess. Finally an em- 
pirical comparison of two of the designs will 
be presented which, while supportive of por- 
tions of the argument, leads to one trouble- 
some and seemingly incongruous result. 


THE PROBLEM AND SOME ALTERNATIVE 
SOLUTIONS 


Suppose one starts out initially to build 
empirical scoring keys for an inventory to 
measure each of a set of several criterion 
variables. In particular, assume a total of N 
subjects (Ss) is available for each of whom 
one has a completed itemized, self-report in- 
ventory protocol and an assessment of status 
on each of K criteria. If one were to use the 
DCV design, he would first split his cases into 
two samples of, say, N_A and N_B cases. He 
would then select item-response categories for 
each criterion key using the data (item validities)
from each sample separately. He would 
thereby construct 2K scoring keys for each 
of which he could obtain an unbiased estimate
of validity by using the test performances
and criterion data of Ss in the other 
group. So far, this procedure can be thought 
of simply as K separate DCV analyses with 
the same samples of N_A and N_B persons used 
in each, albeit with different and experiment- 
ally independent measures on each S for each 
of the criteria. 

There are a number of problems in addi- 
tion to the two mentioned above that can 
arise, or at least become more readily appar- 
ent, in this sort of multiple-criterion applica- 
tion of the DCV design. First, suppose that in 
selecting response categories for the several 
keys, one has used those for certain of the 
items on two or more of the K keys whenever 
their validities against the several separate 
criteria warranted it. This is altogether com- 
mon contemporary practice in developing 
empirically keyed inventories (e.g., the SVIB, 
MMPI, and CPI). And, by permitting more 
keyed responses per scale, this practice prob- 
ably leads to higher reliabilities and validities 
for each scale considered separately than 
would otherwise be possible. However, if this 
is done, the interscale correlations will reflect 
not only the basic covariation between pairs 
of criteria but also the completely spurious 
sources attributable to the perfectly correlated 
error components for the various subsets of 
jointly keyed items. For typical inventory 
items, the proportion of error variance is apt 
to be large. Accordingly, the amount of dis- 
tortion introduced by this artifact is likely to 
be great whenever there is much item over- 
lap between a pair of keys with a predomi- 
nant direction to the keying of the common 
items. 

A related effect occurs on inventories requir- 
ing forced choices to items, the response cate- 
gories of which are exhaustively keyed for 
the several attributes (as, for example, on the 
EPPS or the AVL Study of Values). In this 
case a negative bias to the interscale correlations
(amounting to -1/(K - 1) on the average
for fully counterbalanced and exhaustively
keyed inventories) results solely as a 
function of the ipsatizing format and scoring 
procedures used. 
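
The size of this ipsatization bias can be checked numerically. The following is a minimal sketch, not the author's procedure: it assumes K independent, equal-variance normal scores per person and ipsatizes them by subtracting each person's own mean, which is a simplification of a forced-choice format. The function names and the simulation settings are illustrative assumptions.

```python
import random


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_ipsative_correlation(k, n_persons=4000, seed=0):
    """Average interscale r after ipsatizing k independent scores."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_persons):
        raw = [rng.gauss(0.0, 1.0) for _ in range(k)]
        person_mean = sum(raw) / k
        # Ipsatize: every person's k scores now sum to zero.
        rows.append([v - person_mean for v in raw])
    cols = list(zip(*rows))
    rs = [pearson(cols[i], cols[j])
          for i in range(k) for j in range(i + 1, k)]
    return sum(rs) / len(rs)


if __name__ == "__main__":
    k = 5
    print("simulated mean r:", round(mean_ipsative_correlation(k), 3))
    print("expected -1/(K-1):", -1 / (k - 1))
```

Although the raw scores are uncorrelated by construction, the ipsatized scores correlate at close to −1/(K − 1) on the average, which for K = 5 is −.25.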

The presence of these and other potential 
“methods factors” (e.g., desirability or acqui- 
escence biases, instructional and administra- 
tive atmosphere effects, other varieties of 
stimulus presentation and format factors,

TABLE 1

EXAMPLE OF THE EFFECT OF MULTIPLE KEYING OF
INVENTORY ITEMS ON INTERSCALE CORRELATIONS
AND RELIABILITIES FOR ONE OF THE BINARY,
FORCED-CHOICE INVENTORIES

                                 Selected pairs of variables
                               2      3       1      3       1      5

Scales with multiple keying
  Number of keyed items        47     54      56     54      56     40
  KR 20 reliabilities         .77    .86     .87    .86     .87    .64
  Number keyed "same"             22              0             10
  Number keyed "opposite"          1             35             10
  Interscale r                   .82           −.93           −.00

Scales with single keying
  Number of keyed items       [illegible]    [illegible]     26     22
  KR 20 reliabilities         [illegible]    [illegible]    .74    .46
  Interscale r                   .56           −.67           .02 [sign unclear]

Criterion measures
  Intercriterion r               .49           −.38           −.06

Note. The Descriptive Adjective Inventory (DAI). All
scales constructed using Sample A data (N_A = 228). All
correlations and reliabilities reported based on Sample B data
(N_B = 228).


etc.), when operative in the inventory data in 
a manner or degree different from that in 
which they influence the criterion assessments, 
are troublesome on two counts. To the extent 
that such artifacts are unrelated to com- 
ponents in the various criterion variables, 
their presence in the predictors has a sup- 
pressing effect on convergent validity. And to 
the degree one wishes to draw inferences con- 
cerning the relationships that are obtained 
among the several variables in subsequent 
applications where only inventory data are 
available, he is apt to be grievously misled by 
such spurious variance components.

For an example, consider the data presented 
in Table 1 which are taken from a previous 
study by the author (Norman, 1963a, 1963c). 
The three pairs of variables presented were 
chosen from the 10 possible pairs among the 
five variables in the original study because 
they are the ones for which the corresponding 
scales on this inventory, which permit multiple 
keying, have the greatest amount (and pro- 
portion) of joint keying (Pair 2, 3), of oppo- 
site keying (Pair 1, 3) and the most even 
balance of joint and opposite keying (Pair 1, 
5). Also presented are the corresponding 
intercriterion correlations and interscale cor- 


WARREN T. NORMAN 


relations for a second set of shorter keys for 
which no multiple keying was permitted. The 
nature and potency of the multiple-keying 
effect is obvious from the data in this table. 
A direct solution to this particular problem 
might be attempted by using any given item 
on no more than one key, as has been done 
for the shorter scales in Table 1. But this 
might prove to be an inefficient solution, espe- 
cially for relatively short inventories, if indi- 
vidual items have valid variance components 
for several criterion dimensions, as the prac- 
tice of multiple keying implicitly assumes. 
Compare, for instance, the reliabilities for 
corresponding scales of the two kinds pre- 
sented in Table 1. And of more general im- 
portance is the recognition that this particular 
approach, even when the reduction in relia- 
bilities and the incomplete extraction of avail- 
able true score variance from the item re- 
sponses are considered unimportant, is still not 
sufficient since it will have no necessary re- 
medial effect on other varieties of contami- 
nants that might also be present. Note, for 
example, that the correlations among the 
shorter keys in Table 1 are still not uniformly 
matched to the corresponding intercriterion 
correlations. While one might attempt to de- 
vise and utilize one corrective mechanism after 
another in ad hoc fashion to handle each possible
artifact that might be present, it would
be more efficient if one could, by a single 
analysis procedure, suppress whatever spur- 
ious method effects might be operating. 
Suppose, for example, that we build our 
scales permitting multiple keying of given 
items. But then, in addition, suppose we 
construct regression functions for the several 
criteria using these scales as the predictor 
variables. In so doing the spurious components
of our interscale correlations might be ex- 
pected to act in the manner of suppressor 
variables. If so we should find that the corre- 
lations among the scores predicted by the 
regression functions map those among the 
criteria considerably better than do the cor- 
relations among the original, multiply keyed 
scales, even though there might not result 
much, if any, improvement in the validities 
(a suppressor, after all, has to be a very
effective one in order to raise a moderate
validity by very much).³

However, it is at this point that a limita- 
tion, for our purposes, of Mosier’s DCV design 
becomes apparent. If we build our keys on 
one of the two samples and develop our 
regression functions on the other, there re- 
mains no data from which to obtain unbiased 
estimates of the validities of the regression 
functions. If instead, we develop the func- 
tions on the same sample used to construct 
our scales, the correlation with the criterion 
variable of one of the scales in each regression 
analysis (that scale originally built to predict 
the given criterion) will be spuriously large. 
This latter procedure might be called redun- 
dant double cross-validation (or RDCV) and 
is illustrated in Figure 1a. This design, by
virtue of the biased estimators it employs, 
generates not only inflated multiple Rs but 
nonoptimal beta weights as well. And this 
latter, in turn, results in lower cross-validated 
estimates of validity for the regression func- 
tions than otherwise would be attainable. 

Clearly if we are to pursue this approach, 
we need some sort of triple partition of the 
original cases to make the procedure unbiased 
at each stage. The most immediate possibility 
that comes to mind would entail an initial 
separation of the available cases into three 
equal sized samples; call them A, B, and C. 
One could then build a set of keys with the 
data from each sample; develop two sets of 
regression functions for each set of keys, one 
set on each of the remaining independent sam- 
ples; and finally validate each of the six sets 
of functions so developed on the one inde- 
pendent sample that is left. This procedure, 
which might be dubbed triple cross-validation
(TCV) is illustrated in Figure 1b. 
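
The bookkeeping of the TCV design can be sketched as follows. This is an illustration only; the function name and the dictionary layout are assumptions introduced here, and only the sample labels A, B, and C come from the text. Every ordering of the three samples yields one assignment of key construction, function fitting, and validation.

```python
from itertools import permutations


def tcv_plan(samples=("A", "B", "C")):
    """List every (keys, fit, validate) assignment of three samples."""
    return [
        {"keys": keys, "fit": fit, "validate": validate}
        for keys, fit, validate in permutations(samples, 3)
    ]


if __name__ == "__main__":
    for step in tcv_plan():
        print(step)
```

The listing has six entries, one per set of regression functions, built on three distinct sets of keys.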

There are two basic difficulties with this 
design, however. First, like its prototype, the 
DCV design, the symmetry of this extended 

3 What is more, this procedure is not subject to 
the objection that could validly be raised to the use 
of regression methods to derive differential item 
weights for the original keys. The scales ordinarily 
will be considerably fewer in number and individually 
far more reliable than the individual items. Hence, 
Richardson’s proof (1941), that as the number of 
positively correlated predictors gets large and/or as 
their reliabilities become small the variance of their 


beta weights tends to zero, is here not particularly 
germane.

version provides equal amounts of data for
all three phases of the procedure—key con- 
struction, function fitting, and validation— 
whereas each successive task probably requires 
less data than its predecessor in order to 
attain uniform overall precision. And second, 
the sheer amount of computational work is 
excessive. While some increase in effort over 
that required by the simpler designs is surely 
unavoidable, to propose the construction of 
three sets of keys, six sets of regression func- 
tions and a like number of validities is being 
at least a bit expansive! So let us consider a 
third alternative. 

Suppose one initially partitions the avail- 
able cases into only two samples, A and B, 
and that he constructs one set of keys from 
the data in each as he would using Mosier’s 
design. One thereby bases each set of scales 
on half, rather than a third, of the available 
cases. Suppose next that he splits each of these 
two samples into a pair of subsamples, each 
containing one-fourth of the total number of 
cases, and develops a set of regression func- 
tions on each of these quarter-samples using 
the keys constructed on the nonparent sample. 
Each of the resulting, unbiased regression
functions can then be validated on the
quarter-sample of data thus far unused in any
prior phase of the process, thus providing
unbiased estimates of validity as well. Figure
1c illustrates this DSCV design in a schematic
fashion.
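
The allocation of samples under this design can be laid out as a short sketch. It is offered under stated assumptions: the quarter-sample labels a, alpha, b, and beta follow the lettering used in the empirical example, while the function name and data layout are hypothetical and introduced only for illustration.

```python
# Quarter-samples nested within each half-sample.
QUARTERS = {"A": ("a", "alpha"), "B": ("b", "beta")}


def dscv_plan():
    """List each (keys, fit, validate) assignment of the DSCV design."""
    plan = []
    for key_sample in QUARTERS:
        # Functions are fitted and validated only on the nonparent half.
        other = "B" if key_sample == "A" else "A"
        first, second = QUARTERS[other]
        plan.append({"keys": key_sample, "fit": first, "validate": second})
        plan.append({"keys": key_sample, "fit": second, "validate": first})
    return plan


if __name__ == "__main__":
    for step in dscv_plan():
        print(step)
```

Four sets of functions result, and in each the cases used for key construction, for function fitting, and for validation do not overlap.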

This procedure has several virtues when 
compared to the two alternative designs pre- 
sented above. First, it is unbiased at all three 
stages whereas the RDCV design is not. Sec- 
ond, it involves both less computing and 
fewer distinct sets of keys, functions, and 
validities to deal with at the end than does the 
TCV design (though twice as many functions
and estimates of validity as does RDCV).
And third, it provides what is probably a
closer approximation to the optimal allocation
of data to the three phases of the problem
than either of the two alternative designs by
employing twice as many cases for key development
as it does for function fitting or for
validation.⁴


4 Actually, a precisely optimal solution to the 
allocation problem will seldom, if ever, be possible 
for any design which uses each case exhaustively, 


TABLE 2

CORRELATIONS AMONG THE CRITERION RATING MEASURES, AMONG THE EMPIRICAL SUMMATION
INVENTORY SCALES, AND VALIDITIES, RELIABILITIES AND MULTIPLE KEYING
CHARACTERISTICS OF THE INVENTORY SCALES

              Criterion rating measures                        Summation inventory scales
Variable      1            2            3      4            5      1            2            3      4      5
1
2            −.03                                                 −.71
3            −.38          .49                                    −.92         [illegible]
4            [illegible]  [illegible]   .03                       [illegible]  −.17         −.58
5            −.06          .33          .63   [illegible]          .08          .06          .25   −.23

Summation inventory scale                                    1      2      3      4      5
Validities of summation inventory scales                    .43    .40    .49    .33    .37
KR 20 reliabilities of summation inventory scales           .93    .84    .92    .18    .76
Number of items keyed on each of the summation
  inventory scales                                          165    157    158    155    146

Pair of variables                 1,2   1,3   1,4   1,5   2,3   2,4   2,5   3,4   3,5   4,5
Number of items keyed same          9     0    45    24    62    41    27    13    45    23
Number of items keyed opposite     34    87     6    26     5     3     5    32     3    12
Multiple keying index ᵃ           −16   −54    24    −1    36    24    16   −12    28     7

Note. Summation scales developed on Sample A data (N = 228). All correlations, validities, and reliabilities based on
Sample B data (N = 228).

ᵃ Multiple keying index = 100[2(I_s − I_o) ÷ (I_i + I_j)], where I_i and I_j are the numbers of items keyed on scales i and j and
I_s and I_o are the numbers of multiply keyed items in the same and in the opposite directions for the given pair of scales, i and j.
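
The multiple keying index of Table 2 can be recomputed from the item counts. The sketch below reads the index as 100 × 2(I_s − I_o) / (I_i + I_j), with division between the two parenthesized terms; that reading is an inference from the tabled values, which it reproduces for the pairs checked here. The function and argument names are illustrative.

```python
def multiple_keying_index(n_same, n_opposite, n_items_i, n_items_j):
    """Index of net joint keying for a pair of scales.

    n_same, n_opposite: items keyed on both scales in the same or in
        opposite directions.
    n_items_i, n_items_j: total items keyed on each of the two scales.
    """
    return 100 * 2 * (n_same - n_opposite) / (n_items_i + n_items_j)


if __name__ == "__main__":
    # Pair (1, 3): 0 same, 87 opposite; 165 and 158 keyed items.
    print(round(multiple_keying_index(0, 87, 165, 158)))
    # Pair (2, 3): 62 same, 5 opposite; 157 and 158 keyed items.
    print(round(multiple_keying_index(62, 5, 157, 158)))
```

The index is bounded by −100 and +100, reaching either limit only when every keyed item on both scales is shared and keyed in a single relative direction.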


AN EMPIRICAL EXAMPLE 


The data on which the following analyses 
are based were obtained as part of a large 
scale test development project, other aspects 
of which have been reported earlier by the 
author (Norman, 1963a, 1963b, 1963c). The
Ss consist of 456 fraternity and residence hall 
men from the University of Michigan on each 
of whom peer nomination rating scores on 
each of five criterion variables were obtained. 
In addition, each S completed three binary, 
forced-choice, self-report inventories each of 
which was designed to permit the construc- 
tion of empirical keys to measure the same 
five variables assessed by the criterion rat- 
ings. After the Ss had been divided into a 
pair of cross-validation samples, each con- 
taining 228 men and matched as closely as 
possible on type of residence, academic class 
standing and factor structure of the criterion 


that is, for all three tasks. Such optimal allocations 
will likely be possible only for extensions of the 
simple cross-validation procedure where some frac- 
tion of the cases are used to build a single set of 
keys, another to develop the regression functions, 
and the remainder to estimate validities. 


ratings, empirical keys for the five dimensions 
were constructed separately on each sample 
for each inventory. Multiple keying (both 
joint and opposite) of individual items was 
permitted if the item validities for the several 
criteria warranted it. Complete cross-valida- 
tion, multitrait-multimethod matrices (five 
variables by criterion ratings, criterion ratings 
with profile elevation components removed, 
and three inventories) are available through 
ADI as Tables A and B.⁵

In Table 2 are presented the correlations 
(based on the B sample data) among the 
criterion rating scores and among a set of 
summation inventory scales developed on the 
A sample.⁶ Scores on these summation scales


5 Tables A and B have been deposited with the 
American Documentation Institute. Order Document 
No. 8436 from ADI Auxiliary Publications Project, 
Photoduplication Service, Library of Congress, Wash- 
ington, D. C. 20540. Remit in advance $1.25 for 
microfilm or $1.25 for photocopies and make checks 
payable to: Chief, Photoduplication Service, Library 
of Congress. 

6 These are, of course, only half of the results
obtained from the full DCV analysis. In this and 
other tables to follow only the analyses based on the

TABLE 3

CORRELATIONS AMONG SCORES PREDICTED BY THE
REDUNDANT DOUBLE CROSS-VALIDATION (RDCV)
REGRESSION FUNCTIONS OF THE EMPIRICAL
SUMMATION KEYS, TOGETHER WITH
MULTIPLE Rs AND VALIDITIES

                                            Regression functions (F_A·A)
Regression functions (F_A·A)         1            2            3            4      5
1
2                                   −.16
3                                   −.62          .53
4                                   [illegible]  [illegible]  [illegible]
5                                   −.20          .33          .69          .13
Multiple correlations (R_A·A·A)     [illegible]   .67          .68          .61    .69
Validities (V_A·A·B)                 .46         [illegible]  [illegible]  [illegible]  .39

Note. Keys, regression functions, and multiple Rs all
derived from Sample A data (N = 228). Correlations and
validities based on Sample B data (N = 228).

were obtained by simply adding the un- 
weighted scores from the empirical keys for 
corresponding variables across the three in- 
ventories. In this table cross-validation esti- 
mates of the validities and KR 20 reliabilities 
for each summation scale, and information on 
the amount and direction of multiple keying 
for each pair of scales are also presented. 
The first thing to note about the data in 
Table 2 is that the average correlation among 
the criterion measures is positive, .28, whereas 
that among the inventory scales is negative, 
−.19.⁷ Since the criterion measures are in no
sense ipsatized, either by the data collection 
methods used or by subsequent standardizing 
procedures, whereas the semicounterbalanced, 
binary forced-choice format of the inven- 
tories does, to varying degrees for different 
pairs of scales, introduce a form of ipsatiza- 
tion, this aspect of the differences in the pat- 
terns of correlations is not surprising. More 
pertinent and equally apparent, however, are 
the effects of multiple keying. For Scale Pairs 
(1, 3), (1, 2) and (3, 4), where the effect of 
multiple keying augments the bias toward 
negativity owing to ipsatization, the effect is 
most obvious. And for Scale Pairs (2, 3) and 
(1, 4) the multiple-keying effect is apparently 
sufficiently great to completely override that 





Sample A keys will be presented since they are 
sufficient to make the points at issue. 

7 All sums, differences, and averages of correlations 
reported are based on z-transformations.

due to ipsatization. The apparent exceptions,
Scale Pairs (3, 5), (2, 4) and perhaps (2, 5) 
and (4, 5) as well, where the multiple-keying 
index would suggest a shift toward higher- 
positive correlations than were obtained for 
the criteria, must be accounted for in other 
ways.⁸ But whatever the full set of causal
factors may be, it is abundantly clear that 
the two patterns of correlations are different 
and that multiple keying of items on the 
various pairs of scales and the ipsatizing for- 
mat and scoring procedure are partly to blame. 

As a first attempt to achieve a closer fit to 
the pattern of criterion correlations, a re- 
dundant double cross-validation analysis was 
performed. In Table 3 are reported the cor- 
relations based on the Sample B data among 
the multiple-regression functions derived from 
the Sample A data using as predictor variables 
the five summation scales which had also 
been derived using the Sample A data. 

If one compares these interfunction corre- 
lations with those presented in Table 2 among 
the criterion variables and among the simple 
summation scales, he will be struck by the 
appreciably greater similarity of these inter- 
function correlations to those among the 
criteria than exists for the summation keys 
relative to the same criteria. Indeed only for 
the Pair (1, 3) is the magnitude of the algebraic
(z-transformed) difference between
the interfunction and intercriterion r greater
than .16, whereas for all but one pair of sum- 
mation scales, that is, (1, 5), the disparities 
are greater than this, and in several instances 
are really very large. But the story told by 
the last two lines of Table 3 is, as expected, 
not so happy. The validities for these regres- 
sion functions are certainly no greater, and in 
fact appear to be a bit smaller on the average, 
than are those for the simple summation 


8 For example, one might examine the data for 
compensating patterns of interitem covariances. Or he 
might look for the possible occurrence of unusually 
large numbers of items, the keyed stems of which 
reflect the particular pairs of variables involved 
thereby increasing the negative bias of ipsatization 
on these particular pairs of scales. No analysis based 
on either of these conjectures has as yet been at- 
tempted however, and they are offered only as 
possible approaches to a full and complete explanation
of the differences between these sets of correlations;
the reader may well think of others.

keys. In addition, the amounts by which the
multiple Rs differ from their corresponding 
validities are well in excess of what one would 
expect for samples of this size if only ordi- 
nary regression shrinkage were operating. 
The most reasonable explanation is that 
proposed earlier; that is, that by use of the 
same criterion data (from Sample A) both to 
construct the empirical summation keys and 
to derive the regression functions, one has 
overly weighted in each regression analysis 
that particular inventory key originally built 
to measure the corresponding criterion vari- 
able (by virtue of the inflated zero-order 
correlation of this scale against this criterion 
measure in this sample). This argument also 
implies that the multiple Rs obtained from 
this RDCV analysis are spuriously high since
they too are derived from Sample A data and
that the severe shrinkage observed is in part 
due to this and in part due to underestimating 
the potential validities by virtue of computing 
them for nonoptimal functions. These obvi- 
ously inflated multiple Rs and somewhat at- 
tenuated validities thus give empirical sup- 
port to what were seen earlier to be the 
principal faults of the RDCV design. The 
highly congruent pattern of interfunction relationships
obtained vis-à-vis the criteria is,
however, an unexpected but apparent virtue
of this design.
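
The z-transformed comparisons on which this assessment rests can be reproduced in a few lines. The sketch assumes that the z-transformation referred to is Fisher's r-to-z transformation (the inverse hyperbolic tangent); the function names are illustrative, and the values used are the legible Pair (1, 3) and Pair (1, 2) entries of Tables 2 and 3.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation of a correlation."""
    return math.atanh(r)


def mean_correlation(rs):
    """Average correlations in z units, then transform back to r."""
    return math.tanh(sum(fisher_z(r) for r in rs) / len(rs))


def z_difference(r1, r2):
    """Algebraic difference between two correlations in z units."""
    return fisher_z(r1) - fisher_z(r2)


if __name__ == "__main__":
    # Pair (1, 3): interfunction r = -.62, intercriterion r = -.38.
    print(round(z_difference(-0.62, -0.38), 3))
    # Pair (1, 2): interfunction r = -.16, intercriterion r = -.03.
    print(round(z_difference(-0.16, -0.03), 3))
```

For Pair (1, 3) the magnitude of the difference exceeds .16, whereas for Pair (1, 2) it does not, in agreement with the comparison described for Table 3.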

The more undesirable consequences of the 
RDCV analysis should be avoidable, how- 
ever, if one were to use instead a double-split 
cross-validation design. To this end, the data 
were once again reanalyzed. To accomplish 


TABLE 4

CORRELATIONS AMONG THE CRITERION RATING MEASURES AND AMONG SCORES PREDICTED BY THE
DOUBLE-SPLIT CROSS-VALIDATION (DSCV) REGRESSION FUNCTIONS OF THE EMPIRICAL
SUMMATION KEYS TOGETHER WITH MULTIPLE Rs AND VALIDITIES

Subsample b data ᵃ

              Criterion rating measures                    Subsample β regression functions
Variable      1      2            3            4            1            2            3            4            5
1
2             .06                                          −.81
3            −.32   [illegible]                            −.71          .60
4             .33    .43         [illegible]               −.26          .18         −.50
5             .02    .33          .62         [illegible]  −.48         [illegible]   .18         −.52
Multiple correlations (R_A·β·β)                             .49          .46         [illegible]   .47          .50
Validities (V_A·β·b)                                        .44         [illegible]   .47         [illegible]   .40

Subsample β data ᵇ

              Criterion rating measures                    Subsample b regression functions
Variable      1      2            3            4            1            2            3            4            5
1
2            −.13                                          −.64
3            −.45    .46                                   −.90          .88
4             .08    .40         −.06                      [illegible]   .05         −.33
5            −.15   [illegible]   .64          .10         −.55          .25          .53         −.62
Multiple correlations (R_A·b·b)                             .48          .38         [illegible]  [illegible]   .45
Validities (V_A·b·β)                                        .46          .45          .48          .40          .43

Note. Keys constructed on Sample A data (N = 228).

ᵃ Regression functions and multiple correlations computed on Subsample b data (N = 114). Correlations and validities based
on Subsample β data (N = 114).

ᵇ Regression functions and multiple correlations computed on Subsample β data (N = 114). Correlations and validities based
on Subsample b data (N =

this, each Sample, A and B, was further split
into a pair of subsamples of 114 cases each;
these were denoted by the letters “a,” “α,”
“b,” and “β,” respectively. Once again an
effort was made to keep the factor structures 
(based on the peer-nomination rating scales 
from which the criterion measures are derived) 
and the demographic characteristics (class 
standing and type of residence) as similar as 
possible. However, it was not possible to 
match these smaller subsamples as closely on 
these characteristics as it had been for the 
original partitioning of the cases into the sam- 
ples of 228 cases each. A partial and some- 
what indirect indication of the degree to 
which a match was achieved may be had by 
comparing the patterns of correlations among 
the criterion measures based on the b and β
subsamples which are presented on the left
side of Table 4. The most severely mismatched
relation is apparently that for the pair of
criterion measures (1, 4); that is, .33 in subsample
b versus .08 in subsample β. The overall
picture obtained from these data, however,
is that a fairly good match was achieved
—far better, in fact, than this writer expected 
after seeing the factor structures based on 
the component scales that are the constituents 
of these criterion measures. 

A further examination of Table 4 reveals 
one expected and more-or-less encouraging set 
of findings, and another that is startling and, 
to say the least, highly disconcerting! The 
first and more reassuring result is that the 
validities based on this fully independent 
series of analyses are on the average a bit 
higher than those obtained from the RDCV 
analysis (Table 3) and perhaps even a shade 
better than those for the simple summation 
scales (Table 2). In addition, note that the 
multiple Rs now are much less highly inflated 
than were those obtained in the RDCV anal- 
yses and, as a partial consequence, show very 
little shrinkage when these regression functions
are cross-validated.⁹

The second, and highly distressing, finding 


9 The apparently anomalous results for Functions
2 and 4 in the bottom half of Table 4 where the 
validities exceed somewhat the multiple Rs must 
presumably be laid either to random variation or, 
more likely in the case of Function 2 at least, to 
some failure in matching the subsamples.

is that the patterns of interfunction relationships
in the two subsamples, while highly
similar to each other—except perhaps for the 
pair (1, 4)—have once again become mark- 
edly dissimilar to the patterns that exist 
among the corresponding criterion measures! 
This writer confesses his total inability to 
account for this finding. Actually, it is the 
disparity between the goodness of the fit ob- 
tained with the RDCV analysis presented in 
Table 3 and the extremely bad fits obtained 
with the DSCV analyses presented in Table 4 
that is the crux of the enigma. That is, the 
fundamental question posed by these data 
seems to be the following: If the regression 
functions of the RDCV analysis, despite their 
being based in part on biased estimates of 
certain of the correlations and variances, were 
so successful in suppressing the joint-keying 
artifacts that characterize the correlations 
among the simple empirical keys, how is it
that when regression functions were developed
from a uniformly unbiased set of correlations
and variances, as they were in the DSCV
analyses, scores based on the resultant
functions are intercorrelated in such an incongruous
manner vis-à-vis the criteria?

Any attempt to cope with this quandary 
must take into account several additional 
findings of these analyses. The first is that, 
although the average discrepancy of the DSCV 
interfunction correlations from the correspond- 
ing intercriterion values is about equal to the 
average discrepancy for the simple summation 
scale correlations, and although there is some 
gross similarity between the obtained patterns 
in the two cases, these two patterns are by no 
means sufficiently congruent to consider the 
DSCV interfunction correlations simply as 
“reversions-to-type.” 

As a second consideration, a comparison 
of the regression weights based on the RDCV 
analysis with those from the DSCV analyses 
would have led one to expect a far closer 
correspondence to the summation scale corre- 
lations for the former set than the latter. The 
basis for this expectation is that in the RDCV 
functions, the beta weight for the particular 
predictor scale corresponding to the given 
criterion is relatively much larger in each case 
than are the weights for the other four scales. 
But in the DSCV analyses, the magnitudes
of the weights in any given function are much
more uniform in size, and in a few cases, 
predictors other than the primary one for that 
criterion have beta weights that even exceed 
in magnitude the weight for the presumed 
primary. 

And finally, it seems inconceivable to this 
writer at least, that these incongruous find- 
ings could be simply the result of random 
variation in the data. In the first place, the 
patterns of correlations among the criteria 
and among the predictors, and the estimates 
of other parameters such as means and vari- 
ances on the input side are highly similar 
across the several independent samples of Ss 
employed. In addition, both the relatively 
large samples employed for the various anal- 
yses and the high degree of congruity ob- 
tained on the output side, even for those 
results based on the smallest subsamples, all 
would seem to augur rather strongly against
any simple dismissal of the quandary on 
grounds of imprecise estimation or random 
error. 

The only potential basis for an explanation 
that has occurred to this writer is that the 
congruity of the RDCV and intercriterion 
patterns here obtained might have been 
simply an accident—the result of compensat- 
ing nonrandom artifacts or “methods factors” 
that just happen to be present in the partic- 
ular measures employed. If this is so, and if 
in general, contrary to the initial assumption, 
no basis actually exists for expecting multiple- 
regression functions developed separately 
against each of several criteria to be them- 
selves interrelated in the manner in which the 
criteria are, then the results would not be so 
surprising even though they would remain no 
less disappointing. 

What would seem to be needed if this be 
the case, however, is not the rejection of the 
logic of the DSCV design, but rather the 
application of a more appropriate, though 
surely more complicated, statistical analysis 
method. That is, analysis procedures would 
seem to be required that would estimate vec- 
tors of weights for the predictors that would 
not only minimize overall errors of estimate 
simultaneously for the several criteria, but 
would do so subject also to the restraint of 
maximizing the fit of the product moments
among the function score variables to those
among the criteria. This writer, unfortunately, 
knows of no such generalized analysis method 
currently available nor even whether it is 
theoretically possible to construct one for the 
conditions specified. That is, it may just be 
that the stated conditions (like those for 
complete simple structure in the oblique case 
of factor analysis) are not always simul- 
taneously satisfiable. 

In presenting the above arguments and 
data, the present writer has not been intent 
to pose any sort of irresolvable dilemma or 
paradox. To the contrary, although I am 
frankly at a loss to explain to my own satis- 
faction the quandary these findings have 
inflicted on me, my faith persists in the basic 
logic and virtues of the DSCV design and in 
the possibility of some consistent explanation 
for the empirical results obtained. Indeed, it 
is in the hope of enticing others to attempt a 
resolution of these seemingly recalcitrant 
findings that they have been presented in 
such detail. And should such an explanation 
imply the need for an analysis method of the 
general sort briefly outlined above, I would 
hope that those with greater acumen and 
sophistication in such matters than I possess, 
would consider the task a worthy challenge.


LETTER DIFFERENTIATION AND RATE OF
COMPREHENSION IN READING¹


E. C. POULTON 
Applied Psychology Research Unit, Cambridge, England 


375 adults were given 90 sec. to read passages of about 450 words printed in 
1 of 7 typefaces equated for size. They had then to answer 10 open-ended ques- 
tions on the content. Of the typefaces without serifs Gill Medium, the letters of 
which were judged by typographical experts to be fairly strongly differentiated, 
was comprehended reliably faster than Grotesque 215 and 2 versions of Uni- 
vers, in which the letters were judged to be less well differentiated (p < .05). 
There were no reliable differences between the serif typefaces, Bembo an old 
style, Baskerville a transitional, and Modern Extended Number 1, nor between 


the serifed and sans-serifed typefaces. 


The first part of the experiment compared 
the rate of comprehension provided by some 
typefaces without serifs, or the expanded ends 
of the vertical strokes of the letters. The faces 
were ranked by typographical experts accord- 
ing to the degree of character differentiation. 
They ranged from Gill Medium in which the 
letters were fairly well differentiated, through 
Grotesque 215 in which there was less differ- 
entiation, to two versions of Univers in which 
all the letters bore a strong family resem- 
blance. No systematic objective comparison 
of the readability of text set in sans-serif 
typefaces appears to have been made previ- 
ously. 

The second part is a sequel to a recent ex- 
periment in which it was found that a modern 
typeface, Modern Extended Number 1, was 
comprehended more rapidly than an old style, 
Imprint (Poulton, 1959). This result is con- 
trary to that of Paterson and Tinker (1932), 
who found no reliable differences between any 
style of typeface in common use. The greater 


1 The author is grateful to the Monotype Corpora-
Council of Industrial Design for suggesting the experiment;
to R. Conrad and A. J. Hull for the re-
through, and for the typographical comments; to P.
Freeman for statistical help; and to the British Medi-

sensitivity of the recent experiment may have
been due partly to the use of a more homo- 
geneous population of subjects: Cambridge 
scientists, as contrasted with the Minnesota 
students used by Paterson and Tinker. In 
which case some of the increased sensitivity 
may have been obtained at the cost of a loss 
in generality. The present experiment used a 
more representative sample of subjects (Ss), 
and compared Modern Extended Number 1 
with Bembo, an old style, and with Basker- 
ville, a transitional style. The design of the 
two parts also allowed a comparison to be 
made between the sans-serif typefaces of the 
first part, and the serif faces of the second 
part. 

In order to find reliable differences between 
good designs of lettering, it is necessary to 
make the experiment as sensitive as possible 
(see Poulton, 1965). This has been done first 
by setting a time limit for the reading and 
measuring comprehension, so that differences 
in rate of reading and degree of comprehension 
while reading both affected the single measure 
of performance used. Second, the time allowed 
was fixed to produce an average score for 
comprehension of 50%, so that relatively 
small changes in difficulty would have the 
greatest chance of revealing themselves. Third, 
the size of type, the amount of leading be- 
tween the lines, and the length of line were 
chosen to be as nearly optimal as possible 
(Poulton, 1960; Tinker, 1963), so that the 
only nonoptimal parameter, if any, was the 
style of type. Finally, each S read two pas- 
sages in the typeface he was being tested on, 
so that by the time he came to the second 
passage he would be reasonably accustomed 
to that face. 


[Figure 1: the opening lines of the yellow fever passage set in each of the seven typefaces: Gill Medium, Grotesque 215, Univers (two sizes), Bembo, Baskerville, and Modern Extended Number 1.]

FIG. 1. Examples of the typefaces. 

Two other possible methods of increasing 
sensitivity were not employed. One is to give 
each S passages printed in two different styles, 
and to examine the difference in score be- 
tween the styles, like Paterson and Tinker 
(1932). The difficulty here is that perform- 
ance on a passage printed in the second style 
may be influenced by the style of print read 
first (Fox, 1963; Poulton, 1960). The other 
rejected method, already referred to, is to 
restrict the experiment to a relatively uni- 
form population of Ss. 


METHOD 


Materials. Two passages, each of about 450 words, 
were selected from Fry’s Reading Faster: A Drill 
Book (1963). The vocabulary of these passages is 
restricted to the 2,000 commonest words in the lan- 
guage (Thorndike & Lorge, 1944). The first passage 
described the conquest of yellow fever, the second 
passage described field research on virus infections. 

Both passages were printed in the seven typefaces 
shown in Figure 1 and listed in Table 1. In compar- 
ing different designs of letter it is necessary to hold 
size constant, or differences caused by size will be 
confounded with differences in design. In the present 
context size was taken to mean x height, or the 
height of the rounded part of the letters excluding 
the ascenders and descenders. Table 1 shows that this 
was done as far as the availability of sizes of type 
would allow. With Univers the x height was such a 
large proportion of the total point size that an x 
height of 1.6 millimeters, which was required to 
match the other x heights, meant a point size of only 
9.5, as compared with the point size of 10-12 of 
the other styles. It was therefore decided to include 
two samples of Univers, one matched for x height 
and a second matched for point size. 

In addition, it is desirable to control as far as 
possible for the amount of paper covered by each 
typeface. Since the length of line was fixed at 11 
centimeters (24 ems), this meant equating the vertical 
length of the passages, allowing for partly blank lines. 
With Bembo this was done by adding an extra half 
point of leading between lines, so that the sum of 
point size and leading was 13.5 points, as compared 
with the norm of 13 points. With Grotesque 215 and 
the 10.5-point (10-didot) Univers, equating on length 
of passage would have meant reducing the sum of 
point size and leading from 13 points to 12.5 and 
11.5, respectively, and for typographic reasons this 
was unrealistic. With the 9.5-point (9-didot) Univers 
4.5-point leading would have been necessary to reach 
the norm, and for the same reasons it was considered 
unsuitable to use more than 3 points of leading. 
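
The leading arithmetic described above can be checked with a short Python sketch. The point sizes and leading values are those given for the seven faces in Table 1; the 1.5-point leading for Bembo is an inference from the stated sum of 13.5 points, and the dictionary and function names are illustrative only.

```python
# Point size and leading (both in points) for each face.
# Bembo's leading of 1.5 is inferred from the stated 13.5-point sum.
FACES = {
    "Gill Medium": (11.0, 2.0),
    "Grotesque 215": (10.0, 3.0),
    "Univers 10.5": (10.5, 2.5),
    "Univers 9.5": (9.5, 3.0),
    "Bembo": (12.0, 1.5),
    "Baskerville": (11.0, 2.0),
    "Modern Extended No. 1": (11.0, 2.0),
}
NORM = 13.0  # norm for point size plus leading, as stated in the text


def body_size(point_size, leading):
    """Vertical distance from one line of type to the next."""
    return point_size + leading


def deviations_from_norm(faces, norm=NORM):
    """Return only the faces whose point size plus leading departs from the norm."""
    result = {}
    for name, (point_size, leading) in faces.items():
        diff = body_size(point_size, leading) - norm
        if diff != 0:
            result[name] = diff
    return result


if __name__ == "__main__":
    for name, diff in deviations_from_norm(FACES).items():
        print(f"{name}: {diff:+.1f} points relative to the norm")
```

On these values only Bembo (half a point over) and the smaller Univers (half a point under) depart from the 13-point norm, which matches the compromises the paragraph describes.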

The 10 multiple-choice questions set by Fry on 
each passage to test for comprehension were changed 
to 10 open-ended questions which could be answered 
in a few words. Rather general evaluative questions 
were replaced by additional more factual questions. 
The 10 questions were spread evenly over the whole 
text, so that a person who had read 80% of a passage 
would be able to attempt 8 of the 10 questions. 

Procedure. The experiment was conducted on 
groups of volunteers. Each individual read a differ- 
ent typeface from those of his close neighbours. 
Both his passages were printed in the same style, and 
the same passage was always read first. A pilot ex- 
periment on 30 volunteers at a Post Office training 
center had suggested that an average comprehension 
score of about 50% would be obtained if the time 
allowed for reading each passage were restricted to 
90 seconds. The experimenter timed the 90 seconds, 
calling out at half time and again when 20 seconds 
were left. No time limit was set for answering the 
questions, but no one was allowed to start reading 
the second passage until everyone had finished an- 
swering the questions on the first. The experiment 
took altogether about 20 minutes. 

At the bottom of the duplicated sheet carrying the 
questions the volunteer was asked to indicate whether 
in the time allowed he had read the passage through 
twice or more, more than once through, just once 
through, or only partly through. An additional an- 
swer sheet at the end of the experiment asked the 
volunteer to indicate his familiarity with the topics of 
the passages, and also his age, sex, occupation, and 
whether or not he wore glasses for reading. 

Calculations. The results of the volunteer who 
scored zero on both passages were discarded. Separate 
analyses of variance were carried out for the sans- 
serif and serif typefaces. Inspection of the results of 
the sans-serif typefaces suggested that it was Gill 
Medium which was responsible for the reliable vari- 
ance ratio shown by the typefaces. The variance at- 
tributable to typefaces was therefore split into a 
comparison of Gill Medium with the average of the 
other three sans-serif typefaces and a remainder, and 
the former was tested against the residual. The p 
value associated with this variance ratio was mul- 
tiplied by 4 because Gill Medium was one of four 
sans-serif typefaces. Two-tailed tests have always 
been used. 
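
The comparison described above can be sketched in Python as a single-degree-of-freedom contrast followed by the multiplication of p by 4. The means and group sizes are the second-passage values from Table 1 (Gill Medium read as 56.6). The residual mean square is not reported, so the variance ratio is left as a function of a value the reader must supply. All names are illustrative, and this is a sketch of the standard contrast calculation rather than the author's own computation.

```python
# Second-passage means (percentage) and numbers of readers, from Table 1.
MEANS = {"Gill Medium": 56.6, "Grotesque 215": 47.1,
         "Univers 10.5": 47.5, "Univers 9.5": 45.2}
SIZES = {"Gill Medium": 53, "Grotesque 215": 54,
         "Univers 10.5": 53, "Univers 9.5": 53}

# Contrast weights: +1 for Gill Medium, -1/3 for each of the other three faces.
WEIGHTS = {name: (1.0 if name == "Gill Medium" else -1.0 / 3.0) for name in MEANS}


def contrast_estimate(means, weights):
    """Gill Medium mean minus the average of the other three means."""
    return sum(weights[k] * means[k] for k in means)


def contrast_sum_of_squares(means, sizes, weights):
    """Sum of squares for a one-df contrast with unequal group sizes."""
    estimate = contrast_estimate(means, weights)
    return estimate ** 2 / sum(weights[k] ** 2 / sizes[k] for k in means)


def variance_ratio(ss_contrast, ms_residual):
    """F ratio of the contrast against the residual mean square (not reported)."""
    return ss_contrast / ms_residual


def adjusted_p(p_value, n_candidates=4):
    """Multiply p by the number of faces that could have been singled out."""
    return min(1.0, p_value * n_candidates)


ESTIMATE = contrast_estimate(MEANS, WEIGHTS)
SS_CONTRAST = contrast_sum_of_squares(MEANS, SIZES, WEIGHTS)
```

The multiplication by 4 guards against having picked the most extreme of four faces after inspecting the data; a nominal p of .01, for example, would be reported as .04.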

Experimental subjects. About two-thirds came from 
the Post Office training center. The remainder, mainly 
housewives, were from Cambridge. The same propor- 
tions of these two main groups read each of the 
seven typefaces. 


TABLE 1 

TYPEFACE AND COMPREHENSION 

                                   Monotype series                        x       Leading   Average
                                                                  Point   height  between   length of  Number   Average
Typographical variable             Number  Name                   size    (mm)    lines     passage    of       comprehension
                                                                                  (points)  (cm)       readers  (percentage)

Without serifs
  Letters fairly well              262     Gill Medium            11      1.6     2         17.2       53       56.6a
    differentiated
  Intermediate                     215     Grotesque              10      1.7     3         18.6       54       47.1
  Letters less well                689     Univers                10.5    1.9     2.5       19.7       53       47.5
    differentiated                 689     Univers                9.5     1.6     3         15.3       53       45.2
With serifs
  Old style                        270     Bembo                  12      1.6     1.5       17.3       54       49.0
  Transitional                     169     Baskerville            11      1.5     2         17.4       53       49.3
  Modern style                     7       Modern Extended        11      1.5     2         17.4       54       53.4
                                           Number 1

a Gill Medium reliably different from other typefaces without serifs (p < .05). 


RESULTS 


The mean scores for comprehension on the 
second passage are shown in the last column 
of Table 1. There were no reliable differences 
on the first passage (p > .05), which familiar- 
ized the Ss with the procedure and the par- 
ticular typeface they had to read. On the 
second passage reliable differences were found 
within the sans-serif typefaces (p < .05 on 
analysis of variance), Gill Medium being re- 
liably the best. This result cannot have been 
due to a simple difference in point size or x 
height, since Gill Medium and Grotesque 215 
were equated as far as possible. And the 
larger size of Univers was equated to Gill on 
point size, while the smaller was equated on 
x height. 

The difference between Modern Extended 
Number 1 and the old style Bembo was in 
favour of Modern, but was not reliable. None 
of the serif faces were reliably different from 
any of the sans-serif faces. 

The reported rate of reading was unrelated 
both to typeface and to age (p > .05). Over 
[illegible]% of the volunteers read at a rate below 300 
words per minute. On comprehension the 
under 20s averaged 57%, those between 20 
and 39 averaged 52%, and those 40 and over 
averaged only 41% (p < .001). The propor- 
tions of these three groups which wore glasses 
for reading were, respectively, 15%, 25%, and 
[illegible]%. The 142 Ss from Cambridge were given 
Shipley’s (1940) tests of abstraction and 
vocabulary directly after the experiment. The 
product-moment correlation of the two tests 
with comprehension was .45 for abstraction 
and .30 for vocabulary (p < .001 in each 
case). 


Discussion 


Commenting on the sans-serif typefaces 
as typographical experts, Cheetham and 
Grimbly (1964) wrote: 


The most striking difference between Gill Medium, 
Grotesque 215 and Univers lies in the shaping of 
individual characters. If the characters of a typeface 
have too much of a family resemblance, its words will 
too tend to look inconveniently alike. If we examine 
Gill Medium and compare it with Grotesque 215 it 
can be seen that Gill, with its geometrical approach 
allied to humanistic letter forms, has a stronger 
character differentiation than 215, which follows the 
traditional advertising sans serif where there is a 
tendency for the characters to become similar (look 
at the g, R, p, r and e of Gill). And even in 215, 
despite the face’s marked disadvantages (its bad set, 
various features of bad cutting, etc), there is still a 
greater differentiation of characters than in Univers, 
where confusion could be caused by the width of 
the counters (the spacing between vertical strokes 
within letters) interacting with the letter-spacing 
(between letters). 


The reliably greater rate of comprehension 
given by Gill Medium than by the other sans- 
serif typefaces agrees with these stated differ- 
ences in letter discriminability. 

The rate of comprehension with the three 
serif typefaces came out in the order predicted 
from the previous experiment (Poulton, 1959), 
the modern style fastest, the old style least 
fast, but the differences were not reliable 
statistically. This leaves open the question of 
the generality of the results of the previous 
experiment, which used scientists as Ss. 

It was the sans-serif Gill Medium which 
produced the highest average rate of compre- 
hension, although it was not reliably better 
than any of the serif typefaces. This sug- 
gests that it is not necessarily serifs, as has 
been claimed (Burt, 1959), which make type- 
faces readable. Paterson and Tinker (1932) 
also found a sans-serif typeface, Kabel Light, 
not reliably different from their serif faces: 
but in this case it came close to the bottom 
in rank order of reading efficiency, like 
Grotesque 215 and Univers in the present 
experiment. 


PERCEIVED AND ACTUAL CHANGE IN JOB SATISFACTION 


EINAR HARDIN 


School of Labor and Industrial Relations, Michigan State University 


Analysis of identified questionnaire data collected from 196 office employees 
at the start and end of a 6-month period showed that change in overall job 
satisfaction as perceived at the end was a very poor, though statistically sig- 
nificant, proxy measure of change as computed from initial and terminal reports 
on levels of satisfaction. Perceived change in job satisfaction had zero regression 
on initial satisfaction but regressed very significantly on terminal satisfaction 
and on change in 14 job aspects as perceived at the end of the period. The 
findings cast serious doubts on the usefulness of the quasi-longitudinal design 
in studies of the impact of technological and organizational changes upon job 
satisfaction. 


The psychological study of technological, 
organizational, and other change phenomena 
would often benefit greatly from the avail- 
ability of successive measures of attitudes 
in the same group of individuals. When a 
planned experiment is not feasible, such a 
longitudinal design may be barred by several 
circumstances. The would-be experimenter 
may not know of the change event until 
it has progressed far or already been com- 
pleted. He may not be given enough lead 
time to gain access to the research site and 
to develop his design and instruments. The 
likelihood of a researchable event occurring 
at a given site within an acceptable period 
of time may be too small to warrant the 
required anticipatory expenditure of time 
and effort. 

Faced with such circumstances, the re- 
searcher may have to choose between a cross- 
sectional design and a quasi-longitudinal de- 
sign. The former design consists in comparing 
attitudes at a given point of time in situations 
where the change phenomenon took place with 
attitudes existing in otherwise similar situa- 
tions in which the change did not occur. The 
chief weakness of this approach as compared 
with the genuinely longitudinal approach is 
that persistent situational characteristics af- 
fecting attitudes often fail to cancel out. At 
best they merely increase the unexplained 
variance and hence the standard errors of esti- 
mates. At worst, by enhancing or reducing 
the probability of the change event occurring 
in certain types of situations, they seriously 
bias the comparisons. 

A quasi-longitudinal design, in which retro- 
spective questions are asked about changes in 
the respondent’s attitudes, might seem like an 
ideal solution to the researcher’s problem. 
Many research workers have used this design 
and have by and large interpreted the findings 
as if they had resulted from a genuinely 
longitudinal study with successive measure- 
ments of attitudes (see, for instance, Faunce, 
1958; Hardin, 1960a, 1960b; Mann & Hoff- 
man, 1960), and similar use has been made 
in opinion polls, such as one conducted by 
Boyle (1961). While answers to retrospective 
questions may have phenomenological interest 
regardless of their relationship to actual 
change in attitudes, their use in the quasi- 
longitudinal design would be warranted only 
to the extent that they can be substituted 
without serious bias and loss of accuracy for 
genuinely longitudinal measures. Little work 
appears to have been done on the relation- 
ship between perceived change and measured 
before-after change. A decade ago Baum- 
gartel (1954) stated two findings: (a) change 
in supervisory behavior was perceived more 
often in an experimental group of employees 
whose supervisors had been given an intensive 
training program, and (b) perceptions of 
supervisory-behavior changes were associated 
with attitudes toward the corresponding be- 
havior aspects. As commented by Hardin and 
Hershey (1960), 


The former finding was interpreted as a demonstra- 
tion that perceived change measures had some degree 
of validity as measures of actual change. However, 
the report left some doubt as to whether the percep- 
tions of change had their correlates in actual changes 
in supervisory behavior or were perhaps induced by 
the very knowledge that the supervisors in the 
experimental group had gone through a training 
program. The latter finding was taken to mean that 
attitudes influenced the perception of change, but 
the data presented also seemed consistent with the 
hypothesis that perception of change affected the 
attitudes. 


Hardin and Hershey analyzed the relation- 
ship between the 6-month change in pay 
(“much more now,” “more now,” “no 
change,” “less now,” and “much less now”) 
as reported by the respondents and the change 
in dollars of salary per week over the same 
period of time as shown in the employer’s 
personnel record. Actual and reported changes 
were associated significantly (p < .01), even 
though a very strict criterion of correctness 
was used: actual change was defined to mean 
any change at all in weekly salary, reported 
change was defined to mean any other re- 
sponse than “no change,” and a dichotomous 
classification was used. Reported change, 
therefore, had some degree of validity as a 
measure of actual change. But this validity 
was low because one-third to one-fourth of 
the respondents gave erroneous reports. The 
average frequency of change was furthermore 
biased downward, for failure to report change 
that had actually occurred was significantly 
more likely (35%) than reporting of change 
when none had actually taken place (18%). 
Errors of reporting were uncorrelated between 
two consecutive 6-month periods, which sug- 
gested the absence of persistent personal dif- 
ferences in accuracy. They were also unre- 
lated to errors in reports on current pay. 
Failure to report was more common when the 
actual pay raise was small than when it was 
large. Persons reporting increases in many 
other job aspects (see below) were more 
likely than others to report changes in pay, 
thus showing a smaller frequency of failure 
to report actual raises and a higher frequency 
of reporting raises that had not occurred. 
Satisfaction with pay, supervisor, and the job 
as a whole were unrelated to accuracy of 
reporting. 

This paper analyzes the perception (report- 
ing) of change in overall job satisfaction 
over a 6-month period in comparison with 
overall satisfaction as existing and reported 
at the beginning and end of the period. The 
data source was the same as for the Hardin 
and Hershey study. 


RESEARCH SITE AND DATA 


Data used in this study were collected in 
two identified questionnaire surveys con- 
ducted 6 months apart in a medium-size 
casualty insurance company. The question- 
naires were designed to study the effects of 
an IBM 650 electronic data processing ma- 
chine, which the company installed and put to 
use in the interviewing period. The general 
job satisfaction question, “Taking everything 
into account, how satisfied are you with your 
job?” was asked identically in the two sur- 
veys, with the response categories “completely 
satisfied,” “very satisfied,” “quite satisfied,” 
“somewhat satisfied,” and “not satisfied.” 
Computations in this paper were based on 
assignment of values of 5, 4, 3, 2, and 1 to 
these response categories as ordered. 

The second survey also contained the 
question, “Considering everything, would you 
say you are now more satisfied or less satis- 
fied with your job than you were six months 
ago?” Values of 5, 4, 3, 2, and 1 were as- 
signed to the response categories “much more 
satisfied now,” “more satisfied now,” “no 
more, no less satisfied now,” “less satisfied 
now,” and “much less satisfied now.” 

Finally, the second survey contained in 
checklist form a series of 14 questions con- 
cerning perceived changes in specific aspects 
of the job. The job aspects were described 
as follows: 


The amount of variety in my work 
The amount of work required on my job 
The degree of accuracy demanded by my job 
My control over the pace of my work 
The importance of my job for the company 
The amount of supervision I get on my job 
The amount of skill needed on my job 
The amount of responsibility demanded by my job 
The amount of planning I have to do on my job 
The amount of judgment I have to use on my job 
The degree to which my work is interesting 
The amount of security I feel on my job 
My chances for promotion to a better job 
The amount of pay I get on my job 


For each of these items the respondent was 
asked the question, “How has this aspect of 
your job changed in the past six months?” 
Values of 5, 4, 3, 2, and 1 were assigned to 
the response categories “much more now,” 
“more now,” “no change,” “less now,” and 
“much less now.” A score was found by 
summing the values of the chosen responses 
for all 14 items. 

The following symbols were used in de- 
noting the variables in this study: 


X1 = perceived change in job satisfaction 

X2 = job satisfaction reported in the second survey 

X3 = job satisfaction reported in the first survey 

X4 = computed change in job satisfaction from 
the first to the second survey, where X4 = 5.0 
+ X2 - X3 

X5 = score for perceived change in amounts of 14 
job aspects 
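
The scoring rules above can be put together in a short Python sketch. The response wordings and the 5-to-1 values are those given in the text; the function name, the dictionary names, and the example respondent are illustrative assumptions.

```python
# Response categories and assigned values, as described in the text.
SATISFACTION_SCALE = {
    "completely satisfied": 5, "very satisfied": 4, "quite satisfied": 3,
    "somewhat satisfied": 2, "not satisfied": 1,
}
PERCEIVED_SCALE = {
    "much more satisfied now": 5, "more satisfied now": 4,
    "no more, no less satisfied now": 3, "less satisfied now": 2,
    "much less satisfied now": 1,
}
ASPECT_SCALE = {
    "much more now": 5, "more now": 4, "no change": 3,
    "less now": 2, "much less now": 1,
}


def score_respondent(first_survey, second_survey, perceived, aspect_answers):
    """Return the five study variables X1..X5 for one respondent."""
    if len(aspect_answers) != 14:
        raise ValueError("exactly 14 job-aspect answers are required")
    x1 = PERCEIVED_SCALE[perceived]          # perceived change in satisfaction
    x2 = SATISFACTION_SCALE[second_survey]   # satisfaction, second survey
    x3 = SATISFACTION_SCALE[first_survey]    # satisfaction, first survey
    x4 = 5.0 + x2 - x3                       # computed change, ranging 1 to 9
    x5 = sum(ASPECT_SCALE[a] for a in aspect_answers)  # ranges 14 to 70
    return {"X1": x1, "X2": x2, "X3": x3, "X4": x4, "X5": x5}


# Hypothetical respondent: rose from "somewhat" to "very" satisfied.
EXAMPLE = score_respondent(
    "somewhat satisfied", "very satisfied",
    "more satisfied now", ["no change"] * 14,
)
```

The constant 5.0 in X4 simply shifts the difference X2 - X3 so that a score of 5 means no computed change and the scale runs from 1 to 9.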


There were 246 persons who filled out both 
questionnaires. For this study, it was neces- 
sary to eliminate 24 respondents who failed 
to answer 1 or more of the 17 questions used 
in the analysis. Further, the fact that Likert 
items are actually bounded variables made 
it desirable to eliminate 25 persons who 
expressed complete satisfaction in both sur- 
veys and 1 person who in both surveys was 
“not satisfied.” Of the 25 persons, 8 reported 
they were much more satisfied than 6 months 
earlier, 6 said they were more satisfied, and 
11 reported no change. The 1 person who was 
“not satisfied” at both times reported having 
become much less satisfied. Had these 26 
observations not been omitted, the measured 
relationship of perceived change to computed 
change in satisfaction (reported below) 
would have been weaker and the relationship 
to current satisfaction would have been 
strengthened. 
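
The sample accounting in this paragraph can be verified with a few lines of Python. The counts are taken from the text; the variable names are illustrative, and the number reporting no change is derived from the other two counts rather than read directly.

```python
completed_both = 246    # persons who filled out both questionnaires
missing_answers = 24    # failed to answer 1 or more of the 17 questions
ceiling_cases = 25      # "completely satisfied" in both surveys
floor_cases = 1         # "not satisfied" in both surveys

analysed = completed_both - missing_answers - ceiling_cases - floor_cases

# Perceived change among the 25 ceiling cases.
much_more = 8
more = 6
no_change = ceiling_cases - much_more - more

# Every ceiling or floor case has X2 equal to X3, so its computed change
# score is 5.0 (no change) whatever change the respondent perceived.
computed_change_for_bounded_cases = 5.0 + 5 - 5
```

Fourteen of the 25 ceiling cases reported an increase that a bounded five-point item could not register, which is why retaining them would have weakened the measured relationship between perceived and computed change.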


FINDINGS 


Persons who differed in perceived change 
in job satisfaction also differed in computed 
change, as shown in Table 1. The computed 
change score was higher for persons who 
reported increases in satisfaction than those 
reporting no change or actual decreases. Not 
surprisingly, then, computed and perceived 
changes were correlated positively (r = 0.28, 
p < .01). Classification of respondents on the 
basis of perceived change was in some measure 
also a classification according to computed 
change. 


TABLE 1 

PERCEIVED AND COMPUTED CHANGE 
IN JOB SATISFACTION 

                                          Computed
                                          change score
Perceived change                    N     M
Much more satisfied now             16    [illegible]
More satisfied now                  53    [illegible]
No more, no less satisfied now      84    [illegible]
Less satisfied now                  35    4.5
Much less satisfied now             8     4.8
All categories                      196   [illegible]

Note. F = 5.90, df = 4/191, p < .01. 

The validity of perceived change as a 
measure of computed change was nevertheless 
limited. Regression of computed change 
linearly upon perceived change showed that 
respondents who differed by as much as three 
points in perceived change (with a total 
range of five points) could be expected to 
differ only by one point in computed change 
(with a total range of nine points). The 
standard error of estimate of computed 
change was also about one point. The two 
methods of classification were consequently 
poor substitutes for each other. 

Correspondence between the two classifica- 
tion schemes was even poorer, when the 
respondents were grouped into the discrete 
categories of increased, unchanged, and de- 
creased satisfaction according to the two 
criteria, as can be seen from Table 2. Only 73 
persons, or 37% of the 196 respondents, were 
classified alike by the two criteria, and 26 
persons, or 13%, were classified by one cri- 
terion as showing increased and by the other 
as showing decreased satisfaction. Statistically 
the two three-level criteria were approxi- 
mately independent of each other. 
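
The agreement figures and the test of independence can be recomputed from the cell counts of Table 2 with the Python sketch below. The first cell of the last row (9) is inferred from the column total (73 - 28 - 36), and the function and constant names are illustrative. The statistic computed this way comes out at roughly 8.5, a little below the printed 8.69 but on the same side of the 5% critical value for 4 degrees of freedom.

```python
# Rows: perceived change (increase, no change, decrease).
# Columns: computed change (increase, no change, decrease).
OBSERVED = [
    [28, 23, 17],
    [36, 25, 24],
    [9, 14, 20],   # the 9 is inferred from the column total 73 - 28 - 36
]


def chi_square(table):
    """Pearson chi-square statistic for a two-way frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


N = sum(sum(row) for row in OBSERVED)
AGREE = sum(OBSERVED[i][i] for i in range(3))   # classified alike
OPPOSITE = OBSERVED[0][2] + OBSERVED[2][0]      # increase versus decrease
CHI2 = chi_square(OBSERVED)
CRITICAL_05_DF4 = 9.488                         # 5% point of chi-square, df = 4
```

The diagonal sum reproduces the 73 respondents (37%) classified alike, and the two corner cells reproduce the 26 (13%) classified in opposite directions.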

TABLE 2 

DISTRIBUTION OF RESPONDENTS ACCORDING TO PERCEIVED AND COMPUTED CHANGE IN JOB 
SATISFACTION: A COMPARISON OF THREE-LEVEL CLASSIFICATIONS 

                                       Computed change in satisfaction
                                       Increase   No change   Decrease   Total
Perceived change                       (6-9)      (5)         (1-4)      frequency
Much more or more satisfied            28         23          17         68
  now (5, 4)
No more, no less satisfied             36         25          24         85
  now (3)
Less or much less satisfied            9          14          20         43
  now (2, 1)
Total frequency                        73         62          61         196

Note. χ² = 8.69, df = 4, p > .05. 


The demonstrated poor correspondence be- 
tween the two criteria was not likely to have 
resulted from differences in procedure, for 
the two surveys contained identical questions 
about job satisfaction, were conducted in 
identical fashion, and naturally covered the 
same group of respondents. There was fur- 
thermore no apparent reason to believe one 
survey to be more reliable than the other. 
The effect of simple unreliability would there- 
fore be to reduce the statistical dependence 
of perceived change in job satisfaction upon 
both current and past job satisfaction, but 
the two regression coefficients or beta weights 
and the two partial correlation coefficients 
would be equal. Multiple-regression analysis 
showed, on the contrary, that perceived 
change in job satisfaction (X1) regressed on 
current job satisfaction (X2) positively and 
significantly but bore no relationship to job 
satisfaction 6 months earlier (X3), as indi- 
cated by the following results: 


X1 = 1.60 + 0.50 X2 - 0.04 X3 
           (0.070)    (0.060) 

R² = 0.21; df = 193; β12 = 0.46; 
β13 = -0.039; 
SEβ12 = SEβ13 = 0.066; r12.3 = 0.462; 
r13.2 = -0.042. 


The test-retest correlation r23 = 0.251 was 
not high enough to affect seriously the credi- 
bility of these regression coefficients. Any 
observed zero-order relationship between per- 
ceived and computed change in job satisfac- 
tion should therefore be interpreted as re- 
flecting via the identity X4 = 5.0 + X2 - X3 
the strong relationship existing between per- 
ceived change and current satisfaction. Past 
satisfaction and computed change in satisfac- 
tion were therefore dropped as factors ex- 
plaining in part the perceived change in 
satisfaction. 
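
The contrast between the two regression coefficients can be illustrated by dividing each by its standard error. The Python sketch below assumes the coefficients read 0.50 (SE 0.070) for current satisfaction and -0.04 (SE 0.060) for past satisfaction; these are a reading of a poorly preserved equation and should be treated as assumed values. The critical value of 1.97 is an approximate two-tailed 5% point of t for 193 degrees of freedom, and the names are illustrative.

```python
# (coefficient, standard error) pairs; values are an assumed reading of the
# fitted equation X1 = 1.60 + 0.50 X2 - 0.04 X3.
COEFFICIENTS = {
    "X2 (current satisfaction)": (0.50, 0.070),
    "X3 (past satisfaction)": (-0.04, 0.060),
}
CRITICAL_T_05 = 1.97  # approximate two-tailed 5% point of t, df = 193


def t_ratio(coefficient, standard_error):
    """Ratio of a regression coefficient to its standard error."""
    return coefficient / standard_error


def is_reliable(coefficient, standard_error, critical=CRITICAL_T_05):
    """True when the coefficient differs from zero at the two-tailed 5% level."""
    return abs(t_ratio(coefficient, standard_error)) > critical


if __name__ == "__main__":
    for name, (b, se) in COEFFICIENTS.items():
        print(f"{name}: t = {t_ratio(b, se):.2f}, reliable = {is_reliable(b, se)}")
```

Under equal unreliability in the two surveys the two ratios would be expected to be similar in size; on the assumed values the first is about seven and the second is well below one.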

Separate analysis showed that the task of 
predicting computed change in job satisfac- 
tion (X4) was no easier when current satis- 
faction (X2) and perceived change in satis- 
faction (X1) were both used than when only 
current satisfaction was used. For the partial 
regression coefficient b41 = 0.05 was not sta- 
tistically significant in the multiple regression 
equation 


X4 = 2.63 + 0.05 X1 + 0.68 X2; 
           (.086)     (.091) 

R² = 0.28, df = 193 


and the value of the multiple correlation 
coefficient remained unchanged, when X1 was 
dropped. 

Perceived change in job satisfaction was 
also found to be correlated with perceived 
change in individual aspects of the job. Per- 
sons who reported increased satisfaction with 
their jobs also tended to report that, as com- 
pared with 6 months earlier, there had been 
increases in the amounts of all job aspects, 
except the amount of supervision received on 
the job, which was approximately unrelated 
to the other job aspects. The simple correla- 
tion coefficients ranged from 0.25 for amount 
of pay to 0.60 for amount of variety and 
0.61 for degree of work interest, and their 
median value was 0.42. In contrast, the corre- 
lation with amount of supervision was only 
0.07. The correlations between current job 
satisfaction and perceived change in job as- 
pects were positive and mostly significant. 
Although the perceived changes in job as- 
pects were predominantly in the direction 
preferred by those affected (Hardin, 1960b, 
p. 929), the correlations with current job 
satisfaction were usually much lower, how- 
ever, than with perceived change in job 
satisfaction. A score for perceived change 
in job aspects was accordingly computed by 
adding the 14 five-level Likert variables, and 
this score was included as variable X5 in the 
regression equation. The fitted equation was 
found to be 


X1 = -1.60 + 0.30 X2 + 0.08 X5. 
            (.059)     (.008) 

R² = 0.47, df = 193. 


The simple correlation between current job 
satisfaction (X2) and the score on job aspect 
change (X5) was positive and statistically 
significant (r25 = 0.34, p < .01) but not large 
enough to weaken seriously the credibility of 
the regression coefficients. Both of these being 
statistically very significant, perceived in- 
crease in job satisfaction varied with per- 
ceived increases in the amounts of the job 
aspects as well as with the level of current 
job satisfaction. 
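
The fitted equation can be turned into a small prediction function to show how the two predictors combine. The Python sketch below assumes the equation reads X1 = -1.60 + 0.30 X2 + 0.08 X5. The negative intercept is an assumption: with a positive intercept the predicted value would exceed the five-point maximum for typical scores. The function name and example values are illustrative.

```python
def predicted_perceived_change(x2, x5, intercept=-1.60, b2=0.30, b5=0.08):
    """Predicted perceived change in job satisfaction (X1).

    x2: current job satisfaction, 1 to 5.
    x5: summed score on the 14 job-aspect items, 14 to 70.
    The coefficients are an assumed reading of the fitted equation.
    """
    if not 1 <= x2 <= 5:
        raise ValueError("x2 must lie between 1 and 5")
    if not 14 <= x5 <= 70:
        raise ValueError("x5 must lie between 14 and 70")
    return intercept + b2 * x2 + b5 * x5


# A respondent who is "quite satisfied" and reports no change on any aspect.
BASELINE = predicted_perceived_change(3, 42)

# The same respondent reporting "more now" on 7 of the 14 aspects.
SHIFTED = predicted_perceived_change(3, 49)
```

On these assumed coefficients a one-step increase on seven job aspects raises the predicted perceived change by about 0.56 of a scale point, nearly twice the effect of a full step in current satisfaction.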

One independent variable and the depend- 
ent variable were five-level single Likert items, 
and the third variable was the sum of a 
number of other five-level Likert items. Par- 
ticularly the first two variables were very 
crude measures. Although the regression equa- 
tion nevertheless explained about one half of 
the total variance in perceived change in job 
satisfaction, the introduction of additional 
variables seemed to impose more strain than 
the data could bear, and none were included. 


CONCLUSION 


Perceived change in job satisfaction is a 
poor predictor of computed change, and the 
quasi-longitudinal design seems very weak. 
Unless ways can be found to improve the 
validity of perceived change responses, a 
genuinely longitudinal design should probably 
be used, despite its cost and frequent mal- 
function, by research workers interested in 
actual changes in job satisfaction. For per- 
ceived change in job satisfaction reflects 
current, not past, job satisfaction, and it is 
closely related to perceived change in amounts 
of various job aspects, which bears only a 
limited relationship to computed change in 
satisfaction. 


VERBAL CODING AND DISPLAY CODING IN THE ACQUISITION 
AND RETENTION OF TRACKING SKILL 1 


DON TRUMBO, LYNN ULRICH, AND MERRILL E. NOBLE 


Kansas State University 


120 Ss were trained on a pursuit tracking task with an irregular step-function 
input. Cues for coding the task were introduced via pretraining and rehearsal of 
a numerical code and by display overlays in a 2 × 2 × 3 design. 3 levels of 
specificity of cues were provided by the overlays with the most specific condi- 
tion providing a numerical code like that of pretraining. The results showed 
that both pretraining and display coding facilitated early reduction of tracking 
error, but that neither these nor rehearsal of the numerical code affected re- 
tention performance after 1 wk. Taken together, these findings suggested that 
the verbal and display cues were used in the early coding of the task, but were 
less important later in practice and at retention. 


In learning a motor skill, two sets of cues 
are generally available to the subject (S); 
one set from the display and other external 
sources, the other set from the control move- 
ments, that is, the response-produced, or in- 
teroceptive, cues. It is widely held that the 
former are relatively more important to the 
early stages of skill development while the 
latter are more important to the finer adjust- 
ments and temporal-spatial sequencing of 
movements which characterize more advanced 
skill development (Fitts, 1951; Osgood, 1953; 
Fleishman & Rich, 1963, among others). A 
recent study by Fleishman and Rich (1963) 
supported this assumption by demonstrating 
that spatial-perceptual ability was predictive 
of proficiency early, but not later, in practice, 
whereas kinesthetic thresholds were predictive 
of later, but not earlier performance. 

Assuming that the S makes use of whatever 
cues are available, it may be expected that 
relative dependence on any set of cues as a 
basis for producing finer gradations of force, 
amplitude, direction, duration, or rate of 
movement will be a function of the relevance, 
reliability, and discriminability of the cues. If 
the external cues are ambiguous, noisy, or 
nonspecific, one may assume that greater de- 
pendence will be placed on response-produced 
cues. Similarly, to the extent that response- 
produced cues are enhanced by control dy- 
namics they would constitute a more depend- 
able basis for response differentiation (Bah- 
rick, Bennett, & Fitts, 1955). By the same 
token, display cues may be enhanced, as in 
display magnification (Hartman & Fitts, 
1955) so as to serve as a basis for finer re- 
sponse differentiation. Thus, it would appear 
that either exteroceptive or interoceptive cues, 
or both, can serve as bases for coding a com- 
plex skill task. 


1 This research was supported by the Air Force 
Office of Scientific Research under Grant No. 526-64. 
Jane Quigley, William Griffitt, Jay Swink, and Mari- 
lyn Schaus provided valuable aid in data collection 
and analysis. William Hull served as project engineer. 

What is not clear, however, is the role of 
verbal and cognitive processes in the develop- 
ment of skill. While it is frequently assumed 
that such processes mediate between input 
and output, the development of such codes 
and their role in efficient skill acquisition or 
retention has received relatively little atten- 
tion. 

The present study was concerned with fa- 
cilitation of the coding of a skilled task both 
by increasing the specificity of display cues 
and by pretraining the Ss on a verbal code 
which, on a priori and theoretical grounds, 
was highly relevant to and compatible with 
the skill task. Thus, it was assumed that with 
low display specificity response-produced cues 
would play a relatively greater role in coding 
the task, whereas with intermediate specificity 
visual cues would be relatively more impor- 
tant, and with high specificity, which included 
a number-coding of target positions, mediat- 
ing (verbal) processes would become rela- 
tively more important for response differentia- 
tion. In other words, three different task cod- 
ing processes should be facilitated by the 
three levels of display specificity. Similarly, 
it was assumed that the verbal pretraining, 
which effectively gave the S a numerical code 
with which to encode the task, would encour- 
age a coding process similar to that of the 
high display-specificity condition. Such a code 
would appear to be the most efficient since it 
provides a highly specific system for labeling 
both stimuli and responses. 

It was predicted that with the tasks em- 
ployed here the benefits of both display speci- 
ficity and verbal pretraining, since they would 
provide guides to efficient coding of the task, 
would be apparent early rather than late in 
training when skill has developed sufficiently 
so that such aids to performance are relatively 
ineffective. It was also expected that the ef- 
fects of these variables would be additive, 
since the number code learned in pretraining 
would be most compatible with the number 
code of the high specificity condition. Finally, 
it was predicted that both variables would fa- 
cilitate retention of the skill, and, by the same 
rationale, that rehearsal of the verbal code 
just prior to recall would also enhance reten- 
tion performance. 


METHOD 
Subjects 


One hundred twenty undergraduates, right-handed 
males, between 17 and 26 years of age, served as Ss. 
They received research participation credit, money, 
or a combination of both for their services. 


Apparatus 


The Kansas State University Versatile Electronic 
Tracking Apparatus (VETA) was used as apparatus 
for this study. Since this system has been described 
in detail elsewhere (Trumbo, Eslinger, Noble, & 
Cross, 1963), only the display and control subsys- 
tems need to be reviewed here. 

The system consists of a punched paper-tape in- 
put with the pulses from the tape reader converted 
to analog voltages to drive the target, a 4-inch verti- 
cal hairline displayed on the horizontal axis of a 5- 
inch CRT. The position of a second (cursor) line 
was controlled by the S via an arm control consist- 
ing of a lateral arm rest, pivoted at the elbow, and 
an adjustable hand grip. The cursor appeared below 
the target line, and overlapped it by 4 inch. The 
control-to-cursor ratio was 11.25 degrees to 1 inch, 
or ±22.5 degrees for the 4 inches maximum excursion 
of the target. 

Scoring was accomplished by an operational am- 
plifier manifold which yielded momentary absolute 
error and absolute error integrated over a trial. In- 
put, control output, and momentary error were re- 
corded during selected trials on magnetic tape, while 
integrated error was recorded by the experimenter 
(E) after each trial from a voltmeter. In the present 
study, momentary absolute acceleration and accelera- 
tion integrated by trials were also obtained by plac- 
ing an accelerometer on the underside of the arm 
rest. Length of trials, intervals between trials, and 
integration intervals were all automatically timed. 


Tasks 


Inputs for all experimental conditions were irregu- 
lar step-functions presented at one step per second. 
Targets were programmed to appear at any of eight 
equidistant positions on the middle 4 inches of the 
horizontal axis of the 5-inch CRT, with 48 steps 
constituting a trial. Trials were separated by 12- 
seconds rest. A 2-second warning buzzer preceded 
each trial. 

For each group, the task was pursuit tracking a 
fixed sequence of 12 target positions (steps) ran- 
domly drawn from the 8 possible positions with the 
restriction that no target could immediately repeat. 
This sequence was repeated four times per trial, 
without interruption, and throughout all trials. Four 
different sequences, essentially equal in total dis- 
tances traveled by the target and standard deviations 
of these distances, were randomly assigned within 
each experimental condition. 
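
The construction of a tracking input described above can be sketched roughly as follows. This is only an illustration, not the authors' tape-punching procedure: the function names and the seed are hypothetical, and the rule that the last step must differ from the first is an added assumption, made because the sequence repeats without interruption within a trial.

```python
import random

N_POSITIONS = 8   # eight equidistant target positions (from the text)
SEQ_LENGTH = 12   # steps in the fixed sequence (from the text)
REPEATS = 4       # repetitions per trial, giving 48 steps (from the text)


def make_sequence(rng):
    """Draw 12 positions from 1..8 with no immediate repeats.

    Assumption: the no-repeat rule is also enforced across the wrap-around
    (last step to first step), since the sequence repeats back to back.
    """
    while True:
        seq = [rng.randint(1, N_POSITIONS)]
        while len(seq) < SEQ_LENGTH:
            candidate = rng.randint(1, N_POSITIONS)
            if candidate != seq[-1]:
                seq.append(candidate)
        if seq[-1] != seq[0]:
            return seq


def make_trial(seq):
    """Repeat the fixed sequence four times to form one 48-step trial."""
    return seq * REPEATS


rng = random.Random(0)  # hypothetical seed, used only for reproducibility
sequence = make_sequence(rng)
trial = make_trial(sequence)
```

The sketch does not attempt the further matching of sequences on total distance traveled and on the standard deviation of step distances that the text mentions.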


Experimental Variables and Design 


Display specificity was varied by means of clear 
transparent display overlays. For low specificity, an 
overlay without markings was used. Intermediate 
specificity was achieved by engraving eight 1-inch 
vertical hairlines, corresponding to the eight possible 
target positions, on an otherwise identical overlay. 
For high specificity, ss-inch numbers from one (left) 
to eight (right) were engraved immediately above 
the eight hairlines used for the intermediate condi- 
tion. 

Verbal pretraining and rehearsal involved identical 
procedures, except for their positions in the training- 
retention sequence. In both cases, the S was seated 
in a room adjacent to the tracking room and pre- 
sented with a list of 12 numbers, ranging from 1-8, 
printed in a column on a 3 X 5 card. This list corre- 
sponded to the sequence of 12 targets in the tracking 
task. The S was required to learn the list by the 
whole method with a test on every fifth trial. After 
one errorless repetition of the list, the S overlearned 
the sequence by making 15 additional repetitions. 
After pretraining (or rehearsal), the S was taken to 
the tracking room and seated in the control chair. 

The experimental design was a 2 X 2 X 3 factorial 
with pretraining versus no pretraining, rehearsal 
versus no rehearsal, and 3 conditions of display speci- 
ficity as the main effects. Ten Ss were randomly as- 
signed to each of the 12 conditions. 
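
As a rough illustration of the design just described, the random assignment of 120 Ss to the 12 cells could be sketched as below. The subject IDs, the seed, and the function name are hypothetical; the text does not say how the randomization was actually carried out.

```python
import itertools
import random

PRETRAINING = ["pretraining", "no pretraining"]
REHEARSAL = ["rehearsal", "no rehearsal"]
DISPLAY = ["blank", "grid", "numbered"]  # low, intermediate, high specificity
SS_PER_CELL = 10


def assign(subject_ids, rng):
    """Shuffle the subjects and deal them evenly into the 2 X 2 X 3 cells."""
    cells = list(itertools.product(PRETRAINING, REHEARSAL, DISPLAY))
    ids = list(subject_ids)
    if len(ids) != len(cells) * SS_PER_CELL:
        raise ValueError("expected exactly 120 subjects")
    rng.shuffle(ids)
    return {
        cell: ids[i * SS_PER_CELL:(i + 1) * SS_PER_CELL]
        for i, cell in enumerate(cells)
    }


# Hypothetical subject IDs 1..120 and a hypothetical seed.
design = assign(range(1, 121), random.Random(1))
```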


Performance Measures 


The principal performance criterion was absolute 
error integrated by trials. Integrated absolute ac- 
celeration scores were also obtained, as were measure- 
ments of 12 indices obtained from oscillographic rec- 
ords of continuous target and response functions. 
These were obtained for a sample of trials and Ss 
in an attempt to determine some of the factors that 
produce differences in integrated error scores between 
best and poorest trackers and between learning and 
retention trials. The more interesting results of these 
analyses will be discussed. 

Finally, in order to obtain some indication of the 
degree to which the Ss had explicitly coded the task 
and could reproduce the sequence of target positions 
independent of the tracking situation, a paper and 
pencil test (PPT) was devised. The PPT consisted 
of 12 circles printed in 4 rows and 3 columns with 
8 lines, analogous to the overlay grid lines, on each 
circle. Immediately following the last training trial, 
and again preceding the first retention trial, all Ss 
were instructed to reconstruct the sequence, by 
checking 1 of the 8 lines in each circle, in order. 


Procedures 


On his arrival (or after pretraining for the Ss re- 
ceiving that treatment) the S was seated in the con- 
trol chair with the scope face covered. The cover 
was removed briefly to identify target and cursor 
blips, then replaced to avoid implicit rehearsal by 
the Ss who had pretraining. Identical and rather de- 
tailed instructions were then given to all Ss, includ- 
ing a description of the task, the method whereby 
their performance was to be scored, and identification 
via sample oscillographic records of principal sources 
of error (e.g., lags, overshoots, slow primary move- 
ments, etc.). The efficiency of anticipation and rapid 
primary movements was also demonstrated from 
sample records. All Ss were informed that they 
would track a fixed sequence of 12 targets which 
would be repeated within and throughout all trials. 
Finally, pretraining Ss were told that the sequence 
of numbers learned in pretraining corresponded to 
the sequence of targets, and were asked to repeat the 
sequence aloud once more. All Ss were then given 15 
trials followed by 25 trials on each of the succeeding 
4 days, or 115 trials in all. The Ss returned after 31 
days (±2 days) for 20 retention trials. 

Knowledge of results was available to the S after 
each trial. Integrated error, displayed on a voltmeter 
located out of the S’s line of vision, could be read 
between trials. The S was shown how to read the 
meter and urged to do so, but between trials only. 


RESULTS 


Integrated absolute error scores for all 
conditions in both acquisition and retention 
are shown in Figure 1. Data for acquisition 
are in blocks of five trials, while the first four 
retention trials are presented individually, 
first for the rehearsal then for the no-rehearsal 
conditions. The remaining retention trials 
were omitted from Figure 1 since it was quite 
obvious that there were no differences among 
the experimental conditions after Trial 4. 

The results indicate that during early ac- 
quisition, there were differences in perform- 
ance which were systematically related to 
both pretraining and display-specificity con- 
ditions. Pretraining resulted in a reduction in 
tracking error and within both pretraining and 
no-pretraining conditions there was a consist- 
ent ordering of performance for the three dis- 
plays, with the grid display resulting in poor- 
est performance, the numbered display best 
performance, and the blank display intermedi- 
ate performance. It is also apparent that all 
groups converge on a common asymptote in 
the latter stages of training. 

Analyses of variance were performed on 
the mean data for Blocks 3, 4, and 5, and 
Blocks 21, 22, and 23. The results are sum- 
marized in Table 1. They indicate that both 
main effects, but not the interaction, were sig- 
nificant for the early trials, while no signifi- 
cant sources of variance were found for the 
later trials. However, none of the adjacent 
pairs of means differed significantly, even in 
the early trials (lsd tests: p > .05). 


TABLE 1 

SUMMARY OF ANALYSES OF VARIANCE FOR BLOCKS 3-5 AND 21-23 OF TRAINING TRIALS 

                              Blocks 3-5                        Blocks 21-23 
Source             df    SS           MS           F            SS      MS           F 
Pretraining (P)      1   15.29        15.29        [illegible]    .19    .19          .85 
Display (D)          2    5.48         2.74        3.08*          .02    .01          .03 
P X D                2   [illegible]  [illegible]  [illegible]    .18    .09          .38 
Error              114   101.50         .89                     25.99   [illegible] 
Total              119 

* p < .05. 
** p < .01. 
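
The F ratios in an analysis of variance summary are the effect mean square divided by the error mean square, each mean square being a sum of squares over its degrees of freedom. A minimal sketch, using the display-effect and error entries for Blocks 3-5 as they appear in Table 1 (the function name is hypothetical):

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return (MS effect, MS error, F) for one row of an ANOVA summary."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error


# Display effect, Blocks 3-5: SS = 5.48 on 2 df; error SS = 101.50 on 114 df.
ms_display, ms_error, f_display = f_ratio(5.48, 2, 101.50, 114)
```

Rounded to two places, these reproduce the tabled mean squares (2.74 and .89) and the tabled F of 3.08 for the display effect.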

It will be noted that contrary to expecta- 
tion, the grid display resulted in poorer per- 
formance than the blank display. This re- 
versal of predicted results will be discussed 
at a later point. 

To evaluate the effects of the 1-month re- 
tention interval on performance, a t test was 
performed on the mean of the differences be- 
tween the last block of training trials and the 
initial retention trial for Ss pooled across all 
conditions. The t was highly significant (t = 
9.38, p < .001), indicating that a reliable 
loss in skill occurred during the retention in- 
terval. 
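
A t test on the mean of paired differences, of the kind reported above, can be sketched as follows. The scores below are invented for illustration and are not the study's data; the function name is likewise hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """t statistic and df for the mean of paired differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Hypothetical integrated-error scores (volts) for five Ss.
last_block = [2.0, 1.8, 2.2, 1.9, 2.1]
first_retention = [2.6, 2.1, 2.9, 2.4, 2.5]
t_value, df = paired_t(last_block, first_retention)
```

A positive t here corresponds to higher error at retention than at the end of training, that is, to a loss in skill.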

A further analysis of variance was made 
using the difference between Block 23 during 
acquisition and the first retention trial for 
each individual. Main effects were pretraining, 
display specificity, and rehearsal. This analy- 
sis was designed to determine which, if any, 
of the treatment effects accounted for differ- 
ential losses from training to retention. Since 
no significant differences existed among groups 
at the end of training, any significant con- 
tributions of either the main effects or their 
interactions to the variance of these difference 
scores would indicate the role of these vari- 
ables in the retention of skill. However, none 
of the main effects nor their interactions were 
significant. Thus, the analysis failed to iso- 
late differential effects on the retention of 
skill attributable to any of the experimental 
variables. 


Paper and Pencil Test 


The PPT was administered immediately 
after the last training trial and again before 
retention (or rehearsal) was begun. Error 
scores for reproduction of the sequences 
ranged from 0-12 with an arbitrary score of 
8 assigned to those cases which reproduced 
portions of the sequence but had these mis- 
placed in the total sequence of 12. 

The frequencies of errors, by experimental 
conditions and test sessions, are presented in 
Table 2. It is apparent that Ss in all condi- 
tions were less able to reproduce the target 
sequence at retention than immediately after 
training. Median tests indicated that the in- 
creases were significant in all cases. Further- 
more, pretraining served to improve perform- 
ance on the PPT for both blank and grid dis- 
plays, but not for the numbered display. In 
the latter case, errors were highly infrequent 
regardless of pretraining conditions. Display 
specificity was also correlated with PPT per- 
formance, with Ss on the numbered display 
committing significantly fewer errors, regard- 
less of pretraining, than Ss on either the blank 
or grid display. Thus, both learning and re- 
tention of a code of the target sequence were 
facilitated by both pretraining and the num- 
bered display. However, PPT scores within 
each of the acquisition groups were not sig- 
nificantly related to integrated error scores at 
the end of training, using Spearman rank cor- 
relation. On the other hand, for the three re- 
tention groups with neither pretraining nor re- 
hearsal (i.e., Ss whose performance was not 
altered by rehearsal) PPT scores and losses in 
tracking scores were found to be correlated 
(for ND, R = .73*; for GD and BD, Rs = 
.32 and .04, NS). These findings suggested 
that while terminal training performance could 
not be predicted from the PPT, retention of 
the ability to reproduce the sequence of tar- 
gets was predictive of performance at recall. 


TABLE 2 

TOTAL ERRORS, BY GROUPS AND BY TEST SESSIONS, ON THE PAPER AND PENCIL TEST 

                     Pretraining condition           No pretraining condition 
                 BD a     GD      ND    Total       BD      GD      ND    Total 
Posttraining      13       5       0      18        70      63       3     136 
                 (6%)    (2%)    (0%)    (3%)     (29%)   (26%)    (1%)   (18%) 
Prerecall         33      35      15      83       147     117      52     316 
                (14%)   (15%)    (6%)   (12%)     (61%)   (49%)   (22%)   (44%) 

a Display conditions: BD = Blank, GD = Grid, and ND = Numbered display. 


TABLE 3 

SCORES FOR HIGH-RETENTION AND LOW-RETENTION SUBJECTS ON TEMPORAL 
AND SPATIAL INDICES OF PERFORMANCE 

                        Error         Leads a       Beneficial anti-                  Acceleration 
                       (Volts)      (percentage)     cipations b      On-Target c       (Volts) 
                     High   Low     High    Low      High    Low      High    Low     High   Low 
Trial                Ret.   Ret.    Ret.    Ret.     Ret.    Ret.     Ret.    Ret.    Ret.   Ret. 
1. Last training      2.1    1.9    97.6    95.6     35.2    36.8     32.2    34.0     4.1    4.4 
2. First retention    2.1    3.3    95.6    81.1     37.5    30.9     32.7    24.0     4.4    5.7 
Change (2-1)           .0   +1.4    -2.0   -14.5     +2.3    -5.9      +.5   -10.0     +.3   +1.3 
t (Change scores)      13.3**         3.06**           4.03**           5.43**          2.48* 

a Mean time, in milliseconds, by which response preceded the target displacement. 
b Number of primary movements initiated but not terminated before target displacement. 
c Number of targets held more than .6 seconds within ±.167 inches tolerance. 

* p < .05. 
** p < .01. 


Analytic Scores 


A sample of oscillographic records of con- 
tinuous target and response functions was 
scored for 12 indices of temporal and spa- 
tial accuracy in an attempt to isolate factors 
that determine differences in integrated error 
scores. The 24 Ss, 2 from each condition, with 
the least losses in retention were compared 
with the 24 Ss with the greatest losses on the 
integrated error criterion. These two groups 
were not different at the end of training, but 
whereas the better Ss showed zero loss, the 
poorer Ss had a mean loss of roughly 40% of 
the original gain. 

Table 3 summarizes the results for four 
indices which differentiated the better Ss dur- 
ing retention. High-retention Ss, in general, 
showed better retention on temporal indices 
than did the poorer Ss. The better Ss had 
only a 2% loss in leads² (from 97% to 95% 
of all targets), while the poorer Ss lost over 
14% (95% to 81%, p < .01). Similarly, 
while the better Ss showed a slight increase in 
beneficial anticipations at retention, poorer 
Ss had approximately a 17% loss on this tim- 
ing index. Furthermore, best Ss maintained 
their frequency of “on target”² scores whereas 
poor Ss had a 30% decrease in this category. 
Poorer Ss failed to compensate for these losses, 
despite the fact that they appear to have been 
moving the control even faster than in train- 
ing, as indicated by the integrated accelera- 
tion scores. (The latter may reflect more 
quick corrective movements, however.) Other 
indices scored but not included in Table 3 
were percentage and mean amplitude of over- 
shoots and undershoots, frequency of direc- 
tional errors, and number of anticipations 
within ±150 milliseconds of target displace- 
ment. None of these indices differentiated be- 
tween best and poorest Ss during retention 
trials. 


2 These indices are defined in the footnotes to 
Table 3. 


DISCUSSION 


The major findings of this study were that 
both pretraining on a verbal code and increas- 
ing the specificity of the display code improve 
early performance of a motor skill. However, 
on the display variable there was a reversal 
of expected effects with respect to low and 
intermediate specificity conditions. While the 
difference between these two conditions was 
not significant, some explanation of the re- 
versal was sought. 

It appeared that, rather than facilitating 
the coding process by increasing the discrim- 
inability of the display information, the grid 
lines on the intermediate display served as a 
source of interference or distraction. This may 
indicate that while Ss with the numbered dis- 
play quickly learned the perceptual task and 
moved rapidly on to attending more com- 
pletely to proprioceptive cues, Ss with the grid 
display may have first attended to and at- 
tempted to code the task in terms of the 
lines, but, finding this an inefficient method, 
shifted to attending primarily to propriocep- 
tive cues. Grid display Ss without pretraining 
had difficulty with the PPT and were fre- 
quently observed to simulate control move- 
ments, apparently in an attempt to reproduce 
response cues, as an aid to completing the 
test. This suggested that a number code 
probably was not used by these Ss. 

In this connection, studies with both verbal 
pretraining and stimulus predifferentiation 
sometimes report inhibitory effects (Battig, 
1954; Battig, 1956; Battig, Hoffeld, Seiden- 
stein, & Brogden, 1957; Hoffeld, 1957). 
Battig (1956) suggested that when pretrain- 
ing (and, presumably, stimulus predifferentia- 
tion) adds relevant new cues the effect will be 
beneficial, but when the added cues are ir- 
relevant or redundant, interference may re- 
sult. Therefore, in the present study, the lines 
may have been “attractive distracters,” sug- 
gesting an inefficient coding process. For the 
pretraining Ss the set almost certainly was to 
assign numbers to the lines. The no-pretrain- 
ing group may well have attempted a similar 
strategy but with the added handicap that 
they had not memorized the numbers corre- 
sponding to the sequence. 

The prediction that display specificity and 
pretraining effects would be additive was not 
clearly supported, since the interaction be- 
tween these variables was nonsignificant. 
Again, the concepts of irrelevancy and redun- 
dancy of cues may suggest an explanation. In 
retrospect, it appears that for Ss with the 
numbered display, pretraining was largely 
redundant for coding the task. Either pre- 
training or numbered display provided the 
same efficient code; with pretraining the code 
learning was accomplished before skill learn- 
ing began; whereas with the numbered dis- 
play but no pretraining code learning had to 
occur during skill training. 

The assumption that the effects of pre- 
training and display specificity would be 
apparent only early in training was borne 
out, first by the significant findings for Blocks 
3 through 5, then by the nonsignificant find- 
ings and the obvious converging of the curves 
for Blocks 21 through 23. While these results 
fail to provide a test for the assumption that 
exteroceptive (and cognitive mediating) cues 
give way late in training to response-produced 
cues, they are consistent with such a position. 

No significant effects of either pretraining 
or display specificity on retention were found. 
It was assumed that these additional cues 
would enhance retention of the skill; however, 
it may be, as suggested above, that Ss had 
progressed to a point where these cues, while 
valuable in the early coding process, were 
discarded in favor of proprioceptive cues late 
in training, and that the latter, not the 
former, were most relevant to recall per- 
formance. Given a longer retention period 
and possible regression to earlier performance 
levels, coding cues may again become impor- 
tant for skill performance. If this is found to 
be the case, it will be important to explore 
further the role of various cues in retention 
as well as during acquisition of skill. 


The ineffectiveness of verbal rehearsal may 
be explained in similar terms. First, there is 
little evidence from either the PPT or the 
analytic scores that the sequence of targets 
was forgotten, particularly by Ss who had 
either pretraining or the numbered display. 
While some decreases occurred in the per- 
centage of targets for which Ss were leading, 
Ss continued to lead more than 88% of the 
time on the first retention trial, making very 
few directional errors in the process. 

Post hoc inspection of Figure 1 suggested 
that the rehearsal treatment did result in 
greater variance in retention performance 
than no rehearsal. An F test of variances of 
the two sets of group means for Trial 1 of 
retention confirmed this finding (F5, 5 = 5.75, 
p < .05), suggesting that verbal rehearsal 
may interact with both pretraining and dis- 
play specificity, resulting in interference with 
recall performance under some conditions 
(apparently in no-pretraining, low and inter- 
mediate specificity conditions) and facilitation 
in others (high specificity, with and without 
pretraining). Presumably, it was in the latter 
cases that an explicit number code was used. 
These results require further verification. If 
they represent reliable phenomena, they might 
well be more apparent with longer retention 
periods and, consequently, greater evidence 
of performance loss. 

Finally, the comparison of measures of 
temporal and spatial indices of response for 
best- and poorest-retention Ss indicated that 
losses on the integrated error criterion were 
associated predominantly with losses in 
timing rather than spatial accuracy. These 
results agree rather well with those in a 
previous study (Trumbo, Noble, Cross, & 
Ulrich, in press) which also identified tempo- 
ral aspects as the more susceptible to losses 
with no practice. 


DOGMATISM AND PREDECISIONAL INFORMATION SEARCH¹ 


Rokeach’s Dogmatism Scale and 4 decision measures of tendencies to reserve 
judgment were administered to 72 freshmen women. A significant negative 
relationship was found between dogmatism and each of the 4 decision measures. 
The nondogmatic individual tended to delay decision and engage in pre- 
decisional search, to require more time for psychophysical judgments, and to 
respond “don’t know” to statements of opinion under conditions of inadequate 
information. Accordingly, dogmatism was interpreted as a defense mechanism 
which interferes with processing of predecisional information. 


The psychological processes involved in 
individual decision making are both complex 
and covert. To simplify and objectify the 
investigation of this behavior, various com- 
ponents of the decision process have been 
abstracted and analyzed in isolation. One 
such component of particular interest is pre- 
decisional search behavior—that is, the ac- 
tivities of seeking, acquiring, and processing 
information from the environment before a 
decision is made (see Bruner, Goodnow, & 
Austin, 1956; March & Simon, 1959). 

Early experimental studies of information 
search were largely concerned with decision 
delay. For example, the work of Festinger 
(1942) precisely delineated the positive rela- 
tionship between difficulty of judgment and 
judgment time. Later studies, which utilized 
tasks in which discrete decision delays were 
associated with the acquisition of specific 
pieces of information, found information 
search to be related to the complexity of the 
problem (or the subject’s uncertainty) and 
to the cost of information (Irwin & Smith, 
1956, 1957; Lanzetta & Kanareff, 1962). 
These and other studies (Becker, 1958; 
Messick & Hills, 1960; Wolff, 1955) found 
that individuals differ reliably in their pro- 
clivities for predecisional search. Thus, more 
extensive search has been associated with 
paranoia (Binder, 1958) and with lack of 
confidence (Pruitt, 1957). 


1 This study is based upon a dissertation pre- 
sented by the senior author in partial fulfillment of 
the requirements for the doctor’s degree at the Uni- 
versity of Delaware, 1963. The study was supported 
by Grant Number AF-AFOSR-62-95, United States 
Air Force, Air Force Office of Scientific Research, 
Office of Aerospace Research, Washington 25, D. C. 

Thanks are extended to Berj Harootunian and 
Howard Lamb for their cooperation in arranging 
the group testing sessions. 

In the present study, limited or inadequate 
predecisional search was presumed to be 
a manifestation of dogmatism. Rokeach’s 
(1960) extensive analysis of dogmatism in- 
cludes descriptions of rigid categorization be- 
havior, differentiation of categories of infor- 
mation (deriving from a leveling tendency 
and a dissociation of disparate details) and 
absolute rather than tentative judgments 
in support of authority figures. These in- 
formation processing and coding behaviors 
devolve from defense mechanisms which per- 
mit the dogmatic individual to maintain his 
belief-disbelief system intact. In this sense, 
then, the dogmatic person is closed to new 
information; his convictions are inviolable, 
thus permitting his cognitive structure to re- 
main momentarily secure. Accordingly, a neg- 
ative relationship was hypothesized between 
dogmatism and predecisional information 
search. 


METHOD 


Rokeach’s Dogmatism Scale and four decision 
measures (described below) of tendencies to reserve 
judgment were administered to 72 freshmen women 
at the University of Delaware. Verbal SCAT scores 
were also available for these subjects (Ss). 

Dogmatism Scale. The Dogmatism Scale consists 
of 40 statements (e.g., Man on his own is a helpless 
and miserable creature) with which Ss either agree 
or disagree. Scores consist of the number of items 
agreed with, and have been found to be positively 
related to scores on the F Scale, the E Scale, 
Rokeach’s Opinionation Scale, and Welsh’s A scale. 
Open-minded Ss (low scorers) were also better able 
to accept and synthesize new beliefs in a problem- 
solving situation. Rokeach has reported reliability 
coefficients that range between .68 and .93 (Rokeach, 
1960). 


Decision Tasks 


Word Completion Tasks. In the four word prob- 
lems, Ss were initially presented with the first letter 
of a word, and were required to identify the word. 
The Ss could delay decision as often as desired. 
With each delay, the S was presented with an addi- 
tional letter of the word, which cost one point. 
Ten points were awarded for a correct decision, 
but the S was not informed whether or not his 
decision was correct. This system of points was 
utilized in order to prevent a generalized set for 
obtaining maximum information. Scores in this task 
consisted of the number of decision delays. 

Concept Information Task. The format of the
six concept tasks was similar to that of the word
tasks. The Ss were required to decide upon a con-
cept consisting of one or more attributes (such as,
“red” or “a red square with a single border”). A
positive exemplar of the concept was presented ini-
tially, and each decision delay obtained an addi-
tional exemplar which contained one bit of infor-
mation (reduced possible solutions by one-half)
and which cost one point. Ten points were awarded
for a correct decision. The expected value (the
product of the probability of correctness and the
net gain in points) was greater at each successive
decision point. Thus, one rational strategy would
consist of making the maximum number of delays
(when all information needed for a correct solution
would have been acquired). Scores consisted of the
number of decision delays.
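
The expected-value argument for the concept tasks can be illustrated with a short sketch. It assumes (the report does not state this) that a task starts with a known number of equally likely candidate concepts after the first exemplar, that each delay halves the candidates, and that the S guesses at random among those remaining. The function name and the choice of eight starting candidates are purely illustrative.

```python
def expected_values(initial_candidates, reward=10, cost_per_delay=1):
    """Expected value of deciding after 0, 1, 2, ... delays.

    Assumption (not from the report): candidates are equally likely,
    each delay halves them, and the S guesses at random among the rest.
    """
    values = []
    remaining = initial_candidates
    delays = 0
    while True:
        p_correct = 1 / remaining                      # chance of a correct guess
        net_gain = reward - cost_per_delay * delays    # points left after paying for delays
        values.append(p_correct * net_gain)
        if remaining == 1:
            break
        remaining = max(1, remaining // 2)             # one bit of information per delay
        delays += 1
    return values


evs = expected_values(8)
print(evs)
```

Under these assumptions the expected value rises at every decision point, so waiting until the concept is fully determined is the highest-valued strategy even though each delay costs a point.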
Line Judgment Tasks. In each of the two line-
judgment tasks a poster with two lines of nearly
equal length was displayed. The Ss were required
to decide which of the two lines was longer. Scores
consisted of the time taken to decide. No points
were awarded in this task.
Withholding Opinion Scale (WO scale). The
WO Scale consists of 38 statements of opinion with
no basis in fact, such as “Man will be on the moon
by 1967,” or “There is life on other planets.” The
S may agree, disagree, or respond “don’t know”
to each item. Scores consist of the number of “don’t
know” responses. Previous research (Ziller & Long,
in press; Ziller, Shear & De Cencio, 1964) indicated
that this response set is negatively related to occu-
pational status. In addition, a curvilinear relation-
ship was found between the “don’t know” response
and age of adults and educational level of college
students. One interpretation of these results is that
the “don’t know” response under conditions of
inadequate information is a status defense mecha-
nism related to dogmatism. For example, clinical
psychologists in comparison to trainees selected the
“don’t know” response more frequently when in-
vidious comparisons were possible.


RESULTS 


Intercorrelations between the five measures
revealed a significant negative relationship
between the Dogmatism Scale and each of
the four decision measures (r’s ranged be-
tween −.20 and −.32, see Table 1). The
four decision tasks were all positively related
to each other (r’s ranged between .15 and
.53), with four of the six relationships statis-
tically significant. All decision scores, with
the exception of those from the concept task,
as well as dogmatism scores, were found to
be independent of verbal SCAT scores. A
significant positive relationship was found
between the SCAT scores and the concept
scores.


TABLE 1

INTERCORRELATIONS BETWEEN THE DOGMATISM SCALE,
THE FOUR DECISION MEASURES, AND THE VERBAL
PART OF THE SCAT (N = 72)

            Concept   Word          Lines         WO            SCAT
Dogmatism   −.24*     −.28**        −.20*         −.32**        −.08
Concept     (.80)a    [illegible]   .28**         [illegible]   .28**
Word                  (.65)a        [illegible]   .26*          [illegible]
Lines                               (.57)a        .19*          .03
WO                                                (.67)a        .00

a Self-correlations are split-half reliability coefficients cor-
rected for length.
* p < .05, one-tailed.
** p < .01.


Discussion 


The negative relationship between Rok- 
each’s Dogmatism Scale and the four deci- 
sion measures supports the initial hypothesis 
and indicates that in decision-making situa- 
tions the nondogmatic person tends to delay 
decision or reserve judgment, and to search 
for and utilize additional information. Iden- 
tification of these behavioral correlates of 
the Dogmatism Scale elucidates Rokeach’s 
term “closed-minded” and brings new mean- 
ing to the concept of dogmatism. By limiting 
the intake of information, the dogmatic indi- 
vidual maintains his conceptual system. 

Rokeach (1960) has found a positive cor- 
relation between dogmatism and anxiety. It 
is possible, therefore, that limited predeci- 
sional information search is a defense mecha- 
nism. When the mind is closed to new infor- 
mation, self-directed demands for changes in 
personal behavior are eliminated, and (per-
haps more significantly) changes or reevalua-
tions of the self-concept may be avoided.
Thus, it is proposed that the dogmatic indi-
vidual defends an insecure self-structure
by the expedient of restricting information 
input—that is, by controlling the source of 
data relevant to his self and social conceptual 
structures. 

The negative relationship between the Dog-
matism Scale and the WO Scale (r = −.32)²
suggests that, under conditions of extreme
paucity of information, the dogmatic person
tends to express an opinion, whereas the non-
dogmatic person elects the “don’t know” re-
sponse. The studies of Peabody (1961) and
Lichtenstein, Quinn, and Hover (1961) indi-
cate that the Dogmatism Scale may be some-
what weakened by the effects of response
style, and that scores may represent an inter- 
action between the content of the items and 
response sets. By the elimination of all con- 
sideration of item contents, the WO Scale 
appears to be a more direct, yet less visible, 
measure of certain aspects of dogmatism. 

In problem-solving terms, the “don’t know” 
response may be interpreted as a recognition 
of the existence of a problem—that informa- 
tion at hand is inadequate for a rational 
judgment. The “don’t know” response thus 
appears to be a necessary precedent condition 
for predecisional information search. Dog- 
matic individuals, then, fail to perceive, or 
ignore, the inadequacy of the environmental 
evidence. In situations requiring quick deci- 
sions, or where established facts are adequate, 
the premature closure of the dogmatic person 
may be instrumental. On the other hand, 
under conditions of environmental change, or 
conditions requiring creative responses, exist-
ing information may be insufficient. Here,
doubt, followed by a search for additional
information, which is characteristic of the
nondogmatic person, may prove advantageous.


² In two current projects significant negative re-
lationships were found between the Dogmatism
Scale and the WO Scale, r = −.47, n = 44 (p =
.01); r = −.27, n = 52 (p = .05).


THE RELATIONSHIP OF TASK SUCCESS TO TASK
LIKING AND SATISFACTION ¹


EDWIN A. LOCKE


Cornell University ²


4 laboratory experiments are reported which examine the relationship between 
degree of task success and degree of liking for and satisfaction with the task. 
A number of different tasks, measures, and situations were used. In all cases 
there was clear evidence for a significant (positive) linear relationship between 
success and measures of liking and satisfaction. The major reasons given for 
liking a task involved attributes of the individual’s performance (e.g., improve-
ment); reasons given for not liking a task most often involved attributes other
than individual performance (e.g., the monotony of the task). 


Although the major focus of research on 
job satisfaction in the past decade has been
on the relationship between various “extrin-
sic” aspects of the job (especially supervision
and human relations, e.g., Likert, 1961) and
job satisfaction, there has been some recent
interest in the effects of various work or task
variables on satisfaction. Symptomatic of this
trend was the publication by Herzberg, Maus-
ner, and Snyderman of The Motivation to
Work in 1959. They found that for engineers
and accountants the major (reported) de-
terminant of liking for the job was the feel-
ing of achievement in one’s work. Achieve-
ment was defined essentially as success at
some aspect of the work. More recently My-
ers (1964) replicated these findings with
employees on five different kinds of jobs (in-
cluding manual).

Most of the other information on this sub-
ject comes from occasional laboratory experi-
ments. For instance Mace (1935) in a labora-
tory experiment found that subjects (Ss)
who liked an arithmetic task the least did
more poorly on it than those who liked it the
most. Unfortunately Mace was not able to
determine whether satisfaction was a cause
or an effect of good performance. Gebhard
(1948) in another laboratory experiment
found that Ss increased their liking for tasks
on which they had experienced success or on
which they expected to experience success in
the future.

In spite of previous work, there remain a
number of unanswered questions regarding
the relationship between task success and
task satisfaction. For instance the cause-effect
sequences given by Herzberg et al. (1959)
and Myers (1964) were reported cause-effect
sequences. There was no actual manipulation
of success and failure nor any accompanying
before-after attitude measurement. The find-
ings of Mace (1935) were also inconclusive;
it is not even clear from his report just when
the attitudes toward the task were measured.
The data of Gebhard (1948) are more un-
equivocal with respect to cause and effect,
but with the single weakness that she used a
factorial (only two values of the independent
variable) rather than a functional (concom-
itant variation) approach (Townsend, 1953,
p. 83), thus preventing the determination of
the precise shape of the relationship.


¹ The first three experiments reported here consti-
tuted part of the author’s doctoral dissertation.
² Now at the American Institutes for Research,
Washington office.

The present paper will report a series of 
laboratory studies designed to clarify some of 
the above problems. The purpose was to 
explore more fully the relationship between 
a single task (or task experience) variable: 
degree of success, and liking for and satisfac- 
tion with the task. The major interest was in 
the shape of the relationship. Of lesser in- 
terest were the reasons given for liking or 
disliking the task. 


EXPERIMENT I


Method. The task in this experiment was word 
unscrambling. Each of 85 Ss (from the introductory 
industrial psychology S pool) was given one set of 25 
five-letter words, one set of 25 six-letter words, and 
one set of 25 seven-letter words. The sets were given 
in random order to different Ss. On each set Ss were 
given 20 minutes to solve as many of the 25 words 
as possible. 


[Figure: mean liking plotted against proportion of words solved (proportion success), in categories .00-.20, .21-.40, .41-.60, .61-.80, and .81-1.00.]
Fig. 1. Liking as a function of success (Experi-
ment I).


After completing the three sets of words all Ss 
were given a 9-point rating scale for liking with a 
forced anchor format: the “least liked word length” 
anchored one end, the “most liked word length” 
the other end, and the remaining word length was 
rated somewhere in between. 


Results. Three sets of scores were obtained 
for each individual, one liking score and a 
corresponding proportion of success score for 
each of the three word lengths. Figure 1 
shows the mean liking scores as a function 
of the proportion of success category for all 
Ss and word lengths combined. It is clear 
that the trend is a linear one, although the 
nonindependence of the scores precluded an 
actual trend analysis. However, the signifi-
cance of the linear trend is supported by a
product-moment correlation of .49 (p <
.01)³ between success and liking.

Discussion. This first brief experiment sug- 
gested a clear linear relationship between de- 
gree of success and liking for the task. The 
next experiment was designed to test the 
generality of this finding across tasks and 
measures. 
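
As a rough check on the reported significance, a product-moment correlation can be converted to a t statistic with N − 2 degrees of freedom. The sketch below applies that standard conversion to the reported r of .49, using the N of 85 subjects on which the significance test was based. The function name is illustrative, and the conversion is a textbook formula, not necessarily the exact test the author ran.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2))


t_value = t_from_r(0.49, 85)
print(round(t_value, 2))
```

A t of this size on 83 degrees of freedom lies well beyond conventional .01 critical values (roughly 2.6, two-tailed), which is consistent with the reported p < .01.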


EXPERIMENT II 


Method. The task in this experiment involved 
listing objects or things that could be described by 
a given adjective (e.g., “heavy”). There were 15 
trials and Ss were given a different adjective on
each trial and told to list things or objects that
could be described by the adjective for 1 minute.
The E told the Ss how to score their own protocols.
Generally any answer was acceptable that did not
repeat words in the same category (e.g., for “hot,”
“coffee,” “tea,” and “soup” would all be considered
“beverages”).


³ The N for observations was 255, for subjects 85.
The significance test was based on an N of 85.

The Ss (paid summer school volunteers) were 
divided at random into three different groups. Each 
group had a different “standard of success” to beat 
on each trial. In the “Easy” group (N = 26) the
standard of success was 4 things or objects in 1 
minute. Thus to be “successful” Ss had to give at 
least 5 things or objects. In the “Medium” group 
(N = 22) the standard of success was 9 objects. In 
the “Hard” group (N = 23) the standard of success 
was 14 objects. A successful trial was defined as 
one on which an S beat his standard. The Ss in all 
groups were told that the standards were “what E 
considered to be a successful performance on the 
basis of his experience with the task and represented 
slightly above the average performance.” 
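
The scoring rule described above can be sketched as follows. The group standards (4, 9, and 14) come from the text; the function name and the sample counts are hypothetical and serve only as illustration. In the experiment each S worked under a single standard; the loop applies one made-up protocol to all three standards merely to show how the standard changes the proportion of successes.

```python
STANDARDS = {"Easy": 4, "Medium": 9, "Hard": 14}  # from the Method section


def proportion_success(items_per_trial, standard):
    """Share of trials on which the S listed more items than the standard."""
    successes = sum(1 for n in items_per_trial if n > standard)
    return successes / len(items_per_trial)


# Hypothetical protocol: number of objects listed on each of 15 trials.
counts = [10, 8, 12, 9, 11, 7, 10, 13, 9, 10, 6, 12, 10, 8, 11]
for group, standard in STANDARDS.items():
    print(group, round(proportion_success(counts, standard), 2))
```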

After the last trial Ss counted up the number of 
trials on which they had been successful, that is, 
beaten their standard. Then they filled out two 
questionnaires. The first was a 7-point task rating 
scale anchored by the phrases “liked it very much” 
and “strongly disliked it.” The second was a slightly 
modified version of the Cornell Job Descriptive In- 
dex (JDI), Work Scale, a measure of work satis- 
faction. The JDI (work scale) is a list of 18 adjec- 
tives descriptive of work. The S indicates the ap- 
propriateness of each adjective to his own work by 
placing “Y” (for “Yes”), “N” (for “No”), or “?”
(for “Cannot decide”) next to each. The scale
characteristics of the JDI and the method of scor-
ing are described by Locke, Smith, Hulin, and 
Kendall (1962). Validation studies are reported by 
Kendall, Smith, Hulin, and Locke (1962). The ver- 
sion used here simply dropped four of the items 
from the original scale (see Locke, 1964, p. 106 for 
the version used here). 


Results. The 7-point liking scale and the 
JDI work scale correlated .72 (p < .01, for 
all Ss), but the two measures were analyzed 
separately since it was possible that they
would be related differently to the different
experimental treatments. The mean liking
and JDI work satisfaction scores for each
experimental group are shown in Figure 2 as
a function of proportion of successes (i.e.,
proportion of trials on which Ss beat their
standard). It is evident that the curves are
not completely linear due to the very similar
means for the Medium and Hard groups. The
results of trend analyses and t tests on these
data are shown for each measure separately
in Tables 1 and 2. Equal intervals between
conditions were assumed for the trend analy-
ses.⁴


Fig. 2. Liking and satisfaction as a function of suc-
cess (Experiment II).

For both measures there is a significant 
linear trend, but for the liking measure there 
is also a significant quadratic trend though 
of smaller magnitude than the linear trend. 


TABLE 1

TREND ANALYSIS AND t TEST RESULTS FOR
LIKING SCALE: EXPERIMENT II

Source          SS        df    F       p
Between         35.91      2    -       -
  Linear        23.71      1    9.17    .01
  Quadratic     12.20      1    4.71    .05
Within         176.26     68    -       -

Comparison      df    t             p
Easy-Hard       47    [illegible]   .01
Easy-Medium     46    3.48          .01
Medium-Hard     43    <1            ns





Since the JDI work satisfaction measure has 
shown consistent convergent and discriminant 
validity (Kendall et al., 1962), we can put 
somewhat more confidence in these results 
than those for the liking scale. 
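
The F ratios in these trend analyses can be re-derived from the reported sums of squares, since each single-df trend component is tested against the within-groups mean square. The sketch below does this for the Experiment II values (liking scale and JDI). The function name is illustrative, and small discrepancies from the printed F values presumably reflect rounding in the published figures.

```python
def trend_f(ss_trend, ss_within, df_within, df_trend=1):
    """F ratio for a trend component: MS(trend) / MS(within)."""
    ms_trend = ss_trend / df_trend
    ms_within = ss_within / df_within
    return ms_trend / ms_within


# Sums of squares as reported for Experiment II (within df = 68).
liking = {"Linear": 23.71, "Quadratic": 12.20}
jdi = {"Linear": 636.80, "Quadratic": 134.34}

for name, ss in liking.items():
    print("Liking", name, round(trend_f(ss, 176.26, 68), 2))
for name, ss in jdi.items():
    print("JDI", name, round(trend_f(ss, 4036.58, 68), 2))
```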

Tables 1 and 2 show clearly that the sig- 
nificant mean differences are between the 
Easy group and the other two for both meas-
ures.

Table 3 shows the product-moment corre- 
lations between degree of success and liking 
for the task within each group and for all Ss. 


⁴ The assumption of equal intervals is supported
both by the equal differences in the level of the
standards: 4, 9, and 14, and by the nearly equal
differences in the degrees of success attained by the
three groups: .93, .55, and .13 for the Easy, Medium,
and Hard groups, respectively.


TABLE 2

TREND ANALYSIS AND t TEST
RESULTS FOR JDI: EXPERIMENT II

Source          SS          df    F        p
Between         771.14       2    -        -
  Linear        636.80       1    10.73    .01
  Quadratic     134.34       1    2.26     ns
Within        4,036.58      68    -        -

Comparison      df    t             p
Easy-Hard       47    [illegible]   [illegible]
Easy-Medium     46    3.02          .01
Medium-Hard     43    <1            ns





The data for the whole group again support 
the significant linear trends found with the 
group means. The higher correlation in the 
Medium group as compared to the other two 
can be attributed to the greater variance in 
the success scores in this group as compared 
to the others. 

Discussion. Again the data clearly support
a linear function relating degree of task suc- 
cess and task liking and satisfaction. The 
additional quadratic trend found with the 
liking scale is suggestive but not conclusive 
as the latter has not been validated. 

The next experiment was designed, in part, 
to test the generality of these findings over 
a different though similar task, and under 
slightly different experimental conditions. 


EXPERIMENT III 


Method. The task in this experiment involved 
giving uses for objects (e.g., an ash tray). There 
were 20 trials and the Ss were given a different 
object on each trial and told to list possible uses 
for each object for 1 minute. Again E told the Ss
how to score their own protocols. The same rules
applied as in the previous task.


TABLE 3

CORRELATIONS OF LIKING AND JDI
SCALES WITH SUCCESS: EXPERIMENT II

                Correlation of success with
Group           Liking scale    JDI (work scale)    N
Easy            −.33            −.12                26
Medium          [illegible]     [illegible]         22
Hard            .07             .18                 23
All subjects    [illegible]     .43**               71

** p < .01.


[Figure: mean liking scale and JDI satisfaction scores plotted against proportion of success; the Progressive group is labeled separately.]
Fig. 3. Liking and satisfaction as a function of suc-
cess (Experiment III).

The Ss (from the introductory psychology S pool) 
were divided at random into four different groups. 
As in the previous experiment each group had a 
different standard of success to beat on each trial. 
Two conditions were identical to conditions in the 
previous experiment. The Easy group (N = 27) had
to beat a standard of 4, and the Hard group (N =
29) a standard of 14 uses on each trial. In a third
group, however, the “Self-Set” group (N = 27), Ss
were allowed to set their own standards on each
trial, but were told to try and give as many uses as
possible regardless of where they set the standards.
The fourth group, the “Progressive” group (N = 29),
was given a different standard on each trial. The
standard at first was the same as that for the Easy
group (i.e., 4) and gradually increased until on the
last trial it was slightly harder than that for the
Hard group (i.e., 15). The Ss in all groups (ex-
cluding the Self-Set group) were again told “the
standards were what E considered to be a success- 
ful performance and were slightly above the average 
performance.” 

After the last trial the Ss counted up the number
of trials on which they had been successful in beat-
ing the standard. Then they filled out two ques-
tionnaires similar to those used in the previous ex-
periment, a 7-point liking scale and a slightly modi-
fied version of the JDI work satisfaction scale. (See
Locke, 1964, p. 110 for the actual scale used.)


TABLE 4

TREND ANALYSIS AND t TEST RESULTS
FOR LIKING SCALE: EXPERIMENT III

Source          SS        df     F             p
Between         19.24       2    -             -
  Linear        15.82       1    7.99          .01
  Quadratic      3.42       1    [illegible]   ns
Within         216.18     109    -             -

Comparison                           df    t             p
Easy versus Hard, Progressive        83    2.59          .05
Self-Set versus Hard, Progressive    83    2.44          .05
Easy versus Self-Set                 52    [illegible]   ns


Results. Again the results for each measure 
were analyzed separately. (The correlation 
between them was .68, p < .01, for all Ss). 
The mean liking scores by group for each 
measure are shown in Figure 3. 

In order to reduce the number of ¢ tests 
and in order to make a trend analysis with 
equal intervals possible, the Hard and Pro- 
gressive group Ss were combined into a single 
group. The mean proportion of successes for 
the combined Hard-Progressive group was 
.12, for the Self-Set group .53, and for the 
Easy group .91, very close to equal intervals. 


TABLE 5

TREND ANALYSIS AND t TEST RESULTS FOR JDI:
EXPERIMENT III

Source          SS          df     F        p
Between         1,521.30      2    -        -
  Linear        1,331.20      1    16.64    [illegible]
  Quadratic       190.10      1    2.38     ns
Within          8,724.7     109    -        -

Comparison                           df    t       p
Easy versus Hard, Progressive        83    3.69    .001
Self-Set versus Hard, Progressive    83    3.22    [illegible]
Easy versus Self-Set                 52    <1      ns





The results of the trend analyses and ¢ tests 
are shown in Tables 4 and 5. 

The results for both measures are very 
similar. There are significant linear trends 
and no significant quadratic trends. The sig- 
nificant mean differences for both measures 
are between the Hard-Progressive (combined) 
group and the other two. The only difference 
between the two attitude measures is that 
the results for the JDI are more highly sig- 
nificant. This is probably due to the greater 
reliability and validity of the JDI. 

Table 6 shows the within group and over- 
all correlations between the liking and JDI 
measures and degree of success. Again the 
correlations support the significance of the 
linear trends found with the group means. 

Discussion. These results agree quite well
with those of the previous experiments in that
the linear relationship between degree of suc-
cess and satisfaction was replicated.

There is an important point of difference 
between this and the previous experiment, 
however. In the previous study, the major 
increase in liking took place as the propor-
tion of success rose from .50 to over .90
(from the Medium to the Easy condition).
In this experiment, in contrast, the major
increase in liking occurred as the probability
of success rose from below .20 to around
.50 (from the Hard-Progressive to the Self-
Set condition). The major difference then is
that the Self-Set group in this experiment
was relatively more satisfied with the task
than the Medium group in the previous ex-
periment even though the degrees of success
were almost identical for both groups.

The difference is probably explained by the
fact that a moderate degree of success (i.e.,
.50) for the Medium group indicated a
“slightly above average performance” (ac-
cording to the information given the Ss by
the E), whereas the Self-Set group was given
no standards and thus the meaning of a “mod-
erate” degree of success was not known for
this group. It probably represented substan-
tially more than an “average performance”
to them.

The final experiment to be reported here
was intended to establish the generality of
the major finding on a learning task, as con-
trasted with the performance tasks used in
the first three experiments.


EXPERIMENT IV 


Method. The task was a standard pursuit-rotor
task. The Ss followed a 4-inch wide slit rotating at
[illegible]9 revolutions per minute atop a round fluorescent
light with a photosensitive metal stylus. The S’s
time on target was recorded by a timer connected
to the stylus. All Ss had 20 trials of 90 seconds each
with a 1-minute rest between trials.
In the conditions of relevance to this topic, 26
male Ss (from the introductory psychology S pool)
were divided at random into three groups. In Group
A (N = 9) Ss were told to try and beat a constant
standard of either 20 or 40 seconds on target on
each 90-second trial. The Ss were given only visual
feedback. In Group B (N = 9) Ss were told to try
and beat a “moving” standard. Generally the
standard was increased each trial 1 second above
the time on target score obtained on the previous
trial. In Group C (N = 8) Ss were again given a
constant standard of 20 or 40 seconds on target to
beat but in this group (as well as Group B) they
were given auditory (by means of a buzzer that
went on when the S was on target) as well as
visual feedback.


TABLE 6

CORRELATIONS OF LIKING AND JDI
SCALES WITH SUCCESS: EXPERIMENT III

                Correlations of success with
Group           Liking scale    JDI (work scale)    N
Easy            .47*            .35                 27
Self-Set        [illegible]     .30                 27
Progressive     .27             −.06                29
Hard            .48**           [illegible]         29
All subjects    [illegible]     [illegible]         112

* p < .05
** p < .01

At the end of the last trial Ss were asked to fill 
out a 7-point liking scale for the task similar to 
that used in the previous experiments. (The JDI 
work scale was not administered.) The Ss were 
then asked in a short interview to indicate why 
they liked or disliked the task. 


Results. There were no significant differ- 
ences in the means of the three groups (F 
test), but the overall correlation between 
degree of success and liking was .42 which is 
significant at the .05 level. 

The reasons for liking the task involved an 
N of 34. Eight Ss in a fourth group which 
did not have any standards of success (and 
therefore no degree of success scores) were 
included. In addition, if the S gave more than 
one reason for liking the task both were in- 
cluded, so the total N for “reasons” was some- 
what greater than 34. Of the 15 Ss who 
“liked” the task (i.e., responded to the “like” 
side of the neutral point on the liking scale), 
11 gave the “feeling of improvement” as a 
reason. Six mentioned the “challenge of the 
task” (which in most cases they had over- 
come by doing well), and 4 gave “novelty” 
as a reason for liking it. 

Of the 14 Ss who “disliked” the task (i.e., 
responded to the “dislike” side of the neutral 
point) 8 mentioned that the task was “tiring,” 
“fatiguing,” or “made my arm hurt.” Nine 
gave “boring,” “repetitive,” or “monotonous” 
as a reason, and only 2 gave the fact that 
they had done poorly at the task as a reason. 
Five Ss were “neutral” about the task and
were not questioned.

Discussion. First it should be noted that
the linear relationship between degree of suc- 
cess and liking for the task was once more 
replicated but this time with a learning task. 
This is particularly interesting because in 
this experiment the Ss were not told explic- 
itly that beating the standard constituted a 
“successful performance” as in the second 
and third experiments; they were simply 
told to try and beat it. It is probable that 
this made “success” somewhat less salient
for these Ss than for those in the previous 
two experiments. In this respect it is perhaps 
more similar to the first (word unscrambling) 
experiment. 

Of more interest here are the reasons given 
for the liking ratings. It seems clear that Ss 
who liked the task (and therefore presum- 
ably those who did well at it) were far more 
likely to attribute their liking to their own 
performance on the task than were those who 
did not like the task (and therefore those who 
did not do well at it). The latter were more 
likely to attribute their dislike to attributes 
of the task (i.e., attributes external to their 
actual performance). 

This at least suggests that Ss may tend to 
“externalize” the reasons for their failures or 
dislikes while attributing success and likes 
to their own skills and characteristics. 


CONCLUSION 


The findings of the four experiments re- 
ported here lend strong support to the gen- 
erality of a linear relationship between degree 
of task success and degree of liking for and 
satisfaction with the task. The findings were 
replicated over a number of different Ss, 
across two different measures of satisfaction, 
and over several different situations and 
tasks. 

These results would seem to support the 
earlier findings of Gebhard (1948) but using 
a functional instead of a factorial approach. 
With respect to Mace’s (1935) work, they 
suggest that task satisfaction is (among other 
things) certainly a result of task performance, 
though the degree to which it may function 
as an independent variable as well was not 
determined here. Finally the data support 
the assumptions of Herzberg, Mausner, and 
Snyderman (1959) that reported cause-effect
sequences were valid indicators of actual
cause-effect sequences. 

Subsequent research on this topic might 
go in a number of different directions. For 
instance Hilgard (1958) has distinguished 
four different kinds of success appropriate to 
a “level of aspiration” situation: (a) reach-
ing one’s goal, (b) getting close to one’s goal,
(c) improving over one’s previous perform-
ance, and (d) being able to set goals that one
considers desirable. The present research dealt
only with the first type of success.⁵ It would
be of interest to try and distinguish other
types (especially b and c) and to relate these
to liking for the task. It is likely that in many 
industrial learning situations, getting closer 
to “standard” is both a source of satisfaction 
and an impetus to continued effort. 

Also of interest would be the delineation of 
various subdimensions of satisfaction. The 
Cornell JDI work scale, though represent- 
ing a clear general work satisfaction factor 
(Kendall et al., 1962), contains a number of 
analytically distinguishable subdimensions. 
One is suggested by the items “good,” “satis-
fying,” and “pleasant” and might be called
an “evaluative” dimension. A second group
with something in common are the items
“boring,” “tiresome,” “endless,” and “rou-
tine” and could be called a “monotony” di-
mension. A third group of similar items are
“fascinating,” “stimulating,” “challenging,” 
and “creative” and could be called an 
“arousal” dimension. A fourth set of items 
with some a priori similarity include the items 
“simple,” “difficult,” and “frustrating” and 
could be called a “difficulty” dimension. Al- 
though no detailed factor analyses of the 
JDI work scale by itself have been performed 
as of yet, it would be of interest to see 
whether subdimensions similar to these would 
emerge. If so, it is possible that different 
subdimensions would be related differently to 
different types of success and to different sit- 
uations. 

Also of interest would be research on the 
long term relationship between liking and 
success. Atkinson (1958) for one has sug- 
gested that success is valued less when it is 
very easy to obtain than when it is very hard
to obtain. Although no clear curvilinear trend
emerged here, it is possible that in longer
term experiments, Ss who succeed very fre-
quently will become somewhat bored and dis-
satisfied with the task.


⁵ The “goals,” except for the Self-Set group in
Experiment III, were actually goals (standards) set
by the experimenter for the subjects.

Finally it would be of interest to see how 
the intentions of the Ss affect the way in 
which success and satisfaction are related. In 
all the experiments reported here the Ss (pre- 
sumably) had the intention to succeed, which 
in most cases involved beating some stand- 
ard of success (or level of aspiration) set by 
the E or at least doing as well as possible. 
The author would not predict that there is 
any automatic relationship between task 
success and task satisfaction; rather the re- 
lationship should be highest in those cases 
where Ss are trying hardest to succeed and 
lowest (or nonexistent) in situations where 
they are trying least hard to succeed. If situ- 
ations could be devised where Ss were not 
trying to succeed, the author would predict 
that the correlation between success and sat- 
isfaction would be lower than the correlation 
for the same task where the Ss were trying to 
succeed. The importance of intentions in in-
fluencing behavior has been emphasized by
Ryan (1958; 1964a-e) and Ryan and Smith 
(1954) but little research has been done on 
this topic to date. 

With respect to the reasons given for lik-
ing and disliking the task given in Experi-
ment IV, there is something of relevance to
the work of Herzberg et al. (1959). The lat-
ter found that job “dissatisfiers” tended to
be aspects of the job not directly related to
the individual’s performance (e.g., super-
vision, company policy) while job “satisfiers”
tended to be outcomes of individual perform-
ance (e.g., achievement, recognition). The
data here at least suggest that one reason
why “achievement” experiences were men-
tioned less often as dissatisfiers than as
satisfiers was the tendency of people to ex-
ternalize or project failure (or dissatisfaction)
to factors outside the self or the activity of
the self. Just how serious a source of bias this
was in the Herzberg et al. results cannot be
determined here, but it would seem worth-
while to do more extensive research on the
matter.

In conclusion, this research supports the
notion that task success is an important
source of affective attitudes toward the task
or work. In view of this, further research
in both the field and the laboratory on the effects
of factors in the work itself on job attitudes
should prove fruitful.


TEAM VERSUS INDIVIDUAL TRAINING, TRAINING
TASK FIDELITY, AND TASK ORGANIZATION
EFFECTS ON TRANSFER PERFORMANCE
BY THREE-MAN TEAMS¹


GEORGE E. BRIGGS AND JAMES C. NAYLOR


Ohio State University 


Transfer performance of teams was measured in a simulated radar-controlled 
aerial intercept task. Superior performance occurred after training on an 
independently organized task (as compared to that after training which re- 
quired verbal interaction among controllers), and superior performance oc- 
curred in an independently organized transfer task. However, these 2 variables 
interacted such that performance on an interaction condition of the transfer 
task was equivalent to that on an independently organized task if prior training 
was under the independent task organization. Training task fidelity influenced 
performance only on the interaction transfer task, with superior performance 
following a high-fidelity training situation in which controllers could acquire 
the same skills to be required in transfer for communication to interceptor pilots. 


In a previous article Naylor and Briggs
(1965) showed that team performance in a
radar-controlled aerial intercept task was
influenced by the way the task itself was
organized: superior performance was obtained
with teams in which the members worked
independently of one another compared to
performance in an organization which encouraged
interaction (verbal communications,
trading targets and interceptors, etc.) between
radar controllers (RCs). The present
study represents an extension of the task-organization
variable as well as an evaluation
of the importance of training-task fidelity on
team performance during transfer.


¹ This research was supported by the United
States Navy under Contract No. N61339-1327,
sponsored by the United States Naval Training Device
Center, Port Washington, New York. Permission is
granted for reproduction, translation, publication,
use, and disposal in whole or in part for any purpose
of the United States Government.


As in the previous research, the present
transfer task involved a three-man team performing
aerial-intercept control via simulated
radar displays. Experimenter assistants
portrayed interceptor pilots and made heading
and speed adjustments as directed to do so 
by the RCs over simulated radio communica- 
tions channels. Also as in the previous study, 
the training task was an abstraction of the 
transfer task: a checkerboard replica of the 
radar coverage with both interceptor and tar- 
get aircraft represented by checkers. During 
training in half the teams the RCs moved 
their own interceptor checkers, while in the 
other half of the teams the RCs directed such 
moves over a simulated radio link to experi- 
menter assistants. Thus, under the latter 
condition the RCs could acquire the same 
communications skills as would be required 
in the transfer task, and this will be identi- 
fied as the high-fidelity training condition, 
while in the former condition the RCs were 
given no opportunity to practice and develop 
such communication skills since they served 
as their own pilots. This will be called the 
low-fidelity version of training. Note that the 
communications aspect defined the primary 
difference between these two training conditions
as the basic task remained constant, and
so all RCs, regardless of whether they trained
in the high- or the low-fidelity condition,
could acquire the necessary perceptual and
tactical skills required to perform successful
interceptions of target aircraft.


TABLE 1

DESIGN OF THE EXPERIMENT

Group   Transfer task    Training task    Training task
        organization     organization     fidelity

  1     Independent      Independent      High
  2     Independent      Independent      Low
  3     Independent      Interaction      High
  4     Independent      Interaction      Low
  5     Interaction      Independent      High
  6     Interaction      Independent      Low
  7     Interaction      Interaction      High
  8     Interaction      Interaction      Low

In the earlier Naylor and Briggs study, a 
team experienced the same task organization 
(independent or interaction) throughout both 
training and transfer. In the present study 
this variable was extended such that half the 
teams experienced the same organization
throughout, while the other half either trained
under an independent configuration and
transferred to the interaction condition, or
vice versa. Thus, we can consider task organization
as two variables in the present study:
as a training variable (level of organization 
during training) and as a system variable 
(level of organization during transfer). 

An interesting consequence of the above 
factorial arrangement of task organization as 
two independent variables is that as a train- 
ing variable it permits an evaluation of indi- 
vidual versus team training: if during train- 
ing the RCs could not communicate directly 
with one another, as in the independent ver- 
sion of task organization, they could be 
considered to be experiencing a form of indi- 
vidual training, whereas in the interaction 
condition they did work directly with one 
another and thus can be considered as having 
been trained as a team. 


METHOD


Subjects and design. There were eight groups of 
teams and each group consisted of seven three-man 
teams. Each team served for nine 35-minute sessions
over 2 weeks and each man received $1.00
per session.

The experimental design is summarized in Table 1. 
The variables listed refer to the operating conditions 
for the two RCs who were required to issue heading 
and speed commands for the interceptor aircraft. 
The third team member served as a supervisor of 
the two RCs and was required to see that the 
operation was well run and that hand-offs of 
targets (when necessary) were carried out effectively;
he will be designated as the SC. Under independent 
task organization conditions the RCs were separated 
by a barrier and one could not observe the other 
or the airspace being controlled by the other RC. 
Further, no verbal communication was possible be- 
tween RCs as an RC could communicate only with 
the SC or with an interceptor pilot. Thus, any hand- 
off of targets was effected by the SC. Under inter- 
action conditions, the barrier was removed, the RCs 
could observe the entire airspace being controlled, 
they could communicate directly with one another, 
and they handed off targets directly with one another 
—not through the SC. These definitions of task 
organizations held for both the training and the 
transfer tasks. 
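
The factorial layout summarized in Table 1 can be enumerated mechanically. The Python sketch below is illustrative only: the variable and key names are assumptions introduced here, and the only facts taken from the text are the three two-level factors, the seven teams per group, and the three men per team.

```python
from itertools import product

# Factor levels as named in the text; the identifiers are illustrative.
ORGANIZATIONS = ("Independent", "Interaction")
FIDELITIES = ("High", "Low")
TEAMS_PER_GROUP = 7
MEN_PER_TEAM = 3

# Cross transfer organization x training organization x training fidelity.
# With this nesting order the group numbers follow the rows of Table 1.
groups = [
    {
        "group": number,
        "transfer_organization": transfer,
        "training_organization": training,
        "training_fidelity": fidelity,
    }
    for number, (transfer, training, fidelity) in enumerate(
        product(ORGANIZATIONS, ORGANIZATIONS, FIDELITIES), start=1
    )
]

total_teams = len(groups) * TEAMS_PER_GROUP
total_men = total_teams * MEN_PER_TEAM

for g in groups:
    print(g["group"], g["transfer_organization"],
          g["training_organization"], g["training_fidelity"])
print("teams:", total_teams, "men:", total_men)
```

Teams whose training and transfer organizations match (Groups 1, 2, 7, and 8) correspond to the arrangement used throughout the earlier Naylor and Briggs study.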

As explained earlier, fidelity refers to the com- 
munications procedures required in the training 
task. Under high-fidelity conditions, experimenter 
assistants manipulated the interceptor checkers while 
the RCs observed the checkerboard from above and 
issued heading and speed commands over the same 
type of intercommunication system as in the trans- 
fer task. Under low-fidelity training conditions, the 
RCs moved the checkers themselves and thus had 
no opportunity to acquire procedural skills neces- 
sary for communication with the pilots. Due to 
the relatively heavy schedule of incoming targets 
(a constant load of four targets per RC), com- 
munications from RC to pilots were quite heavy and 
the RCs had to learn to issue commands efficiently. 
To do so, it was necessary to adopt sufficiently 
abbreviated verbal expressions and deliver them 
rapidly but clearly. Thus, it was expected that the 
high-fidelity training would exert a beneficial effect 
on team transfer performance. 

Apparatus and procedure. The same apparatus 
served here as in the Naylor and Briggs (1965) 
study. Briefly, the training task consisted of a 
checkerboard with eight target and eight interceptor 
checkers, while the transfer task was defined by the 
OSU Electronic Air Traffic Control Simulator 
(Hixson, Harter, Warren, & Cowan, 1954). The 
latter provided an 11-inch CRT for each team mem- 
ber plus one for an experimenter assistant on which 
appeared eight target and eight interceptor radar 
returns. These blips moved in real time and turned 
at 3 degrees per second as “flown” by pilots (experi- 
menter assistants) from target generator consoles. 
The RCs, the SC, and the pilots communicated over 
a three-channel voice communication system. 

Under all experimental conditions target aircraft
entered the radar coverage from either the west
or east and proceeded on one of 66 straight-line
courses across the displayed area. Running from
north to south was a 1.7-inch-wide transition zone:
any target penetrating this zone had to be transferred
from the RC originally assigned the target to the
other RC. Thus, if a target entered from the west
(left side of the CRT display), the "west" RC
attempted to accomplish an intercept (defined by
placing an interceptor aircraft within 2 nautical miles
of the target). Failing this, the target would proceed
toward the east and enter the transition zone. Under
the interaction condition, the west RC would call
this target to the attention of the "east" RC who
would then prepare to take over responsibility;
under the independent condition the SC would so
alert the east RC. Targets moved at either 600 or
[?]00 knots while interceptors were capable of speeds
between 200 and 1200 knots. All aircraft operated
at 35,000 feet, the 2-D condition of the original
Naylor and Briggs (1965) study.

The measures of team performance during transfer
were the number of successful intercepts (hits)
and pounds of fuel consumed by all interceptors per
35-minute session. A composite score was derived
from these two: amount of fuel per hit indicating
the efficiency of team operations.

Finally, the individual RCs called out the point
when they believed a successful intercept had been
obtained. An experimenter assistant at the fourth
CRT then "froze" the two aircraft, measured the
separation to see if indeed a hit had occurred,
orbited the interceptor, and issued instructions to
the pilot (another experimenter assistant) to re-enter
that target on its next preprogrammed penetration
from the west or east. In this way a near-constant
load of four targets per RC was maintained.

RESULTS 


Table 2 provides a summary of an analysis
of variance applied to the efficiency data
(fuel consumed per successful interception)
for all four transfer sessions. It may be noted
that only transfer task organization was statistically
significant (p < .05) as a main effect,
although training task organization did
interact significantly with sessions (p < .01).
In the former, the teams which experienced
the independent condition of transfer task
organization performed at a superior level
to those teams in which the RCs interacted
directly with one another (an average of
8,279 versus 11,036 pounds of fuel per hit
over all four sessions). In the latter, those
teams trained in the independent version of
the checkerboard task were superior to those
trained under interaction conditions for the
first two transfer sessions (by 4,064 and
[?]473 pounds of fuel per hit on Sessions 1
and 2, respectively); however, this superiority
was lost during the third transfer session
and the two sets of teams were comparable
thereafter.

Thus, an independently organized task 
emerges as more desirable than an organiza- 
tion requiring direct verbal interaction be- 
tween RCs both for training and for “actual” 
(transfer) task performance. However, one 
cannot conclude from this that performance 
in an organization requiring interaction nec- 
essarily will be inferior to that in an inde- 
pendent organization task: the statistically 
significant interaction of Training x Transfer 
Task Organization (Table 2) is shown in 
Table 3, where it is apparent that perform- 
ance during Sessions 1 and 2 on an inter- 
action version of the transfer task following 
training on an independently organized task 
(10,067 pounds per hit) matches performance 
on the independent transfer task itself 
(10,575 and 9,761 pounds per hit). Further, 
good transfer performance can be obtained 
following training on an interaction condition 
if transfer is to an independently organized 
task (9,761 pounds per hit), and the penalty 
of interaction training is manifested only if 
transfer is to an interaction condition (17,944 
pounds per hit). A Duncan multiple-comparisons
test (Edwards, 1960) was applied
to the four means of Table 3, and the average
of Groups 7 and 8 differs at p < .05 from
the other three means, none of which differ
significantly among themselves.


TABLE 2

ANALYSIS OF VARIANCE FOR THE FOUR
TRANSFER SESSIONS

Source                          df            MS        F

Fidelity (F)                     1       197,376        —
Training organization (TO)       1   159,302,144     1.62
Transfer organization (TrO)      1   425,771,520     4.33*
F × TO                           1    36,832,768        —
F × TrO                          1       718,848        —
TO × TrO                         1   480,472,576     4.89*
F × TO × TrO                     1    45,990,144        —
Teams within groups (T/G)       48    98,248,117
Sessions (S)                     3   677,966,760    40.14
S × F                            3     2,682,624        —
S × TO                           3    67,704,490     4.01**
S × TrO                          3    42,287,360     2.50
S × F × TO                       3     9,920,683        —
S × F × TrO                      3    79,056,554     4.68
S × TO × TrO                     3    49,684,565     2.94*
S × F × TO × TrO                 3    10,810,965        —
S × T/G                        144    16,887,838

 * p < .05
** p < .01


TABLE 3

AVERAGE PERFORMANCE OVER SESSIONS 1 AND 2 ONLY
FOR THE FOUR CONDITIONS OF TRAINING AND TRANSFER
TASK ORGANIZATION

                      Training task organization
Transfer task
organization        Independent     Interaction     Average

Independent         10,575          9,761           10,168
                    (1 & 2)         (3 & 4)
Interaction         10,067          17,944          14,005
                    (5 & 6)         (7 & 8)

Average             10,321          13,852

Note.—Group numbers in parentheses.
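
As a check on the arithmetic of Table 3, the marginal averages and the size of the training-by-transfer interaction can be recomputed from the four cell means. The sketch below uses only the cell values printed in the table; the dictionary layout and the contrast function are illustrative choices and not part of the original analysis.

```python
# Cell means from Table 3 (pounds of fuel per hit, Sessions 1 and 2),
# keyed as (transfer organization, training organization).
cells = {
    ("Independent", "Independent"): 10_575,  # Groups 1 & 2
    ("Independent", "Interaction"): 9_761,   # Groups 3 & 4
    ("Interaction", "Independent"): 10_067,  # Groups 5 & 6
    ("Interaction", "Interaction"): 17_944,  # Groups 7 & 8
}
LEVELS = ("Independent", "Interaction")


def transfer_average(level):
    """Row average: one transfer organization, both training organizations."""
    return sum(cells[(level, training)] for training in LEVELS) / 2


def training_average(level):
    """Column average: one training organization, both transfer organizations."""
    return sum(cells[(transfer, level)] for transfer in LEVELS) / 2


def interaction_contrast():
    """Penalty of interaction training under interaction transfer minus the
    same penalty under independent transfer (a difference of differences)."""
    penalty_interaction_transfer = (
        cells[("Interaction", "Interaction")] - cells[("Interaction", "Independent")]
    )
    penalty_independent_transfer = (
        cells[("Independent", "Interaction")] - cells[("Independent", "Independent")]
    )
    return penalty_interaction_transfer - penalty_independent_transfer


for level in LEVELS:
    print(level, transfer_average(level), training_average(level))
print("interaction contrast:", interaction_contrast())
```

The recomputed averages agree with the printed margins to within one pound, and the large positive contrast reflects the fact that the cost of interaction training appears only when transfer is also to the interaction condition.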

From Table 2 it may be noted that there 
was a significant (p< .05) interaction of 
Training x Transfer Task Organization x 
Sessions. This indicates that the pattern of 
the interaction found in Table 3 was present 
during Session 1 and Session 2 but was par- 
tially emasculated by Session 3, and during 
Session 4 all four cell means were almost 
comparable. Thus, the observation made 
earlier that training task organization signifi- 
cantly influenced performance during transfer 
Sessions 1 and 2 is true with regard to the 
interesting interaction of training and trans- 
fer task organization; thus, by Session 3 only 
transfer task organization continued to affect 
performance significantly (and by Session 4 
even this effect was much diminished). These 
main and interactive effects of task organiza- 
tion will be considered below. 
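
The F ratios reported in Table 2 can be reproduced from the tabled mean squares. The sketch below assumes the usual error terms for a design of this kind: between-team effects are tested against teams within groups (T/G), and effects involving sessions against S × T/G. The text does not state the error terms explicitly, so this assignment is an assumption, although it does recover the printed values.

```python
# Mean squares transcribed from Table 2.
MS_TEAMS_WITHIN_GROUPS = 98_248_117      # T/G, error for between-team effects
MS_SESSIONS_BY_TEAMS = 16_887_838        # S x T/G, error for within-team effects

between_team_effects = {
    "TO": 159_302_144,
    "TrO": 425_771_520,
    "TO x TrO": 480_472_576,
}
within_team_effects = {
    "S": 677_966_760,
    "S x TO": 67_704_490,
    "S x TrO": 42_287_360,
    "S x F x TrO": 79_056_554,
    "S x TO x TrO": 49_684_565,
}


def f_ratio(ms_effect, ms_error):
    """F is the effect mean square divided by its error mean square."""
    return ms_effect / ms_error


f_values = {
    name: f_ratio(ms, MS_TEAMS_WITHIN_GROUPS)
    for name, ms in between_team_effects.items()
}
f_values.update(
    (name, f_ratio(ms, MS_SESSIONS_BY_TEAMS))
    for name, ms in within_team_effects.items()
)

# Degrees of freedom implied by 8 groups, 7 teams per group, 4 sessions.
n_groups, teams_per_group, n_sessions = 8, 7, 4
df_teams_within_groups = n_groups * (teams_per_group - 1)
df_sessions_by_teams = (n_sessions - 1) * df_teams_within_groups

for name, value in f_values.items():
    print(f"{name:14s} F = {value:.3f}")
print("error df:", df_teams_within_groups, df_sessions_by_teams)
```

The implied error degrees of freedom match the 48 and 144 shown in the table.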

Also from Table 2 it is apparent that 
Sessions × Training Task Fidelity × Transfer
Task Organization was a statistically signifi- 
cant interaction. Figure 1 illustrates the 
interaction of fidelity and task organization 
for Session 1; the four data points did not 
differ significantly on Sessions 2, 3, or 4. A 
Duncan multiple-comparisons test revealed 
that the two data points for high fidelity 
do not differ significantly, nor do the two
data points for the independent version of the
transfer task; however, there is a significant 
difference (p < .05) between the two data 
points for low fidelity and between the two 
data points for the interaction version of the 
transfer task.


DISCUSSION 


From the above it may be concluded that 
of the three independent variables, training
task fidelity exerted the briefest effect on
transfer performance (Session 1) and that the
effect was limited to performance on the 
interaction version of the transfer task. 
Training task organization was effective over 
a somewhat longer period (Sessions 1 and 
2), while transfer task organization continued 
to influence performance strongly through 
Session 3 and showed an initial weakening 
only in Session 4. Finally, the interactive 
pattern of training and transfer task organi- 
zation persisted over Sessions 1 and 2, weak- 
ened considerably during Session 3, and was 
no longer present by Session 4. 

Fidelity. It may be recalled that under 
high-fidelity training the RCs utilized similar 
communication equipment to that which they 
would find in the transfer task to issue com- 
mands to ‘“‘pilots” who moved the checkers; 
in the low-fidelity condition, the RCs moved 
their own checkers and thereby had no oppor- 
tunity to acquire the verbal skills which 
would be necessary in the transfer task to 
control the interceptor aircraft. Figure 1 indicates
that of the two sets of teams which
experienced the low-fidelity voice communications
condition, only that set which transferred
to the interaction task suffered from
the lack of opportunity to learn communication
procedures with the pilots during
training.


[Figure 1: system efficiency (fuel per hit) plotted against training task fidelity, with separate functions for the independent and interaction transfer task organizations.]

FIG. 1. The pattern of interaction for training
task fidelity and transfer task organization on Session 1.


Another way to describe the results shown
in Figure 1 is to note that training task fidelity
affected only transfer performance on the
interaction task; thus, high-fidelity communications
training was desirable only for those
teams which would encounter the interaction
version of the transfer task, there being no
significant difference between the two data
points for the independent transfer task
function of Figure 1.

Task organization. In the earlier study by
Naylor and Briggs (1965) task organization
was a statistically significant variable with
performance under an independent task organization
being superior to that under an
interaction organization. The present results
confirm this previous finding and extend it:
in the earlier study a given team experienced
a particular task organization throughout
both training and transfer whereas the present
design permitted task organization to
influence transfer performance both as a
training variable (training task organization)
and as a system variable (transfer task organization).
In the case of both variables,
performance was adversely affected by the
interaction condition, at least over the first
two transfer sessions. In a training context,
then, this suggests that individual training
is superior to team training for the kind of
transfer task employed: one requiring the
passing of responsibility from one operator
to the next (between RCs) with some joint
decision making but with little direct
coordination in behavior of the two operators.

The latter point is important, we believe,
because team training is obviously important
in tasks requiring high levels of coordination
between team members, for example, team
sports such as football, basketball, etc. Thus,
the above conclusion favoring individual
training has a limit, but neither this nor the
earlier Naylor and Briggs study used a task
approaching that of team sports in terms of
the demand for coordination among team
members.

In terms of task organization as a system 
variable these results, like those of Naylor 
and Briggs, cast doubt on the popular notion 
of “teamwork”; in fact, independence of 
operator functions, not interaction among 
operators (as in a “team structure”), is 
emerging from numerous laboratory studies 
as the more desirable system engineering con- 
cept (see Kidd, 1959, p. 20, for an earlier 
suggestion of this conclusion). We conclude 
with Kidd (1961) that the interaction condi- 
tion “is superimposed on the normal demands 
of the task itself and leads to a proportionate 
reduction of exclusively task-directed behav- 
ior [p. 199].” In other words, the subject 
has a given capacity for information process- 
ing; if working independently, he can devote 
the entire capacity to the direct demands of 
the task; but if allowed, encouraged, or re- 
quired to interact with another subject, he 
must divide this capacity between the task 
and the interperson-interaction requirements, 
that is, interaction per se is a time- and 
capacity-consuming function which can de- 
tract from exclusively task-oriented behavior. 

The interaction of training and transfer 
task organization (see Table 3) requires an 
explanation. In particular, it seems rather 
odd that upon transfer to an interaction task 
organization those teams trained under the 
same task organization condition (Groups 7 
and 8) would attain significantly less pro- 
ficient performance than those trained indi- 
vidually (Groups 5 and 6). One possible hy- 
pothesis for this finding is that by virtue 
of their training, Groups 7 and 8 developed 
habits of verbal interaction among RCs which 
were excessive, while Groups 5 and 6 did 
not have this opportunity. As a result the 
RCs of Groups 5 and 6 would be less inclined 
to spend time away from the specific-task de- 
mands in order to interact with one another. 

Fortunately, all transfer sessions were 
recorded on audio tape and so a partial test 
of this hypothesis was possible. It is par- 
tial in that we could not define “excessive” 
verbal communication—we could only de- 
termine if RC-to-RC talk was greater for 
Groups 7 and 8 than for Groups 5 and 6 
during early transfer, Sessions 1 and 2. It is
partial also in that some of the tapes had
been reused prior to the completion of the 
statistical analyses of the efficiency data, and 
intact tapes were available for only 6 of 
the 14 teams in Groups 5 and 6 and for only 
7 teams in Groups 7 and 8. 

The available tapes were scored by timing 
RC-to-RC communications for the first two 
transfer sessions. The hypothesis was sup- 
ported by the data which showed an average 
of 211 seconds of RC-to-RC talk for Groups 
7 and 8 while Groups 5 and 6 obtained an 
average of only 126.8 seconds. Due to the 
small n, these data were subjected to the 
Mann-Whitney U test, and the result was 
significant (U=11, p= .049) for a one- 
tailed test. This test is appropriate since 
the hypothesis predicted the direction of 
difference. 
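
For readers unfamiliar with the Mann-Whitney procedure, the following Python sketch shows how U and its exact one-tailed probability are obtained for small samples without ties. The individual talk times were not reported, so the two lists in the usage lines are hypothetical values chosen only to exercise the functions; the function names are likewise illustrative.

```python
from functools import lru_cache
from math import comb


def u_statistic(predicted_larger, predicted_smaller):
    """Count the pairs that contradict the predicted direction.

    Each pair in which the 'predicted larger' value falls below the
    'predicted smaller' value adds 1; tied pairs add 0.5.
    A small U therefore supports the directional hypothesis.
    """
    return sum(
        1.0 if a < b else 0.5 if a == b else 0.0
        for a in predicted_larger
        for b in predicted_smaller
    )


@lru_cache(maxsize=None)
def arrangements(n1, n2, u):
    """Number of rank orderings of n1 + n2 untied scores that yield exactly U = u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # The largest score belongs either to sample 1 or to sample 2.
    return arrangements(n1 - 1, n2, u - n2) + arrangements(n1, n2 - 1, u)


def exact_one_tailed_p(n1, n2, u):
    """P(U <= u) under the null hypothesis, assuming no tied scores."""
    favourable = sum(arrangements(n1, n2, k) for k in range(int(u) + 1))
    return favourable / comb(n1 + n2, n1)


# Hypothetical seconds of RC-to-RC talk (NOT the study's data).
team_trained = [250, 190, 230]
individually_trained = [120, 200, 110]
u = u_statistic(team_trained, individually_trained)
p = exact_one_tailed_p(len(team_trained), len(individually_trained), u)
print("U =", u, "one-tailed p =", p)
```

The exact distribution is used here rather than a normal approximation because, as the text notes, the number of teams with intact tapes was small.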

It follows, then, that training in a team 
context (the interaction condition) resulted 
in a lack of discipline for inter-RC com- 
munications which became disruptive only 
upon transfer to an interaction condition 
(Groups 3 and 4 presumably acquired similar 
verbal habits, but their transfer to an inde- 
pendent task organization did not permit 
manifestation of these inclinations). Thus, 
whereas team training might have been pre- 
dicted to be superior to individual training 
for teams which were to transfer to an 
interaction task, just the reverse occurred, 
and the apparent reason was less discipline 
on inter-RC communication by those teams 
trained in a team context. 

These results emphasize again that system
performance can suffer if the task environment
permits or requires interaction among
operators. However, as indicated by Groups 
5 and 6, system performance need not suffer 
under an interaction task organization, and 
so it is concluded that appropriate training 
conditions can help ameliorate an unfortunate 
system design feature. Also from these data 
is the interesting conclusion that with an 
appropriately designed operational task (the 
independent version of the transfer task) one 
can avoid a manifestation of inefficient habits 
acquired in a poorly designed training task 
(the interaction training condition), that is, 
Groups 3 and 4 in Table 3 achieved sta- 
tistically superior transfer task performance 
to that of Groups 7 and 8 after identical 
training conditions. 


REFERENCES 


EDWARDS, A. L. Experimental design in psychological
research. New York: Rinehart, 1960.

HIXSON, W. C., HARTER, G. A., WARREN, C. E., &
COWAN, J. D., JR. An electronic radar target
simulator for air traffic control studies. USAF
WADC tech. Rep., 1954, No. 54-569.

KIDD, J. S. A summary of research methods,
operator characteristics, and system design specifications
based on the study of a simulated radar
air traffic control system. USAF WADC tech.
Rep., 1959, No. 59-236.

KIDD, J. S. A comparison of one-, two-, and three-man
work units under various conditions of work
load. Journal of Applied Psychology, 1961, 45,
195-200.

NAYLOR, J. C., & BRIGGS, G. E. Team training effectiveness
under various conditions. Journal of Applied
Psychology, 1965, 49, 223-229.


(Received September 11, 1964) 


Journal of Applied Psychology
1965, Vol. 49, No. 6, 393-398


COLOR CODING IN FORMATTED DISPLAYS¹


SIDNEY L. SMITH, BARBARA B. FARQUHAR
The MITRE Corporation, Bedford, Massachusetts
AND DONALD W. THOMAS


Tufts University 


An experiment was designed to assess and compare the effects of symbolic, 
numeric, and color coding in formatted displays. 12 Ss viewed displays in 
which 2-digit entries were arranged in tabular matrix format. Displays differed 
in density, structure, and auxiliary coding. Ss performed row-comparison and 
item-counting tasks, providing time and error measures. Auxiliary color coding 
resulted in better performance than superscript or underline codes for both 
tasks. Color coding was relatively more effective for item counting than for 
row comparison where the display format was related to the task. The value 
of a display code appears to be dependent upon the joint interaction of the 
format in which it is displayed and the task to which it is applied. 


Previous studies of display color coding
have confirmed the potential value of color
as a means of providing visual separability
among classes of data presented in information
displays. The most recent research on
this question was reported by Smith (1963),
and Smith and Thomas (1964). It has been
found that items displayed in a particular
color can be discriminated from other items
in visual search tasks almost as well as if the
items of other colors were not present on the
display at all, and that color is superior to
symbolic coding in this respect. This is true,
at least, for a code using only five colors.
The effectiveness of color coding has been
attributed to the role played by color in
guiding the eye of an observer so that he can
easily discriminate and scan a particular class
of displayed items and ignore those items not
relevant to his interests.

In assessing these earlier results, it should
be noted that the displays used were unstructured
in their format: items were distributed
randomly in the display field. It is possible
that the demonstrated value of color coding
is unique to such displays. In many practical
situations, however, people work from information
displays where there already exists a
structure or format in which the displayed
data is organized. Any such consistent format
should itself prove helpful to an observer
scanning the display. Under these conditions,
the relative usefulness of added color coding
may prove somewhat diminished.


¹ The research reported in this paper was sponsored
by the Air Force Electronic Systems Division,
Air Force Systems Command, under contract AF19
(628)2390. This paper is also obtainable as ESD-TR-[?]-125.
Further reproduction is authorized to satisfy
the needs of the United States Government. A more
detailed account of this research is available from
the authors (Smith, Farquhar, & Thomas, 1965). At
the time that the research was conducted, Donald W.
Thomas was employed by The MITRE Corporation.

Certain earlier studies, for example those 
by Hitt (1961), and Newman and Davis 
(1962), seem to indicate some utility of color 
coding in tasks involving item locating, count- 
ing, and decoding in tabular formatted dis- 
plays. In each case, however, the display 
format was used primarily as a convenience 
in designating the position of a particular 
item, rather than in some integral relation 
with the task itself. In the present study, an 
attempt was made to develop a format di- 
rectly relevant to the use of the display, to 
define display variables related to task diffi- 
culty, and to compare the effect on task per- 
formance of several auxiliary item-coding 
techniques, including color coding. 


PROCEDURE 


Twelve men and women with normal color vision 
participated in experiments which included both 
row-comparison and item-counting tasks. The Ss 
viewed displays in which 2-digit items were presented
in tabular matrix format. Each S was run
individually, completing 378 trials distributed over
three experimental sessions.

In the first two sessions, Ss were asked to compare
the rows in each display to determine in which
row the sum of the items would result in either 
the highest or the lowest total. In a third session, 
Ss were asked to count, for each display, the total 
number of items in a specified numeric interval. 
Ss reported their row selections or item counts 
to the E using a button insertion panel. The E 
recorded these responses and the time required in 
each case. 

Every display matrix contained 10 rows. The 
displays differed, however, in the number of columns, 
having either 2, 6, or 10. This resulted in total 
display densities of 20, 60, or 100 items.² Displayed
items were 2-digit numbers, from 01 to 99. These 
items were chosen randomly for each display, but 
within constraints designed to produce certain dif- 
ferences among the 10 rows. 

In each display, there was one row that had, on 
the average, the highest item-entries, and one row 
that had the lowest. In half of the displays used 
in this study, the average of the entries in the 
high row was in the interval 81 to 90. For these 
same displays, the average low row entry was in 
the interval 11 to 20. These may be termed the 
“wide spread” displays. In the remainder of the 
displays, the average entries for the high and low 
rows had less extreme values, falling in the intervals 
71 to 80, and 21 to 30, respectively. These will be 
referred to as “narrow spread” displays. 

In each display, in addition to the high row, 
there were also 1, 2, or 3 rows whose entries were 
sufficiently high to compete for an S’s attention as 
he tried to select the highest row. These may be 
termed “high-alternative” rows. Their average entry 
was only 5 to 10 points less than the average for the 
high row in each display. Similarly, there were 1, 2, 
or 3 “low-alternative” rows whose entries averaged
only 5 to 10 points higher than those in the low row. 


2 It should be noted that the expression “display
density,” as used here, refers always to the number 
of items presented. Since these matrix displays 
simply became “wider” as more columns of numbers 
were added, increasing display density did not 
result in packing the items more closely together. 
The term “density” has been used in this latter 
sense, to designate the average proximity of dis- 
played items, in other experimental contexts, for 
example, Ringel and Hammer (1964). 
The balance of the rows in all displays, other than
the high and low rows and their alternatives, con- 
tained entries whose average was in a middle or 
intermediate range of 40 to 60. For each display, the 
particular sequential order of the high and low 
rows, the alternative rows, and the intermediate rows 
was randomized. 
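
The row structure described above can be sketched in code. This is only an illustration under stated assumptions: the source gives the intervals for the row averages but not the sampling procedure, so uniform draws are assumed here; it is also assumed that a display has the same number of high-alternative and low-alternative rows. The names `SPREADS` and `plan_row_means` are hypothetical.

```python
import random

# Intervals for the average entry of the high and low rows, as given in the text.
SPREADS = {
    "wide": {"high": (81, 90), "low": (11, 20)},
    "narrow": {"high": (71, 80), "low": (21, 30)},
}


def plan_row_means(spread, n_alternatives, rng):
    """Return 10 (role, target_mean) pairs in random row order.

    Assumption: target means are drawn uniformly from the stated intervals.
    """
    high = rng.uniform(*SPREADS[spread]["high"])
    low = rng.uniform(*SPREADS[spread]["low"])
    rows = [("high", high), ("low", low)]
    for _ in range(n_alternatives):
        # Alternative rows average 5 to 10 points away from their target row.
        rows.append(("high-alternative", high - rng.uniform(5, 10)))
        rows.append(("low-alternative", low + rng.uniform(5, 10)))
    # The balance of the rows falls in the intermediate range of 40 to 60.
    while len(rows) < 10:
        rows.append(("intermediate", rng.uniform(40, 60)))
    rng.shuffle(rows)  # sequential order of the rows is randomized
    return rows


plan = plan_row_means("narrow", 3, random.Random(0))
```

With three alternatives of each kind, only two intermediate rows remain; with one alternative of each kind, six do.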

In summary, the variables related to display 
structure were (a) display density, (b) the degree 
of spread between the high and low rows, and (c) 
the number of competing alternative rows. A basic 
set of 18 displays was prepared, representing all 
possible combinations of these variables, to ex- 
amine the degree to which each of these differences 
in display structure contributed to the difficulty of 
the row-comparison task. Six additional display 
sets were made by permuting the matrices of the 
basic display set in a manner which did not alter 
their structural characteristics. 

The availability of these seven equivalent display
sets permitted the use of still another display design 
variable, namely, auxiliary coding of the individual 
items in the displays. One display set was left un- 
coded. In three of the display sets, codes were 
added in a redundant manner so that each displayed 
item could be identified as belonging to one of 
five numeric intervals. The codes used were colors, 
superscripts, and underlines. The specific use of 
these codes is summarized in Table 1. The same 
auxiliary codes were added to three other display 
sets. However, the particular code value assigned 
to each displayed item was chosen at random. For 
these displays, then, the auxiliary codes were ir- 
relevant to the task and potentially distracting. 

It should be noted that the displays with relevant 
coding were designed in such a way that the sum 
of the intervals indicated by the codes was in pro- 
portion to the sum of the 2-digit entries. That is to 
say, if the Ss had relied solely on the code indicators, 
in no case would this have resulted in an ordering
of the rows discrepant with an ordering based on 
the actual numeric entries in the rows. It was pos- 
sible, however, especially on the low-density dis- 
plays, for rows to appear tied on the basis of the 
interval coding, in which case the Ss had to examine 
the actual numeric entries to determine the target 
row. 


TABLE 1
SUMMARY OF RELEVANT CODES USED

                                     Code value
Numeric interval of
displayed item       Color a               Superscript   Underline
80-99                White  (5Y 8/4)       5             [illegible]
60-79                Yellow (10Y 7/10)     4             —
40-59                Red    (5R 4/14)      3             —
20-39                Blue   (2.5BG 4/6)    2             —
01-19                Green  (10GY 5/8)     1             -


a Munsell notation for projected colors.
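
The interval coding summarized in Table 1 can be expressed as a small mapping. The sketch below is illustrative only: the function names are hypothetical, and the integer-division rule is simply one compact way of reproducing the five intervals listed in the table. It also shows how two rows can tie on summed code values, the case in which Ss had to consult the numeric entries.

```python
def interval_code(value):
    """Map a 2-digit entry (01 to 99) to its interval code 1-5, per Table 1."""
    if not 1 <= value <= 99:
        raise ValueError("entries range from 01 to 99")
    return value // 20 + 1  # 01-19 -> 1, 20-39 -> 2, ..., 80-99 -> 5


# Color names from Table 1, keyed by the same code value.
COLOR_BY_CODE = {5: "white", 4: "yellow", 3: "red", 2: "blue", 1: "green"}


def code_sums(rows):
    """Sum the interval codes in each row of a display."""
    return [sum(interval_code(v) for v in row) for row in rows]


# Hypothetical 3-row, 2-column display.
rows = [[85, 92], [81, 83], [45, 52]]
sums = code_sums(rows)
# Rows 0 and 1 tie on code sum, so the numeric entries (177 vs. 164)
# would be needed to pick the high row.
tied = sums[0] == sums[1]
```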


COLOR CODING IN FORMATTED DISPLAYS


Displays were presented by rear projection and
appeared as white or colored figures on a black
background. As projected, the 100-density displays
were 25 inches square, with the other matrix
displays proportionately narrower. Displayed character
height of the numeric entries was + inch. Ss
were seated approximately 5 feet from the projection
screen with the center of the display at eye
level. When Ss were viewing the displays with a
relevant auxiliary code, a reference display indicating
the code value assigned to each numeric interval
appeared on a second screen to the left of the main
display. Ambient illumination, provided by overhead
fluorescent lamps, was approximately .3 footcandles.

In the first experimental session, each S viewed
the uncoded display set and the three display sets
with relevant auxiliary coding. During one half
of this session, Ss were asked to determine the high
target row, the row whose entries yielded the highest
sum. During the other half of the first session, they
viewed the same displays in a different sequence and
were asked to determine the low target row, the
row whose entries yielded the lowest sum. Six of
the Ss estimated the high target row first and the
low target row second; the order was reversed for
the other six Ss. The order of presentation of the
sets of displays was varied systematically for the
different Ss. Over the course of the experiment, each
display set appeared equally often at each point in
the session. The random sequence in which the
displays appeared within each set varied for each S.

The procedure in the second experimental session
was identical to that of the first except that
the displays shown were the three sets with irrelevant
coding. Ss were instructed to ignore the
auxiliary codes and to depend entirely on the 2-digit
entries in determining the target row.

In the third experimental session, Ss viewed all
displays. Their task this time, however, was to
count the number of matrix entries which fell
within the intermediate numeric interval of 40 to
59. In the first part of the session, the uncoded
display set and the three display sets with relevant
auxiliary coding were shown. During the second
part of the session, Ss viewed the displays with
irrelevant coding.

In summary, the design of the experiment for the
row-comparison task was a complete factorial of 2
Target Rows (high or low) × 7 Coding Conditions
× 3 Display Densities × 2 Row Spreads × 3 Competitors.
A total of 3,024 comparisons were made
by 12 Ss, providing both time and error measures.
For the item-counting task, the experimental design
was a complete factorial of 7 Coding Conditions ×
3 Display Densities × 2 Row Spreads × 3 Competitors.
Twelve Ss were run for a total of 1,512 trials.
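
The trial counts follow directly from the factorial structure. As a check on the arithmetic, the design cells can be enumerated; the factor labels below are illustrative names, not the authors' own.

```python
from itertools import product

# Seven coding conditions: one uncoded set, three relevant, three irrelevant.
coding = ["uncoded"] + [
    f"{kind} {code}"
    for kind in ("relevant", "irrelevant")
    for code in ("color", "superscript", "underline")
]
densities = [20, 60, 100]
spreads = ["wide", "narrow"]
competitors = [1, 2, 3]
target_rows = ["high", "low"]
n_subjects = 12

# Row-comparison task: 2 x 7 x 3 x 2 x 3 cells per S.
row_cells = list(product(target_rows, coding, densities, spreads, competitors))
# Item-counting task: 7 x 3 x 2 x 3 cells per S.
count_cells = list(product(coding, densities, spreads, competitors))

row_trials = len(row_cells) * n_subjects      # total row comparisons
count_trials = len(count_cells) * n_subjects  # total counting trials
trials_per_subject = len(row_cells) + len(count_cells)
```

The per-S total of 252 + 126 agrees with the 378 trials reported in the Procedure.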


RESULTS AND DISCUSSION 
Row-Comparison Task 


An analysis of variance of the time data 
for the row-comparison task confirmed that 
each main factor had a statistically significant
effect (p < .001) on row-comparison
time. Ss were treated as replicates in this 
analysis; 75% of the variance was attribut- 
able to the within replicates factor, indicating 
that individual differences among Ss were 
considerable. 

The effect on performance of the task and 
display structural variables may be sum- 
marized as follows. Comparison time increased 
significantly as a function of the number of 
displayed items (display density) under all 
experimental conditions. Comparison took 
longer when the task was to determine the 
high row than when the task was to determine 
the low row, suggesting that it is easier to
add and compare small numeric entries than
large ones. Comparison was quicker and significantly
more accurate for the wide spread
displays than for the narrow: it is probable 
that wide spread facilitated performance by 
permitting Ss to focus their attention more 
quickly on the rows of possible interest and 
to ignore others. Comparison was faster when 
there was only one competing alternative row 
(as defined in the initial display design). 
With more competing rows, there was an 
increase in both comparison time and error 
frequency. More than 90% of the errors made 
represented choices of alternative rows, sup- 
porting an initial postulate of the display de- 
sign that these rows would tend to compete 
for an S’s attention. Table 2 presents average 
row-comparison time data to summarize the 
effects of these variables on task performance. 

Of more interest, in the context of this 
study, is the effect of various display coding 
modes. Figure 1 shows average comparison 
time as a function of display density for the 
uncoded displays and for those with relevant 
codes, that is, for those displays in which the 
code values were redundant with the numeric 
entries and so could be used in the comparison 
task. 

The relevant color code resulted in signifi- 
cantly faster performance than any other code 
condition, an overall decrease of 47% from 
the time required for the uncoded displays. 
There was a corresponding decrease in error 
frequency of 43% for the color code as com- 
pared with the uncoded displays. 

Colors cannot be added in the same sense
that numerals can, and so the effectiveness of
color coding for this task is somewhat surprising.
Why did color prove to be an effective
code in the row-comparison task? As a
possible explanation of this result, many Ss
reported that they were able to use the color
code to eliminate certain rows from consideration
merely by glancing at the preponderant
colors of the entries. Some Ss also indicated
that they ignored the actual numeric
values of the displayed items, and simply
scanned, counted, and compared the numbers
of colored entries. Thus, for these Ss the
task was one of counting colored objects
rather than of adding numeric entries, and
from the results of previous studies it is known
that color is a useful code for a counting task.


FIG. 1. Average row-comparison time as a function
of display density for relevant item coding.
(Vertical axis: average row-comparison time in seconds;
horizontal axis: display density of 20, 60, and 100.)


TABLE 2
COMPARISON TIME (IN SECONDS) AS RELATED TO VARIABLES INFLUENCING DISPLAY DIFFICULTY,
AVERAGED OVER ALL DISPLAY-CODING CONDITIONS

                          High-row comparison      Low-row comparison
                            Display density          Display density
Condition                   20     60     100        20     60     100
Wide spread displays
  Alternative rows
    1                      10.1   18.2   28.0        8.4   15.7   27.6
    2                      11.5   24.6   30.0       10.5   25.4   31.8
    3                      13.8   26.3   37.9       11.1   26.0   34.0
Narrow spread displays
  Alternative rows
    1                      11.6   23.5   31.7       11.3   23.3   29.7
    2                      11.9   32.7   41.5       12.5   26.7   30.8
    3                      13.8   29.5   48.4       11.2   26.4   37.0

The relevant underline code was not so 
effective as the color code, but was signifi- 
cantly faster than the other coding condi- 
tions. It resulted in an average reduction of 
29% from the row-comparison time required 
for the uncoded displays. The Ss reported that 
the underline code, by providing a kind of 
“visual weighting” of the rows during hori- 
zontal scanning, did permit rapid elimination 
of some rows from consideration. However, 
discrimination of the particular code “lengths” 
was not sufficiently good to provide clear 
visual separability among entries falling in 
different numeric interval classes, and so this 
code was less helpful in detailed row compari- 
sons. 

Average row-comparison time with the rele- 
vant superscript code was only 13% faster 
than with the uncoded displays, and this 
difference was not statistically reliable. Most 
Ss stated that they relied on simplified addi- 
tion techniques when dealing only with the 
numeric entries, as in the uncoded displays. 
For example, they might add just the 10s 
digits and use the ones digits only if the sums
of the rows appeared to be tied. If the task
were performed in this way, then the availability
of a displayed numeric simplification,
in the form of superscripts, would do little to
facilitate the process.

A statistically significant interaction (p <
.001) was confirmed between the effects of
coding and density: relevant item coding resulted
in greater time saving for the more
difficult displays, those with higher display
density. Looking at this another way, the
relative value of coding seemed to remain
constant over the range of display density
investigated.
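
The two readings of this interaction are consistent: a roughly constant relative saving implies a larger absolute saving wherever baseline time is longer. A minimal worked example follows, using hypothetical uncoded times (not the study's data) and, purely for illustration, applying the overall 47% color-code reduction reported above at every density.

```python
# Hypothetical uncoded comparison times (seconds) by display density.
baseline = {20: 10.0, 60: 25.0, 100: 35.0}
# Assumed constant relative reduction, borrowed from the overall color-code figure.
reduction = 0.47

coded = {d: t * (1 - reduction) for d, t in baseline.items()}
saved = {d: baseline[d] - coded[d] for d in baseline}       # absolute saving
relative = {d: saved[d] / baseline[d] for d in baseline}    # relative saving

for d in sorted(baseline):
    print(f"density {d:3d}: saved {saved[d]:5.2f} s ({relative[d]:.0%} of baseline)")
```

The absolute saving grows with density while the proportion saved stays fixed, which is the pattern the analysis describes.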

Average performance times for row com- 
parison under irrelevant code conditions, 
where code values were assigned randomly to 
the displayed items, were indistinguishable 
from one another. Although irrelevant codes 
did add clutter to the displays, they did not 
impair row-comparison performance: average 
performance time under these conditions did
not differ significantly from that for the uncoded
displays. This finding could have implications
for display designers in situations
where they might wish to provide a single
display, multicoded for users with different
needs.

Individuals differed considerably in their
performance on the row-comparison task.
There was an order of magnitude difference
in comparison time between the fastest and
the slowest S, and some Ss made more than
20 times the errors of others. There was a
significant inverse relation between the time
required and errors made. That is to say, it
appears that Ss tended to trade off accuracy
and speed in establishing their individual
performance standards.

Ss were asked to rank order their preferences
for the various coding conditions.
Their preferences generally coincided with
performance differences. Among the relevant
codes, the tendency was to prefer color, underlining,
and superscripts to the uncoded displays,
in that order. For the irrelevant codes,
there was no consistent preference pattern.


Item-Counting Task


An analysis of the time data for the item-counting
task confirmed that the main factors
of display density, display spread, and item
coding had statistically significant effects (p
< .001) on item-counting time. Ss were
treated as replicates, and less than 20% of 
the total variance was attributable to dif- 
ferences among Ss in this task. 

Counting time and frequency of errors 
both increased significantly with increasing 
display density. The relation is linear over the 
range of display densities investigated, and 
confirms similar results from previous studies 
(Smith, 1963; Smith & Thomas, 1964). 

Performance was significantly better on the 
wide spread displays than on the narrow. As 
it happened, because of the logical structure 
of the displays, the number of relevant items 
to be counted (in the central numeric interval) 
was almost twice as great in the narrow spread 
condition as in the wide. It is probable that 
this accounts for the significant difference in 
counting performance. 

Average counting time for the uncoded 
displays and those with relevant item coding 
is shown as a function of display density in 
Figure 2. Performance was best, in terms of 
both counting time and error frequency, for 
displays with relevant color coding. This sig- 
nificant improvement in performance at- 
tributable to the color code, as compared with 
the uncoded displays, represents an average
reduction in counting time of 72%. There
was a corresponding decrease in error frequency
of 86%. Performance was substantially
equivalent for all other item-coding
conditions.


FIG. 2. Average counting time as a function of display
density for relevant item coding.

All 12 Ss preferred the relevant color code 
for the item-counting task. There was no con- 
sistent order of preference for the other con- 
ditions. This is reasonable in light of the fact 
that the relevant underline and superscript 
codes resulted in no significant improvement 
in performance on the counting task. Evi- 
dently these codes did not provide improved 
visual separability among the item classes, 
above that based on the numeric entries them- 
selves. Most Ss reported that they found it 
difficult to use these codes in the counting 
task and so ignored them. It may be recalled, 
however, that the relevant underline code did 
result in significant improvement in perform- 
ance on the row-comparison task, where its 
representation of item magnitude was well 
related to the display format and the row- 
scanning task. 

The demonstrated effectiveness of relevant 
color coding in an item-counting task sup- 
ports the results of earlier studies with un- 
formatted displays (Smith, 1963; Smith & 
Thomas, 1964). In a counting task, each dis- 
played item must be considered and either 
included in the count or excluded from it. The 
high degree of visual separability provided by 
the color code permits rapid scanning and 
selection of the class of items to be counted. 
In this present experiment, where the dis- 
plays were formatted, the average counting 
time for the relevant color-coded displays was 
6.1 seconds. In a similar experiment, where 
the displays were not formatted (Smith & 
Thomas, 1964), the average counting time 
was 8.1 seconds. There were differences in 
the nature of the particular items displayed, 
and different Ss participated in the two 
studies. However, it seems probable that at 
least part of the difference in counting time 
should be attributed to the fixed display format
in the present study, which may have
served as an aid in item scanning. Additional 
research would be required to confirm this 
hypothesis. 

It is interesting that the relative value of 
relevant color coding was different for the 
item-counting and row-comparison tasks. For
the counting task, there was a sizable de- 
crease in both time required and errors made 
when the Ss were working with color-coded 
rather than uncoded displays. For the row- 
comparison task, the improvement in per- 
formance attributable to the color code, al- 
though significant, was smaller in relative 
magnitude. This suggests that in more com- 
plex tasks, those involving the interpretation 
of displayed information, the availability of 
an effective means of providing visual sep- 
arability among data classes, such as color 
coding, may benefit some components of the 
task and not others. The value of a display 
coding scheme will depend in some measure 
on its application in a specific job situation, 
and will not be completely predictable from 
data obtained with simple search and counting 
tasks. 
The sample consisted of 217 general practitioners of the state of Utah. 80
scores relevant to the performance of these physicians were collected from a
variety of sources, intercorrelated, and factor analyzed using the principal
components solution based on eigenvalues and eigenvectors. The 30 factors
which had an eigenvalue greater than 1.00 were rotated by the varimax procedure
and interpreted. The most important finding was the great criterion
complexity for this group of physicians. This complexity suggests that one
cannot adequately measure physician performance on the basis of a single
score or a few scores. Instead, one must obtain a relatively large number
of scores. Performance in both premedical and medical education was independent
of performance as a physician.


All too frequently, it has been assumed 
that an investigator can get a simple, satis- 
factory measure of a performance criterion 
with only one, or, at most, a very few meas- 
ures. Similarly, it has been assumed that 
a single intermediate criterion measure, such 
as a grade-point average, can be employed 
as an adequate approximation of an ultimate 
criterion, such as on-the-job performance. 
However, an accumulating body of evidence 
suggests that both types of assumptions are 
not justified. For instance, the work of factor
analytic investigators, for example, L. L.
Thurstone (1950) and J. P. Guilford (1957),
and such findings as “the inadequacy of
undergraduate grades as a substitute criterion
for on-the-job research performance” [Taylor
& Barron, 1963, pp. 72-5] are forcing psychologists
to recognize in their experimental
studies that human ability is extremely complex
and multidimensional. Although less
widely recognized, human occupational performance
is also complex and studies of its
criteria must include a multiplicity of measures
to even begin to encompass performances
and achievements of a given field of occupation.
For example, Taylor and his associates
(Taylor & Barron, 1963) factor analyzed 
over 50 measures of the on-the-job perform- 
ances of a group of physical scientists and 
found 17 independent aspects of the produc- 
tivity, creativity, and other contributions of 
these scientists. Similarly, Price, Taylor, Rich- 
ards, and Jacobsen (1963) have investigated 
the criterion problem for both a medical col- 
lege faculty and a group of medical specialists 
and in both cases have obtained approxi- 
mately one independent dimension or factor 
for every three performance measures in- 
cluded in their analyses. It would appear, 
therefore, that a need exists for multidimen- 
sional research to investigate the dimensions 
and correlates of the performance, accom- 
plishments, and contributions of most pro- 
fessional and occupational groups. 
Accordingly, the present study was con- 
ducted to provide some of this knowledge 
for medical general practitioners.⁴ The basic
procedure was to obtain a large number of 
measures of on-the-job performance of phy- 
sicians in general practice in the state of Utah. 
These measures were then intercorrelated and
factor analyzed⁵ to yield indices of several
independent aspects of the achievements and
contributions of these physicians.


4 The material presented in this article represents
only one phase of an ongoing project in which on-
the-job performance measures are being investigated
for a college of medicine faculty (Taylor et al., in
press), medical specialists (Richards, Taylor, Price,
& Jacobsen, 1965), and medical general practitioners.


METHOD 


Criterion measures. Seventy-seven scores relevant 
to physician performance were obtained from a 
variety of sources: official records, compendium 
listings, and personal interviews. The first 6 of these 
scores included 1 score based on reputation among 
colleagues as indicated by peer nominations, 2 
scores concerning intern and residency training, 1 
score concerning publications, 1 score dealing with 
professional mobility, and 1 score based on listings 
in current honorary compendiums such as Who’s 
Who in American Medicine. The next 64 scores were 
obtained during an interview with each participat- 
ing physician, including 1 score based on income, 
2 scores having to do with memberships in pro- 
fessional societies, 1 score from a special question- 
naire dealing with sources and degrees of occupa- 
tional satisfaction, 57 scores based on answers to 
direct questions concerning a variety of aspects of 
each physician’s practice, 3 scores dealing with the 
physician’s image of success, and 1 score repre- 
senting the physician’s self-assessment of his own 
success. The final 7 scores included 3 ratings by the 
particular project researcher who conducted the 
interview, and 4 “control” scores involving such 
variables as years of experience. In addition, 3 
scores measuring performance in premedical and 
medical education were obtained (see Footnote 5). 

Subjects. The population under investigation con- 
sisted of 498 physicians practicing in the state of 
Utah who had not fulfilled the American Board 
requirements for certification as a specialist, and 
who, therefore, were classified as general practi- 
tioners. Letters signed by the Dean of the Univer- 
sity of Utah College of Medicine were sent to 
these physicians requesting that they participate in 
this project. Such participation primarily involved 
an interview lasting approximately 1 hour. A sec- 
ond letter was sent to those physicians who did 
not respond to the first letter; physicians who did 
not respond to the second letter were dropped. 
Ultimately, there were 217 physicians who agreed 
to participate; this smaller group, therefore, was the 
sample actually studied in detail in this research. 

The criterion scores available for all participating 
and nonparticipating physicians were the scores con- 
cerning the age at which the MD degree was ob- 
tained, the score involving peer nominations, the 
score involving publications, and the score indi- 
cating the degree to which the physician specialized 
within his general practice. On these 4 scores, statis- 
tical tests were made comparing the physicians who 
did not participate with the physicians who did. 
Two of these differences were significant at the .05
level, indicating that nonparticipants had fewer publications
and were nominated less frequently by
their colleagues than participants. It would appear,
therefore, that the participating physicians were not
completely representative of Utah’s general practitioners.
However, since the present study is concerned
with the relationships between variables
rather than, say, estimating typical level of performance
on given variables, in the opinion of the
authors the results of this study are not seriously
biased by this lack of full representativeness.


5 All computations were made at the Western
Data Processing Center, University of California,
Los Angeles.

Procedure. Scores on all variables, with the exception
of dichotomous control variables, were converted
to normalized standard scores (Guilford,
1950) with a mean of 50 and a standard deviation
of 10. The next step in the data analysis was to
compute the 3,160 intercorrelations among the 80
scores. The resulting correlation matrix was then
factor analyzed, using the principal components
solution based on eigenvalue and eigenvector analysis
(Harmon, 1960). Unity was placed in the diagonal
cells of the correlation matrix, all factors
having an eigenvalue greater than 1.00 were extracted,
and these factors were rotated to a final
solution.
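
Two steps of this procedure lend themselves to a short illustration: the conversion to normalized standard scores and the count of intercorrelations. The sketch below is not the authors' program. It assumes a rank-based (area) normalization with the midpoint convention for percentile ranks, which is one common way of producing normalized scores with a mean of 50 and a standard deviation of 10; the function name is hypothetical.

```python
from math import comb
from statistics import NormalDist


def normalized_t_scores(raw):
    """Rank-based normalized standard scores (mean 50, SD 10).

    Assumption: the percentile rank of a score is
    (number below + half the number tied) / n.
    """
    n = len(raw)
    std_normal = NormalDist()  # standard normal, mean 0 and SD 1
    scores = []
    for x in raw:
        below = sum(1 for y in raw if y < x)
        ties = sum(1 for y in raw if y == x)
        p = (below + 0.5 * ties) / n  # always strictly between 0 and 1
        scores.append(50 + 10 * std_normal.inv_cdf(p))
    return scores


# Hypothetical raw scores on one variable.
t = normalized_t_scores([3, 7, 7, 12, 40])

# Number of distinct pairs among 80 scores: 80 * 79 / 2.
n_intercorrelations = comb(80, 2)
```

The pair count reproduces the 3,160 intercorrelations reported for the 80 scores.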

In any criterion research of the magnitude of the 
present investigation, some missing scores are in- 
evitable. Since the computer program used does not 
allow for missing data, the mean score, or 50, was 
substituted for all missing scores. The effect of 
such a substitution is to reduce the correlation be- 
tween variables. This, in combination with unity 
in the diagonal, introduces some slight bias toward 
unique factors. Little is known about the general 
effects of this bias, but for this particular study the 
authors feel that the number of substituted scores 
was small enough so that the results were not seriously
affected by substituting the mean for missing
data.
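
The attenuating effect of mean substitution can be seen in a toy case. The data below are hypothetical and chosen so that the complete scores correlate perfectly; replacing two of the scores on one variable with the mean of 50 lowers the correlation. The helper `pearson` is a plain product-moment computation written for this sketch.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical standard scores; y is an exact linear function of x.
x = [30, 40, 50, 60, 70]
y = [32, 41, 50, 59, 68]
# Treat the first and last y scores as missing and substitute the mean, 50.
y_filled = [50, 41, 50, 59, 50]

r_complete = pearson(x, y)
r_filled = pearson(x, y_filled)
```

In this example the substituted cases contribute nothing to the covariance while still counting toward the variance of x, so the correlation drops from 1.0 to roughly .45.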
RESULTS 


A large number of factors, namely 30, had 
an eigenvalue greater than 1.00 and were 
included in the further analysis. A brief de- 
scription and analytical interpretation of each 
of the rotated factors is given in the follow- 
ing material: 

Level of medical specialization. The highest 
loading variables on this factor portray a 
physician who has a large number of pa- 
tients referred to him by other physicians, 
who tends to specialize within his general 
practice, and who has had longer and more 
varied internship and residency training than 
his colleagues. In conjunction with this factor’s
secondary loadings, the overall impression
is of an efficient, busy physician who
is functioning as both a general practitioner
and a medical specialist even though he has
not completed any of the American Board
specialty examinations. (This factor pattern
is similar to a factor found by the authors in
a study of certified medical specialists, Price
et al., 1963.)


6 Copies of the intercorrelation matrix and unrotated
factor matrix are available from Calvin W.
Taylor, Department of Psychology, University of
Utah.

Medical consulting. The two highest load- 
ings on this factor were for the variables deal- 
ing with the extent to which the physician 
consults with other physicians and the num- 
ber of patients referred to other physicians 
for treatment, respectively (a similar factor 
was also found in the aforementioned study 
of certified specialists). In general, one gets 
the picture of a general practitioner who is 
actively fulfilling the vital role of providing 
discriminating guidance for his patients to 
the multiplicity of a specialist’s services. 

Satisfaction from interpersonal relationships.
The physician scoring high on this
factor spends a greater than average propor- 
tion of his time actually working with pa- 
tients, derives an above average amount of 
satisfaction from his medical work, and fre- 
quently delivers speeches to laymen groups 
on both medical and nonmedical topics. Other 
secondary loadings show a general lack of 
involvement in the medical profession per se. 

Medical charity work. The only two high
loading variables on this factor suggest that
a lenient, charitable practitioner is presented
by its pattern; he voluntarily provides an
above average amount of free medical care
both in his office and through local charitable
institutions. (A congruent factor was found
in the previously cited study of medical
specialists.)

Size of medical practice. The variables
loading highest on this factor deal with size
of the physician’s practice, number of hospitalized
patients, amount of time devoted to
the doctor’s profession, and gross income, respectively.
Other secondary loadings clearly
support this factor’s title. Interestingly, such
a physician tends to rate himself, but not to
be rated by his colleagues, as being quite
successful.

Established medical practice. A rather clear
impression emerges on this factor of a general
practitioner who quickly established
himself in a given locality and through the
years has developed his practice to the point
that he can deliberately limit and control the
size of his practice to meet his professional
needs and standards; for example, he has a
large number of patients, rigidly adheres to
his appointment schedule, and discourages
emergency office or hospital calls.

Amount of malpractice insurance carried. 
This factor is one of the few found which 
does not seem to isolate an area dealing with 
some type of endeavor that would be con- 
sidered a genuine professional contribution. 
Specifically, its highest loading is on the 
variable measuring the amount of malprac- 
tice insurance carried, with secondary loadings 
indicating a below-average reputation among 
medical colleagues, a tendency to disregard 
psychological or psychosomatic considera- 
tions in medical diagnoses, poor attendance at 
scientific or professional meetings, and an 
above-average satisfaction with the size of 
one’s medical practice. Thus, one gets the im- 
pression of a physician who lacks a deep 
commitment to medicine. 

Group or clinic practice. The characteristics 
presented by this factor appear consistent 
with a dichotomy between those physicians 
in individual practice and those in some form 
of group practice: its highest loadings are 
typified by a doctor practicing in a group 
or clinic with a well-equipped office that is 
manned by a large ancillary staff of nurses. 
Significantly, this factor did not load on any 
of the measures related to the quality of 
medical care provided. Peterson (1963) re- 
ports a positive relationship between group 
or clinic practice and quality of care given 
to the patient. 

Emphasis on medical equipment. This fac- 
tor’s most salient feature is that the high- 
scoring physician places an above-average 
emphasis on maintaining an expensive, well- 
equipped office with laboratory facilities. A 
large number of secondary loadings appear 
on this factor which are not obviously re- 
lated, making this factor quite complex and 
difficult to succinctly label. 

Diagnostic thoroughness. This factor is 
similar to one found in the study of certified 
specialists. In both samples, the physician 
typified by the factor makes extensive use 
of information derived from both his own 
office facilities and those of the local hospitals 
and laboratories in reaching his diagnoses, 
which he carefully explains to the patients 
concerned, even though excessive time is 
required. 

Hospital staff recognition. The high-scoring 
doctor on this factor is found to maintain 
courtesy privileges in a large number of 
hospitals that are judged to maintain superior 
medical standards, to send his patients to 
these same quality hospitals, and to have 
received lengthy internship and residency 
training in a variety of hospitals. Since deci- 
sions concerning the granting of hospital privi- 
leges (which vary in stringency with various 
hospitals) are made by physicians associated 
with the hospitals and in view of the lengthy 
and varied internship and residency training 
associated with this factor, there is some indi- 
cation that the high-scoring physicians have 
“learned the ropes” in gaining the acceptance 
of hospital-selection committees. 

Quality of hospital responsibilities. This 
factor is so named because its highest loading 
indicates the physician has an above-average 
number of responsibilities in the hospitals in 
which he works. This loading, together with 
the secondary loadings, suggests that it typi- 
fies a young, ambitious, well-trained general 
practitioner who is attempting both to in- 
crease the size of his practice by somewhat 
superficial techniques (modern appearing 
office, adhering to his patient appointment 
schedule), and to quickly gain some visibility 
within the profession (large number of hos- 
pital duties, experimentation with drugs). (A 
similar factor pattern was found in the au- 
thors’ study of certified specialists.) 

Self-Evaluated contribution to medicine. 
High variable loadings show that the physi- 
cian characterized by this factor, when asked 
what his major contributions have been, an- 
swers with a large number of contributions 
which are directly related to medicine, rather 
than society in general and which are judged 
to be significant ones by expert medical 
judges. It is worth noting that the variables 
dealing with peer evaluations did not appear 
on this factor. 

Self-Evaluated contributions to society. On 
this factor the high-scoring doctor tends to 
elicit an above-average number of general 
contributions to society, rather than his pro- 
fession, when asked what his major contribu- 
tions have been. The secondary loadings sup- 
port this interpretation: frequent speeches to 
laymen groups on nonmedical topics, inactiv- 
ity in medical research, and when asked what 
most contributes to a doctor’s success the 
doctor mentions unusual or uncommon at- 
tributes or skills. Significantly, other second- 
ary loadings seem to describe the stereotyped 
notion of a “society doctor,” for example, 
upper socioeconomic-level patients, a light pa- 
tient load, and a well-equipped, modern office. 

Attainment in publications. This factor ap- 
pears to be fairly specific in relating to the 
number of publications produced. Other than 
this, there is little that characterizes the high 
scorer. (Since only about 10% of this sample 
has ever published, this factor may be tech- 
nique determined due to stretching of the 
number-of-publications vector to unit length 
by placing unity in the diagonal.) (A similar 
factor was found in the study of medical 
specialists.) 

Professional recognition for achievements. 
The most salient features of this factor are 
the highest loading on the variable dealing 
with the number of listings in honorary com- 
pendiums, such as Who’s Who in American 
Medicine, a high loading on the variable 
measuring the number of papers read at scien- 
tific and professional society meetings, and 
secondary loadings which contribute to an 
impression of a physician that maintains high 
standards of medical care for his patients. 
(Similar, but not identical factors were found 
in the authors’ studies of certified specialists 
and a college-of-medicine faculty.) 

Self-Evaluated overall professional success. 
When asked to evaluate himself in terms of 
his success in the medical profession the 
physician typified by this factor tends to rate 
himself above average. Overall, the other fac- 
tor loadings fail to lend support to his evalua- 
tion. (A similar factor bearing the same title 
was found in the study of medical specialists.) 

Keeping abreast by means of journal read- 
ing. The three major loadings on this factor 
are all on the variables concerning the use 
of professional journals as a source of keeping 
informed of developments in the physician’s 
field. Other less significant loadings indicate 
activity in medical research and professional 
society participation of all levels (local, state, 
regional, and national). (This factor pattern 
was also found with the study of medical 
specialists and the college-of-medicine faculty 
study.) 

Keeping abreast by means of refresher 
courses. As with the above factor, this is an- 
other source-of-information factor but it is 
specifically related to postgraduate education. 
It shows that doctors characterized by this 
factor rely on and take a relatively large 
number of university level postgraduate or 
refresher courses (which carry no academic 
credit) in order to keep informed of current 
developments in medicine. (A somewhat simi- 
lar factor was found in the study of certified 
specialists.) 

Keeping abreast by means of professional 
society courses. This is still another source- 
of-information factor and primarily involves 
the variable measuring the extent to which 
the general practitioner utilizes classes or 
courses sponsored by scientific or professional 
societies in keeping abreast of medical devel- 
opments. As expected, such a physician also 
attends a slightly greater than average num- 
ber of national and regional society meetings. 

Keeping abreast by means of uncommon 
techniques. This is the final source-of-infor- 
mation factor. Its two highest loading varia- 
bles describe a general practitioner that relies 
on both postgraduate university courses 
(which carry academic credit) and uncommon 
or unusual techniques for keeping up with 
progress in medicine (e.g., listening to tape 
recordings of abstracted journal articles). 
Since less than 20% of physicians studied 
have ever taken such postgraduate courses, 
these loadings suggest some divergence from 
the norm with respect to sources of informa- 
tion. 

Participation in professional societies. The 
physician portrayed by this factor attends, 
holds memberships, and has held offices in 
a large number of scientific and professional 
societies, some of which are relatively pres- 
tigeful. Three other significant loadings show 
the high-scoring physician has been in prac- 
tice for some time, belongs to several social 
organizations, and engages in church activi- 
ties that extend beyond regular attendance. 
Overall, the impression is given of a physi- 
cian who, after years of experience, has an 
established practice and can now direct his 
energies toward social and professional society 
activities wherein he has considerable interest 
and skill. (Similar factors were found in both 
the medical specialist and college-of-medicine 
faculty studies.) 

Off-the-job socializing. High loadings on 
this factor describe a physician that belongs 
to a large number of social and avocational 
organizations, has held offices in such organi- 
zations, has numerous lay speaking engage- 
ments, enjoys a large income, and tends to 
be in an individual rather than a group or 
clinic practice. In general, an impression is 
given of a general practitioner who fulfills 
the role of a social leader and link between 
the medical profession and social and laymen 
groups. (A similar factor was found in the 
college-of-medicine faculty study.) 

Civic participation. This companion fac- 
tor to the preceding factor shows the high 
scorer both belongs to and holds offices 
in civic and political organizations, delivers 
speeches to laymen groups, and tends to men- 
tion things related to society, rather than 
medicine when asked what his contributions 
have been. (Similar, but more complex fac- 
tors were found in the specialist and college- 
of-medicine studies.) 

Professional stability. Overall, this factor 
pattern seems to measure the degree to which 
the physician it typifies quickly and delib- 
erately sank his roots in Utah to practice 
medicine: it has negative loadings on the 
variables measuring professional mobility 
rate, years of professional experience, overall 
satisfaction with the practice of medicine, and 
a positive loading on the variable concerned 
with involvement in church activities (which 
can be quite pronounced in this state). 

Leisure planning. The two highest loadings 
for this factor are on measures concerned with 
the amount of vacation taken annually and 
the degree to which weekly leisure time activi- 
ties are scheduled and followed through. Sup- 
porting this, it is observed that the doctor 
scoring high tends to be in a group or clinic 
practice which provides a setting for planned 
vacations and leisurely pursuits, has only an 
average income, and claims to be quite satis- 
fied with his role in medicine. (A slightly 
similar factor pattern was found in the study 
of certified specialists.) 

Interviewer’s rating of likeability. The only 
significant feature of this factor is the high 
loading of the variable measuring the impact 
a physician had on the particular researcher 
that interviewed him. Otherwise, there is 
nothing outstanding about the high scorer. 
(An almost identical factor was found in 
the medical specialist and college-of-medicine 
faculty studies.) 

Unorthodox success image. This factor’s 
major loading indicates the physician char- 
acterized by it has unusual or uncommon 
ideas about how one becomes outstanding in 
medicine. Two other features show the doctor 
has difficulty in communicating with his pa- 
tients and is not highly regarded by his gen- 
eral practitioner colleagues. (A factor bearing 
the same title was found in the specialist 
study.) 

Late attainment of MD degree. The highest 
loading variables on this factor are on the 
control variable dealing with the age the 
MD degree was received and on the variable 
measuring the extent to which physicians 
rely on drug detail men to keep them in- 
formed of developments in medicine. Sec- 
ondary negative loadings are found on the 
measures for gross income and patient load. 
(A comparable factor was obtained in the 
specialist study.) 

Achievement in education. Significantly, 
this factor has high loadings on all three 
measures of academic performance and zero- 
order loadings on all remaining variables. In 
other words, other than being a good grade- 
getter in premedical and medical education, 
there is nothing at all that distinguishes the 
high scorer on this factor. Moreover, in view 
of the consistency of this finding across all 
of the physician samples studied, which to- 
gether give a fairly comprehensive sample 
of the whole gamut of medical practice (med- 
ical educators, researchers, specialists, and 
general practitioners), it was felt worthwhile 
to specifically examine the relationship be- 
tween academic grades and the numerous 
measures of on-the-job performance. In so 
doing, a frequency distribution of the 1,082 
correlation coefficients between the measures 
of academic achievement and the measures 
of physician performance that were obtained 
across all physician groups studied was plotted 
(this is represented by the histogram in Fig- 
ure 1). It was found that 97% of 
these intercorrelations were of 
essentially zero magnitude (between −.20 
and +.20). In other words, only 3% of the 
intercorrelations were of a sufficient magni- 
tude to indicate a significant relationship 
between undergraduate and medical-school 
grades and performance as a practicing phy- 
sician. Also, when we superimposed a “nor- 
mal” or random-error curve (this is repre- 
sented by the smooth profile in Figure 1) on 
the histogram and measured how closely the 
histogram approximated the normal curve, 
it was found that the obtained frequency dis- 
tribution was slightly (but not significantly) 
platykurtic, significantly skewed toward the 
negative end of the frequency distribution, 
and with a mean not significantly different 
from zero. This means that the measures of 
academic performance show essentially only a 
random-error (chance) relationship to the 
measures of on-the-job performance with a 
very slight tendency for academic grades to 
correlate negatively with the measured per- 
formance of practicing physicians. 
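
The distribution summary described above (share of coefficients within ±.20, mean, skewness, and kurtosis) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `summarize_correlations`, the moment-based formulas for skewness and kurtosis, and the simulated coefficients are assumptions introduced here, not the authors' procedure or data.

```python
import random


def summarize_correlations(rs, band=0.20):
    """Summarize a list of correlation coefficients.

    Returns the share with |r| <= band, the mean, the moment-based
    skewness, and the excess kurtosis (negative means platykurtic).
    """
    n = len(rs)
    mean = sum(rs) / n
    # Central moments (population form) of the distribution.
    m2 = sum((r - mean) ** 2 for r in rs) / n
    m3 = sum((r - mean) ** 3 for r in rs) / n
    m4 = sum((r - mean) ** 4 for r in rs) / n
    return {
        "share_near_zero": sum(1 for r in rs if abs(r) <= band) / n,
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


if __name__ == "__main__":
    # Hypothetical data: 1,082 simulated coefficients, NOT the study's values.
    rng = random.Random(0)
    simulated = [max(-1.0, min(1.0, rng.gauss(0.0, 0.09))) for _ in range(1082)]
    print(summarize_correlations(simulated))
```

A mean near zero together with a large share of coefficients inside the ±.20 band is the pattern the text interprets as a chance-level relationship; the skewness term indicates which tail of the distribution is heavier.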

Also, as with the full-time faculty and 
the urban-specialist studies, the argument 
that such results are attributable to restriction 
of range seems questionable at best. For ex- 
ample, the rotated factor matrix shows that 
no single measure with factor loadings greater 
than .20 fell into more than 10 of the 30 
factors, the average number being approxi- 
mately 3, and the 3 measures of academic per- 
formance (premedical and the first and last 
2 years of medical school) all appeared in 
only 2 of the 25 factors. In view of this, it 
is unlikely that if grades were unrestricted 
in range they would suddenly appear in 3 
of the factors, and extremely improbable that 
they would fall into all 30 factors.⁷ Since it 
seems clear that the characteristics rewarded 
by medical school grades should be the same 
characteristics involved in successful perform- 
ance as a physician and since these results 
fail to demonstrate such a relationship, there 
is a demonstrated need for considerable re- 
search in the appropriateness of present-day 
medical-school evaluation techniques. 

⁷ A detailed explanation of why the authors do 
not feel these multidimensional results could be ac- 
counted for to any large degree by such explanations 
as restriction of range is available for the interested 
reader (see Price et al., 1963). 

Fig. 1. Frequency distribution of intercorrelations between measures of academic performance and 
measures of physician performance, compared with best-fitting normal curve for same data. 


DISCUSSION 


As with our studies of urban medical spe- 
cialists and a college of medicine faculty, the 
results of this study once again emphasize 
the great complexity of the total criterion 
problem. All told, 30 factors or independ- 
ent performance dimensions were obtained. 
Generally speaking, there was very little 
overlap between the different measures rele- 
vant to physician performance: one factor or 
performance dimension was obtained for ap- 
proximately every three scores or measures, 
and the modal number of factor loadings 
greater than .20 on the individual factors was 
also three. 

Moreover, these results, at best, are only 
a conservative estimate of the complexity of 
the performances of medical general practi- 
tioners. For example, because we felt there 
were possible differences in the community 
settings of rural and urban general practi- 
tioners which would influence our findings, 
we conducted two substudies (Price et al., 
1963) in which the urban and rural general 
practitioners were investigated separately. In 
each case, 28 criterion measures (factors) 
were obtained, of which only 16 were felt 
to be similar enough in their factor loadings 
to be given the same titles as corresponding 
factors in the total general-practitioner sam- 
ple reported in this article. 

In addition, we do not feel that this study 
has exhausted even a greater part of the 
possible criterion dimensions that could per- 
tain to the performances of general practi- 
tioners. No attempt was made to obtain either 
patient or hospital-staff evaluations of the 
physicians studied and no effort was made 
to study directly the quality of medical care 
provided by the individual physicians through 
such techniques as the Medical Audit 
(Mortrud, 1953). Since there is little reason 
to suppose such measures overlap to any 
great extent the measures used in the present 
study, the complexity of the total area of 
general-practitioner performance should be 
even greater than that found by the present 
analysis. 

With reference to practical application, the 
results of this study emphasize that one 
should not attempt to measure physician per- 
formance by one, or even a few criterion 
scores. Instead, they indicate that there are 
many outlets through which a general practi- 
tioner can expend his energies and efforts in 
his professional career. As a result, in assess- 
ing the overall accomplishments and contri- 
butions of a physician it is not appropriate to 
categorize him as a “good” or “bad” physi- 
cian, but to determine in which outlets he 
has expressed himself and how effectively he 
has achieved in each of them. Granted, this 
is not simplifying the problem of evaluation 
and, admittedly, it is postponing syntheses. 
However, it is coming closer to empirically 
approximating both the professional and lay 
observation (Funkenstein, 1962; Johnson, 
1962) that physician performance is a com- 
plex phenomenon not to be explained by a 
single, simple unitary concept or measure. 

Similarly, in terms of predicting on-the- 
job performance it is unrealistic to expect a 
single index, measure, or test to adequately 
predict overall achievement in every aspect of 
actual medical practice. For instance, in the 
present case a single predictor has only 100% 
of its variance available to overlap the 30 
independent performance dimensions (fac- 
tors) that this study has uncovered. Thus, if 
such a predictor were centrally located with 
its total variance split into 30 equal parts, 
only 1/30 of its variance could overlap each 
of the 30 performance dimensions, in which 
case the maximum correlation that could be 
obtained between the predictor and each of 
the performance dimensions would be √.03 
or .18. It is no wonder, then, that academic 
performance (grade-point averages) generally 
failed to show a significant relationship to 
the numerous performance dimensions studied 
in this research. At best, if academic grades 
are leaning toward a few performance criteria 
(rather than being centrally located in 30 di- 
mensions) they could only be expected to be 
highly related to a few dimensions of physi- 
cian performance. In view of this, medical- 
school selection programs, which tend to place 
heavy emphasis on academic grades in the 
selection and evaluation of their students, 
should be searchingly examined. 
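
The variance-budget argument in the preceding paragraph can be checked with a short calculation. The sketch below assumes, as the text does, that the performance dimensions are independent, so that a predictor's squared correlations with them cannot sum to more than 1. The function name `max_equal_share_correlation` and the three-dimension case are hypothetical illustrations, not part of the original analysis.

```python
import math


def max_equal_share_correlation(n_dimensions):
    """Upper bound on r between one predictor and each of n independent
    dimensions when its variance is split equally among them."""
    share = 1.0 / n_dimensions  # proportion of variance per dimension (r squared)
    return math.sqrt(share)


# Variance spread evenly over all 30 factors found in this study.
print(round(max_equal_share_correlation(30), 2))  # 0.18

# Hypothetical contrast: all variance concentrated on only 3 dimensions.
print(round(max_equal_share_correlation(3), 2))   # 0.58
```

The contrast shows why a predictor "leaning toward" a few criteria could correlate substantially with those few dimensions while remaining essentially unrelated to the rest.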


VOCATIONAL INTERESTS OF COLLEGE FRESHMEN 
AND THEIR SOCIAL ORIGINS¹ 


VIVIAN H. HEWER 


University of Minnesota 


The relationship between interests as measured on the SVIB and socioeconomic 
status of college students was explored. 9 groups of entering college fresh- 
men were selected on the basis of father’s occupation and educational level 
of both parents. Differences among distributions of the 9 groups on each of 
48 SVIB scales were tested for significance using the analysis of variance 
test. Conclusions are: measured vocational interests of college students were 
not independent of social origin, college students of lesser cultural background 
tended to identify with occupations requiring quantitative and technical train- 
ing, extent of overlap between social groups on SVIB scales was high. 


The vocational interests of fathers and sons 
are related. Strong (1957) reported a cor- 
relation of .29 between scores of sons and 
fathers on 20 scales of his test. Gjerde (1949), 
also using the Strong Vocational Interest 
Blank (SVIB), reported a correlation of .20. 
Not only are the measured interests of fathers 
and sons related, but Berdie (1943), also 
using the SVIB, found that measured inter- 
ests of sons were related to father’s occupa- 
tion. High school graduates whose fathers 
were in the skilled trades tended to have 
interests in scientific and technical fields 
and students whose fathers were in business 
tended to have measured interests in busi- 
ness. Berdie concluded that familial fac- 
tors influence vocational interests. Erlandson 
(1953), studying socioeconomic factors re- 
lated to vocational interests as measured by 
SVIB, concluded that there is actually some 
relationship between father’s occupation and 
having a primary interest pattern in a given 
group. He reported similar conclusions when 
father’s and mother’s educations were de- 
pendent variables. He studied each variable 
separately rather than selecting individuals 
who had social backgrounds meeting all three 
criteria, father’s occupation and father’s and 
mother’s education. 

The purpose of this study was also to ex- 
plore the relationship between vocational in- 
terests, measured on the SVIB, of college stu- 
dents and their social origins. It bears a 
relationship to studies already cited because 
the father’s occupation was one of the varia- 
bles used to determine socioeconomic status. 
Of the studies mentioned, it most closely re- 
sembles that part of Erlandson’s (1953) re- 
search concerned with father’s occupation, 
but differs in several ways. The social groups 
were selected differently and the use of SVIB 
scales rather than groups led to a different 
analysis of the data. 

¹ Research reported here was supported by a grant 
from the Graduate School Research Fund of the 
University of Minnesota. 

Means of socioeconomic groups on SVIB 
scales and patterning of vocational interests 
for an individual group were examined. For 
example, of the 48 SVIB scales studied, on 
how many and on which ones did the sons 
of farmers get the highest mean scores? On 
which scales did they get their lowest scores? 
In general, did students from homes of lesser 
cultural advantage have different vocational 
interest patterns from those of superior ad- 
vantage? The practical significance of differ- 
ences in scores on SVIB scales among the 
groups was also explored. Implicit within the 
results are suggestions about the origin of 
interests. 


METHOD 
Subjects 


In the fall of 1959, 94%, or 4,283, of all entering 
freshmen at the University of Minnesota reported 
the occupational and educational backgrounds of 
their parents on a questionnaire. The occupations 
of the fathers were coded using the Dictionary of 
Occupational Titles (DOT). 

Nine groups of students were chosen, correspond- 
ing largely to the nine major groupings of the 
DOT except for two changes: (a) the managerial 
group was broken into high and low managerial 
groups, following the recommendations of Hollings- 
head (1957); and (b) semiskilled, unskilled, and 
service were combined into one group. In order 
to refine the socioeconomic groups, there was addi- 
tional selection within each occupational group by 
parental education that corresponded to the father’s 
occupation. This method was used to increase homo- 
geneity of cultural backgrounds of students within 
a group. Students whose parents had marked dis- 
parities in educational backgrounds, for example, 
were not included in the study. Thus, students 
whose fathers were professional men were included 
only if their fathers were college graduates and their 
mothers had some college. At the other extreme, 
students whose fathers were in service, semiskilled, 
and unskilled occupations were included only if the 
educations of both parents were eighth grade or 
less. The educational levels of the father selected 
for this study corresponded roughly to those re- 
ported by Wolfbein (1964) on the average years 
of school completed for various occupational groups. 
These same groups were the subjects of an un- 
published study by the author on differential aca- 
demic ability and achievement by social class. 

Table 1 is a description of the nine groups, in- 
cluding minimum and maximum levels of parental 
education. In the results, the groups will be desig- 
nated by their occupational titles; educational level 
will be implied. 


TABLE 1 

PARENTAL OCCUPATIONAL AND EDUCATIONAL LEVELS FOR THE NINE SOCIOECONOMIC SUBGROUPS 

Occupation of father          DOT code          Range of educational level of father          Range of educational level of mother 
1. Professional               0-00 thru 0-39    College graduate to PhD                       Some college to PhD 
2. Semiprofessional           0-40 thru 0-69    High school graduate to some college          High school graduate to some college 
3. Managerial-High            0-70 thru 0-99    High school graduate to some college          High school graduate to some college 
4. Managerial-Low             0-70 thru 0-99    High school graduate to some college          High school graduate to some college 
5. Clerical                   1-00 thru 1-49    High school graduate                          High school graduate 
6. Sales                      1-50 thru 1-99    High school graduate to some college          High school graduate to some college 
7. Agriculture                3-00 thru 3-99    Eighth grade or less to high school graduate  Eighth grade or less to high school graduate 
8. Skilled                    4-00 thru 5-99    Some high school to high school graduate      Some high school to high school graduate 
9. Service                    2-00 thru 2-99    Eighth grade or less                          Eighth grade or less 
   Semiskilled and Unskilled  6-00 thru 9-99 





Some of the groups were relatively small, particu- 
larly the semiprofessional, clerical, and service, semi- 
skilled, and unskilled. The original analysis of the 
total sample (Hewer & Neubeck, 1962) indicated 
that children of fathers who were in semiskilled and 
unskilled occupations entered the University in dis- 
proportionately lower numbers than would be ex- 
pected from census data. A slightly smaller propor- 
tion entered from clerical and service homes. 


Test 


SVIBs used in this study were taken by these 
college freshmen at various times, but the majority 
were completed within the span of a year. Some 
students took them during their senior year in high 
school, some during orientation just prior to en- 
trance to the University, and a few during the 
freshman year. 


Analysis 


Differences among distributions on each of 48 
SVIB scales were tested for significance using the 
analysis of variance test (Edwards, 1950). 
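
As a rough illustration of the test named above, the following Python sketch computes a one-way analysis of variance F ratio for several groups of scores on a single scale. It is a minimal sketch under stated assumptions: the function name `one_way_anova_f` and the toy scores are hypothetical, and the code is not a reproduction of the computations actually carried out for this study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three small groups on one scale.
f_ratio, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_ratio, df_b, df_w)  # 13.0 2 6
```

With nine groups and 484 students, the same formula gives 8 and 475 degrees of freedom, and the resulting F for each scale would be compared with a tabled critical value.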


RESULTS 


Mean scores and standard deviations on 
each of 48 SVIB scales for the nine groups 
were determined, but only the statistics for 
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TABLE 2 


MEANS AND STANDARD DEvIATIONS OF SVIB ScALEs FoR SocrAL Groups NV = 484 








ete 1s 2 3 4 5 6 7 8 9 F (8475 
(N =66) | (N =17)| (NV =56) | (N =57)| (N =23) | (N =49)| (N =84) | (NV =98)| (NV =34) df 
I Psychologist M 28.1 Doel 20.1 21.5 21.4 24.6 18.9 22.0 24.0 4,47*% 
SD 11.8 8.1 10.9 11.6 5.5 12.1 9.0 9.8 12.3 
Physician M 34.7 32.6 26.2 27.9 25.5 32.2 27.9 29.6 32.1 3.16%** 
SD 13.7 1232 12.5 12.7 9.9 14.6 L142 11,9 t11 
Veterinarian M 24.8 23.4 26.5 25.4 26.5 Ded 30.5 25:5 yee 2.03* 
SD 11.2 10.9 10.9 10.4 10.3 11.5 11.6 8.9 12.9 
II Physicist M 18.7 19.1 14.0 15.2 15.2 17.2 20.0 19.2 19.1 2.39% 
SD 10.3 11.9 10.0 10.5 9.7 9155 8.6 11.4 11.3 
Engineer M 29.0 Sls 25.5 26.3 24.5 28.0 32.2 31.5 32.4 2.52* 
SD 13.9 13.2 12.8 12.0 13.2 14,1 2s 14.6 11.8 
Chemist M 32.6 35.2 26.8 27.0 27.4 30.2 31.6 31.7 32.0 2.56*# 
SD 12.1 10.8 11.0 12.3 10.9 13.0 9.4 12.1 10.0 
IV Farmer M 39.1 39.4 39.6 40.2 41.1 40.2 47.5 42.5 42.3 4.79%% 
SD 10.2 8.4 10.8 9.9 9.8 9.2 10.8 9.7 11.4 
Carpenter M 23.9 26.4 28,2 27,2 29.0 26.0 33.2 31.7 31.9 4.39*# 
SD 11.6 10.4 12.9 11.6 9.7 11.2 11.1 WR 1300 
Industrial M 19.4 18.7 2tel 19.5 22.8 20.4 25.0 24.1 25.8 2.08* 
arts teacher SD tics 10.3 RES 12.1 12.7 didez 13:5 13.6 1325: 
Vocational M 29.3 Dhez Sled) 31.1 34.6 32.2 36.3 32.8 34.4 2.58** 
agric. SD 12.1 9.5 12.0 11.9 We, 1225) 11.9 10.3 1251 
teacher 
Forest M 22 25.3 25.6 24.4 25.0 25.6 31.6 27.8 29.1 2.23% 
serviceman SD 11.9 8.5 14.3 12.2 13.7 10.7 11.6 11.6 13.2 
V Personnel M 31.9 25.9 29.3 26.9 28.3 27.6 23.6 26.6 Die 2.99% 
director SD 11.9 10.2 10.9 12.1 11.6 9.6 1d;1 9.8 1133 
Public M 36.8 31.9 33.1 30.1 30.6 31.6 29.6 30.5 34.3 2.70** 
admin. SD 13.7 8.9 10.4 12.1 12.8 9.9 9.9 10.4 12.0 
City school M 27.9 22.1 23.1 23ah 25.3 24.5 21.8 21.3 24.2 2.43% 
supt. SD 10.9 11.0 9.5 12.3 9.9 9.6 10.4 10.3 10.4 
Social M 31.9 26.5 28.0 26.9 27.4 27.8 21.9 25.4 28.0 3.497% 
worker SD 13.8 10.5 11.0 13.1 12 12.5 10.5 art 13.8 
IV Musician M 40.2 38.6 37.0 39.2 39.0 38.6 34.6 38.9 39.9 2.19% 
SD 10.2 5.8 10.9 9.8 10.0 10.4 9.1 9.8 9.1 
VIII Purchasing M 28.1 32.1 33.9 32.5 31.7 30.2 30.2 32.3 31.4 2.16* 
agent SD 8.2 8.3 9.7 10.0 6.1 11.4 8.6 8.3 9.2 
Banker M 26.6 29.0 S25 31.6 31.9 28.3 32.8 30.0 29.6 3.53* 
SD 9.2 9.9 7.6 10.3 7.3 9.9 7.3 8.3 7.9 
Mortician M 29.6 29.2 35.9 33.2 33.2 Sao 30.6 31.9 30.6 2.45* 
SD 9.4 8.5 10.2 10.2 8.7 9.0 8.0 8.9 9.8 
IX Sales M 31.8 31.8 35.3 Bout 33.5 33.2 28.4 31,4 29.9 3.21*# 
manager SD 9.7 7.0 9.2 10.0 7.6 9,2 8.3 10.0 9.2 
Real estate M 36,7 38.0 39.0 40.0 39.7 38.7 35.6 37.3 35.5 2.27% 
salesman SD 8.0 6.7 9.3 9.5 6.6 7.3 6.0 8.8 6.6 
Life M 31.0 29.6 33.9 32.3 31.0 31.3 26.5 29.5 28.5 3.50%* 
insurance SD 9.6 9.1 9.2 9.9 8.1 8.3 8.6 10.5 8.3 
salesman : 
X Advertising M 29.9 29.0 28.4 30.0 29.5 30.8 24.0 28.8 74 3 4,63** 
manager SD 8.0 5.7 8.7 7.6 9.7 8.0 6.7 74 8.7 
Lawyer M 29.9 27.7 27.6 27.9 27.8 29.1 26.1 26.3 27.3 2.35* 
SD 6.8 5.1 wall 6.3 6.6 Tot 6.3 6.1 7.0 
XI Interest M 52.8 50.1 50.7 49.9 49.2 48.1 49.2 49.0 50.4 2.01*
maturity SD Tok 6.2 6.6 8.2 ie 9,2 7.0 Pie 74 


Note.—twenty-three scales with no significant mean differences not reported—Testscor profile, 1955. 
ᵃ Social groups: 1 Professional, 2 Semiprofessional, 3 Managerial-High, 4 Managerial-Low, 5 Clerical, 6 Sales, 7 Agriculture, 8 Skilled, 9 Semiskilled, Unskilled, Service.
* p < .05; F.95 (df 8/400) = 1.96.
** p < .01; F.99 (df 8/400) = 2.55.




the 25 scales for which the differences were 
significant are reported in Table 2. Differ- 
ences among means among the nine groups 
were significant at only the 5% level on 13 
scales and at the 1% level on 12 additional 
scales. 

Among the 25 scales on which there were 
reliable differences among the means of the 
groups were nine scales on which the pro- 
fessional group had the highest mean score. 
These included Psychologist, Physician, Per- 
sonnel Director, Public Administrator, City 
School Superintendent, Social Worker, Mu- 
sician, Lawyer and Interest Maturity (IM). 
The professionals had the highest mean score 
on all 4 of the 8 scales in Group V on 
which there were reliable differences. The 
lowest mean scores on these same scales were 
among the agricultural or skilled groups. 
The high Interest Maturity score among the 
professionals was probably not independent 
of their elevated Group V scores. In general, 
students from professional homes had the 
highest scores, among the social groups, in 
the biological science, social welfare, and 
verbal SVIB factorial groups. They were less 
likely to be the highest group in the physi- 
cal sciences, business management, and sales 
occupations. 

Again, if only those scales were considered 
on which there were significant differences 
among group means, students whose fathers 
were farmers scored highest on occupations on 
which they might be expected to score high: 
Veterinarian, Farmer, and Vocational Agri- 
cultural Teacher. In addition, they had the 
highest mean score on Physicist, Forest Serv- 
ice Man, Carpenter, and Banker. All of these 
occupations are in the scientific or technical 
factorial group except Banker. Campbell² (1964), in
some unpublished research on the interests of
bankers on the SVIB, found that many bankers in
the original criterion group were from small towns.
Other social groups in which fathers were in
technical occupations were the skilled, semiskilled,
and unskilled groups. On none of the scales on which
there were significant mean differences among the
groups did the skilled trade group have the highest
mean score. Social Group 9, the semiskilled,
unskilled, and service group, had the highest of the
group means on two scales, Physicist and Industrial
Arts Teacher. The results for Group 9 must be
interpreted with care. In comparison to other
occupations, disproportionately small numbers of
sons of men in service, semiskilled, and unskilled
occupations enter college.

² Campbell, David P. Personal communication.
July, 1964.

The social groups in which the fathers were 
in occupations allied to business were high 
and low level managerial, clerical, and sales. 
Students with fathers in the high level mana- 
gerial occupations had the highest mean of 
the nine social groups on several scales, Pur- 
chasing Agent, Mortician, Sales Manager, 
and Life Insurance Salesman; the low level 
managerial on one scale, Real Estate Sales- 
man; sales on one scale, Advertising Man; 
and clerical on no scales. The interests of the 
sales group tended to be more like those of 
the professionals than the managerial group. 

In general, groups of students classified by 
their social origins tended to have interests 
in occupations within the same interest factor 
or family as their fathers. It is also reasonable 
to assume that there were cultural differences 
in the homes of the groups contributing to 
diversity of interests. The professionals, for 
example, had the highest mean score of all 
the groups on the Musician key. Increased
verbal facility is associated with superior
cultural advantage. There appears to be a trend
for students of lesser cultural advantage to
identify with occupations that stress quantitative
and technical rather than verbal training. For
example, agriculturals and skilleds had higher
scores than other groups on Physicist, Farmer, and
Carpenter; the professionals on Psychologist,
Physician, Lawyer, and some occupations of Group V.

Although differences among groups were 
easily detected and were statistically signifi- 
cant, as with much of the SVIB research, 
similarities among groups are as impressive 
as differences. The practical significance of 
these differences among means was explored 
through the test for overlap (Tilton, 1957). 
Overlap is presented in Table 3 for those scales on
which there were the greatest differences in means.
Strong believed that two groups that overlap 80% or
less are different enough to be considered
practically different. The least overlap was between
the professionals and agriculturals. In general, the
percentage of overlap of two groups on any given
scale was high, suggesting that many factors were
operating to make the interests of these students
similar. Public education, television, and the press
are sources of common learning experiences among
social groups and may explain to some degree
likenesses in interests among these students.

TABLE 3

OVERLAP OF SELECTED SOCIAL GROUPS ON SVIB SCALES

SVIB scale              Social groups compared                Percentage overlap
Psychologist            Professional with Agriculture         66
Physician               Professional with Agriculture         78
Farmer                  Professional with Agriculture         69
Carpenter               Professional with Agriculture         68
Carpenter               Professional with Skilled             75
Personnel director      Professional with Agriculture         72
Public administrator    Professional with Agriculture         72
Social worker           Professional with Agriculture         68
Banker                  Professional with Managerial-High     72
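
The percentages in Table 3 can be approximated from the means and standard deviations in Table 2. The sketch below is only an illustration: it assumes that the overlap is the shared area of two normal curves whose common SD is the average of the two group SDs, and that the first and seventh columns of Table 2 are the Professional and Agriculture groups, as the table footnote suggests. The function name is hypothetical.

```python
from statistics import NormalDist


def tilton_overlap(mean_a, sd_a, mean_b, sd_b):
    """Approximate percentage overlap of two normal distributions.

    Assumption: both groups share a common SD equal to the average of
    sd_a and sd_b, so the curves cross midway between the two means.
    """
    # Distance from either mean to the crossing point, in SD units:
    # (|Ma - Mb| / 2) / ((sd_a + sd_b) / 2)
    z = abs(mean_a - mean_b) / (sd_a + sd_b)
    # Each tail beyond the crossing point contributes the same area.
    return 200.0 * NormalDist().cdf(-z)


# Means and SDs read from Table 2 (Professional vs. Agriculture).
farmer = tilton_overlap(39.1, 10.2, 47.5, 10.8)
carpenter = tilton_overlap(23.9, 11.6, 33.2, 11.1)
social_worker = tilton_overlap(31.9, 13.8, 21.9, 10.5)

for name, value in [("Farmer", farmer), ("Carpenter", carpenter),
                    ("Social worker", social_worker)]:
    print(f"{name}: {value:.1f}% overlap")
```

Under these assumptions the three scales come out near 69%, 68%, and 68%, in line with the corresponding rows of Table 3; other rows may differ by a point or two.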

On the other hand, in spite of these likenesses,
stable differences in vocational interests were
found among groups of college students classified by
social backgrounds. This suggests that a contributing
factor to differences in interests is learning in a
particular socioeconomic environment. To explore
determinants of interest other than socioeconomic
status, two groups might be selected within a
homogeneous social group: one with interests
compatible with the socioeconomic origins and another
with interests incompatible. These two groups could
be studied for differences, for example, in verbal
intelligence, quantitative ability, spatial aptitude,
identification with mother, identification with
significant person outside the family, rural-urban
background, and parent-child interaction for further
insight into interest development.

EFFECTS OF PERCEIVED ROLE AND ROLE SUCCESS
ON THE DETECTION OF DECEPTION¹


LAWRENCE A. GUSTAFSON² AND MARTIN T. ORNE


Institute of the Pennsylvania Hospital and University of Pennsylvania 


75 college students participated in a detection of deception experiment de- 
signed to investigate conflicting results regarding the effect on the rate of 
detection of a preinterrogation demonstration of the polygraph’s accuracy. 
It was hypothesized that the differences were due to differential demand charac- 
teristics in the 2 experiments. The information S received between Trials I 
and II and S’s perception of his role were the major independent variables. 
If Ss received information which was consonant with their perceived roles, 
they were detected significantly less frequently than Ss who received
information not consonant with their roles. The findings conform to the
“consequences theory of detection” and support the hypothesized
explanation of the disparate results.


Most field situations involving the detec- 
tion of deception employ a preinterrogation 
procedure in which the subject (S) is asked 
to draw a card at random from a pack and 
the interrogator then proceeds to inform the 
S as to the card he has drawn. The theory 
behind this practice is that, by showing S 
that he cannot deceive the interrogator, S 
will be easier to detect in subsequent interro- 
gation. Serious doubt has been cast on this 
theory by the study reported by Ellson, Davis, 
Saltzman, and Burke (1952). In their com- 
prehensive report on the detection of decep- 
tion, they state that Ss who are successfully 
detected on one trial and are so informed 
are more difficult to detect on subsequent 
trials. 

The basic problem involved here is the 
effect which S’s knowledge of his perform- 
ance in one situation will have on his subse- 
quent performance. A S who has been in- 
formed that he successfully deceived on one 
trial will perceive the second trial quite dif- 
ferently from a S who has been told that he 
was unsuccessful. 

¹ This work was supported in part by the United
States Army Research and Development Command
Contracts DA-49-193-MD-2480 and DA-49-193-MD-2647.

² The authors wish to express their appreciation to
M. J. Moskowitz and J. A. Barmack for their critical
comments in the preparation of this manuscript.

A study by Gustafson and Orne (1963) has
demonstrated that increasing the Ss’ motivation to
deceive made them more easily detected. In the same
study, three groups were tested with different
information of results: one group told that they had
been successfully detected on the first trial; the
second group told that they had not been detected;
and the third group told nothing. The results of this
part of the study, while not statistically
significant, strongly suggested that the highly
motivated Ss who are informed that they have
successfully deceived the E will be more difficult to
detect on subsequent trials, while those Ss who are
told nothing or who are told that they had been
detected show no change in detectability.

While the differences between the results of this
study and those obtained by Ellson et al. (1952)
might be attributed to the high motivation of the Ss
in the former (the motivation of the Ss in the Ellson
et al. [1952] report is not known), there are other
variables to be considered. Specifically, it is
necessary to consider not only the degree of
motivation displayed by the Ss, but also the
direction of motivation, that is, the goals of the
behavior as perceived by the Ss. For example, in the
study by the authors which was described above, the
Ss were told that individuals with great emotional
control and superior intelligence were able to
deceive successfully. These instructions proved
highly motivating to the college population from
which the Ss were drawn. If the S is told that he
successfully deceived E on the first trial, he can
assume that he possesses the desired traits. If he is
told that he was detected on the first trial, he can
interpret this to mean that he is not outstanding on
these traits. According to the consequences theory of
the detection of deception (Gustafson & Orne, 1963),
it would be predicted that the S who has successfully
deceived on the first trial would be less concerned
about the second trial: the consequences of deceiving
would be less important to him, he would be less
motivated and, therefore, more difficult to detect.
In the other cases, the already high
motivation-to-deceive would not likely change on the
second trial.

An alternative situation can easily be imagined.
Without specific instructions, the S might well
perceive the purpose of the experiment to establish
the effectiveness of lie detection. He might further
assume that “a normal S” can readily be detected by
the equipment used. If the S makes these assumptions
and he is told that he was not detected on the first
trial, that he successfully deceived the E, then he
would of necessity conclude that his responses had
been different from that of the “normal S.” As a
result, he would be more concerned on the second
trial. The second trial would be more important to
him, and the consequences theory of detection would
predict that he would, therefore, be more easily
detected on the second trial. Conversely, if the S
perceives the experiment in the manner outlined and
if the S is told that he has been detected on the
first trial, that is, responded in the manner of
“normal Ss,” the second trial would be of less
importance to him and therefore he would be more
difficult to detect.

Thus, the manner in which the S perceives the
experimental lie detection situation would lead to
opposite results. The S’s perception of the
experimental situation is a variable which is
necessarily present in all experimental situations.
Orne (1962) elsewhere has termed the sum total of
cues which determine the S’s perception of the
purpose of the experiment as the “demand
characteristics of an experimental situation.” The
type of cues which may be involved include explicit
and implicit instructions, the experimental setting,
the experimental procedure
itself, subtle behaviors of the E (as well as 
the ancillary personnel), etc., as they are 
interpreted in the light of S’s past knowledge 
and experience. Any experimental situation 
will have demand characteristics (i.e., neces- 
sarily most Ss form some hypothesis about 
what they are doing and why) which may 
be significant factors in the interpretation of 
the experimental findings. 

Subtle differences in demand characteristics 
could have existed between the experiment of 
Ellson et al. (1952) and our original pilot 
studies. In these pilot studies, Ss were moti- 
vated to deceive, whereas in the Ellson et al. 
studies no specific preexperimental instruc- 
tions were given. However, many Ss come to 
the experiment with the conviction that nor- 
mal individuals cannot fool the lie detector 
and that only individuals who lack strong 
conscience and who have some kind of crimi- 
nal tendencies are able to lie successfully. 
It seems plausible that these preconceptions 
were present in the population from which the 
Ellson et al. sample was drawn. Unless spe- 
cific instructions are given to the contrary, 
this type of preconception might well cause 
the S to perceive the experiment in the al- 
ternative way discussed above. It should be 
clear that problems of experimenter bias are 
not being considered here, rather the pos- 
sibility that Ss in the two experiments per- 
ceived the purposes in two alternative fash- 
ions. In the light of the consequences theory 
of deception, the interaction between these 
different sets of Ss’ perceptions (demand 
characteristics) with the experimental variable 
would lead to opposite results. 

The present experiment was designed to 
test whether differences in the S’s perceptions 
explicitly created by instructions similar to 
those which could have been introduced by 
implicit differences in the experimental situa- 
tion would yield opposite findings even 
though otherwise the identical set of experi- 
mental variables were employed. The major 
independent variables were: (a) the S’s per- 
ception of his role in the experiment, and 
(b) the information he received as to his 
success on the first trial. The primary de- 
pendent variable was his detectability in the 
second trial of the experiment. In keeping with a
widely used method for categorizing the direction of
human motivation, one group of Ss was designated as
“need deceive” (n Deceive) and the second group as
“need detected” (n Detected).³

It was predicted that: 


1. Ss who perceive their task to be the 
successful deception of E (n Deceive) and are
told that they have indeed been successful 
in deception will be detected less frequently 
on the next trial than Ss with the same per- 
ception who are told that they have been 
detected. 

2. Ss who perceive their task to be that 
of being detected by E (n Detected) and 
are told that they were indeed detected on 
the first trial will be detected less frequently 
on the next trial than Ss with the same per- 
ception who are told that they had not been 
detected. 

3. Ss in the n Deceive and the n Detected 
groups who are given the same information 
as to whether they were detected or not will 
be detected on the next trial at significantly 
different rates. 
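
The three predictions reduce to a single rule: an S told that he succeeded in his perceived role should be detected less often on the next trial than an S told that he failed in it. The sketch below only restates that rule for the four cells of the design; the function name and the labels "lower" and "higher" are illustrative assumptions, not terms used by the authors.

```python
def predicted_trial2_detection(role, told_detected):
    """Predicted relative detection rate on Trial II for one cell.

    role: "n Deceive" (task is to fool E) or "n Detected" (task is to
          respond like a normal, detectable S).
    told_detected: True if S was told after Trial I that he had been
          detected, False if told that he had deceived E.
    """
    if role not in ("n Deceive", "n Detected"):
        raise ValueError(f"unknown role: {role!r}")
    # Feedback is consonant with the role when an n Detected S hears
    # that he was detected, or an n Deceive S hears that he was not.
    consonant = (role == "n Detected") == told_detected
    return "lower" if consonant else "higher"


for role in ("n Deceive", "n Detected"):
    for told_detected in (True, False):
        print(role, "| told detected:", told_detected, "->",
              predicted_trial2_detection(role, told_detected))
```

Predictions 1 and 2 compare the two feedback conditions within a role; Prediction 3 compares the two roles within a feedback condition, where the rule above gives opposite outcomes.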


METHOD


Subjects. Seventy-five male undergraduates were
recruited from local school employment offices and
paid for their participation. Many Ss had
participated in psychological experiments, but none
had previously taken part in detection of deception
experiments.

Procedures. The Ss were randomly divided into 
two groups: n Deceive and n Detected. Each major 
group was divided further into two subgroups ac- 
cording to the information given the S between 
Trial I and Trial II. In subgroup one, the S was 
informed that he had successfully deceived E, while 
in the second subgroup the S was told that he had 
been detected. 

A tape recording was played to each S at the 
beginning of the experiment. The tape included in- 
formation designed either to motivate S to deceive 
or to motivate him to be detected, according to 
which of the two major groups S belonged, and 
procedural instructions which were to be followed 
“for the duration of the experiment.” These in- 
structions were identical for both the n Deceive and 
n Detected groups and were the same as those used 
in an earlier study (Gustafson & Orne, in press). 
Those Ss in the two n Detected groups heard the 
following: 


³ These do not refer to components of Murray’s
theoretical system.


Thank you for participating in our research. 
Just a few words about the experiment in which 
you are about to take part. This study is de- 
signed to see whether or not you can withhold 
information from equipment which is based on a 
principle similar to that of the lie detector. Of 
course, the equipment which we are using is a 
good deal more sophisticated than that which is 
usually used as a lie detector device. I am sure 
that you know that these devices have not yet 
been fully recognized by the courts, primarily be- 
cause the circumstances under which they work 
and the mechanisms which make them work are 
not yet fully understood. However, certain facts 
are known based in good part on our own re- 
search as well as that of others. We know that 
individuals who are normal and well-adjusted
find it extremely difficult and even impossible to 
prevent themselves from giving certain physio- 
logical reactions when they lie. This is largely 
due, or so it seems, to certain childhood experi- 
ences which seem to have caused a kind of con- 
ditioned involuntary autonomic response associ- 
ated with lying. The machine does nothing more 
than record this response making it possible to 
recognize a lie. Of course this only works with 
normal individuals; individuals who have so- 
called psychopathic tendencies, who are able to lie 
without any feeling of guilt, or who are mentally 
disturbed, do not appear to show these kinds of 
changes associated with lying. 

In this study we are interested in your at- 
tempting to prevent yourself from showing the 
response to lying. We realize that this is extremely 
difficult for normal individuals, however, we want 
you to try. Presently the rules of the experiment 
will be explained to you. You are to follow them 
as carefully as you can. Your job is to try to
show no response whenever you can. 

Good luck! 


The two n Deceive groups listened to a tape which
was similar to the one used in our preliminary study
cited in the introduction (Gustafson & Orne, 1963):


We would like to thank you for participating
in our research. Just a few words about the
experiment in which you are about to take part.
This study is designed to see whether or not you
can withhold information from equipment which
is based on a principle similar to the lie detector.
Of course the equipment we are using is a good
deal more sophisticated than the usual kind of
lie detection device. I am sure that you know
that these devices have not yet been fully
recognized by the courts largely because there is
some question about their validity. These tests are
designed to see whether or not an individual who
really wants to is able to withhold information
from the machine; in other words, when certain
significant things are said whether you can
suppress your autonomic, your involuntary bodily
reaction to them. Some few people are able to do
this. However, I should tell you that it is rather
difficult. We have found that these are individuals
who have more than the usual amount of control,
who are quite superior in intelligence.

I should tell you that in this experiment you 
should try your level best to withhold informa- 
tion from the machine. As I say, while it is very 
difficult to do this, it is possible. Now the rules 
of the experiment will be explained to you. 

Good luck! 


Transducers for recording skin resistance, heart
rate, and respiration were then attached. The
method for recording skin resistance was that used
in previously reported research (Gustafson & Orne,
1963). (The other measures were included only for
exploratory purposes and will not be discussed
further.)

Each S drew a card from a seven-card deck which
consisted of two blank cards and five cards, each of
which had a different two-digit number printed on
it. The decks had been arranged in such a manner
that E could identify the card which S selected by
its position in the deck. Although this procedure is
not usual in detection of deception studies, it was
followed because it was necessary for E to be able
to present S with the information appropriate to his
subgroup without regard to his performance on
Trial I.

Because only Ss who drew number cards on both
trials could be used in testing the hypotheses, the
blank cards were arranged so that they were always
in either the first or second position at the front
and the back of the deck. Observations from earlier
studies indicate that Ss draw cards from near the
middle of the deck much more frequently than they
do from the extremes. Thus, only a few Ss were lost
because they drew blank cards.

Each S was instructed to write the number on
another card 10 times while E was out of the room
and to write zeros if he drew a blank card. This
was done to insure that S had actually looked at
the card. Both cards were turned over by the S
before the E returned. Prior to each trial, the E
reminded S to respond verbally with “no” to each
number presented to him during the interrogation.

The deception task was based upon what the
present authors have designated as the
“guilty-person paradigm” (Gustafson & Orne, in
press). In this design, S’s task is to appear as
though he has drawn a blank card and, therefore, has
no “special” information concerning any of the
numbers. This model is contrasted to the
“guilty-information paradigm” in which S may deceive
E by forcing a response to a noncritical item
(Gustafson & Orne, in press). The task of all Ss in
this experiment, therefore, was to suppress
responses to all numbers and thus try to appear
innocent. Of course, in the case where S had drawn a
blank card, there was no deception involved.

The numbers were presented in a manner similar
to the relevant-irrelevant method used in commercial
lie detection (Inbau & Reid, 1953; Lee, 1953).
The five numbers were presented in random order
to S by a tape-recorded voice, one number every 15
seconds. The first number on the tape was a dummy
so that the inordinately large GSR response which
usually appears on the first stimulus presentation 
could be discarded. Recordings were made on an 
Offner Type R Dynograph at a paper speed of 2.5 
millimeters per second. After a stable resting GSR 
had been obtained, the tape-recorded interrogation 
was played to S. 

At the conclusion of Trial I, E re-entered the 
subject room and presented the S with the
information appropriate to his subgroup. One-half of
the Ss were told that they must have drawn blank 
cards (though they had not) and thus were made to 
feel that they had successfully deceived E, while the 
other half were informed of the number and were 
asked to verify that E had, in fact, correctly detected 
them. 

The S was then asked to draw a card from a sec- 
ond deck. The procedure followed during Trial II 
was identical to Trial I except the interrogation 
tape was made to correspond to the numbers in 
the second deck. These numbers did not duplicate 
any of the numbers in the first deck. 

Analysis of the data. The analysis was performed 
by individuals who did not know what the critical 
numbers were and who were unaware of the au- 
thors’ hypotheses concerning the experiment. Of the 
75 Ss who took part in the experiment, 11 were 
discarded because, by chance, they happened to 
have chosen a blank card on either Trial I or Trial 
II, or because they did not follow instructions. 

The average GSR response to each of the five
numbers was determined. The largest mean response
was ranked as one, the second largest as two, etc.
If the number which S drew was given a rank of
one, S was considered to have been detected. If the
rank assigned was not one, S was considered to
have deceived E. Chi-square tests were used to
compare the number of successful and unsuccessful
detections between different conditions. Trial I and
Trial II were treated individually in the analysis. 
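
The scoring rule just described can be sketched as follows. The response amplitudes are invented for illustration, the function names are hypothetical, and the treatment of ties (a tie for the largest mean counts as not detected) is an assumption, since the procedure for resolving ties is not stated.

```python
def mean_responses(responses):
    """Average the GSR amplitudes recorded for each number."""
    return {number: sum(values) / len(values)
            for number, values in responses.items()}


def is_detected(responses, critical_number):
    """True if the critical number has the single largest mean response.

    Assumption: a tie for first place counts as "not detected".
    """
    means = mean_responses(responses)
    critical = means[critical_number]
    return all(critical > value
               for number, value in means.items()
               if number != critical_number)


# Hypothetical amplitudes (arbitrary units) for the five numbers in
# one deck; the dummy first item on the tape is already discarded.
trial = {
    "17": [0.4, 0.6],
    "23": [1.2, 1.0],
    "38": [0.5, 0.3],
    "52": [0.7, 0.9],
    "64": [0.2, 0.4],
}

print(is_detected(trial, "23"))  # S drew 23: largest mean response
print(is_detected(trial, "52"))  # S drew 52: not the largest
```

With five numbers per deck, an S whose responses carried no information would be scored as detected about one time in five under this rule.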


RESULTS 


While the outcome of Trial I was not a 
principal concern of this study, it was im- 
portant to determine whether there were any 
systematic differences between the subgroups 
prior to the introduction of the treatments 
presented at the conclusion of Trial I. The 
number of successful and unsuccessful de- 
tections on Trial I for the n Deceive and n 
Detected groups are presented in Table 1 
according to the kind of information S re- 
ceived between Trials I and II. There were 
no significant differences. 

Prediction 1. The Ss of the n Deceive group 
who were told that they had successfully de- 
ceived E (on Trial I) were detected signifi- 
cantly less often on Trial II than n Deceive 
Ss who were told they had been detected.

TABLE 1

NUMBER OF SUCCESSFUL AND UNSUCCESSFUL DETECTIONS ON TRIAL I FOR THE TWO SUBGROUPS
OF THE N DETECTED AND N DECEIVE GROUPS

                                                                  χ² between
Group                       Told detected    Told not detected    Columns 1 and 2

“Need to be Detected”
    Detected                       9                13             χ² = 1.31
    Not detected                   7                 3             ns
“Need to Deceive”
    Detected                      13                11             χ² = 0.17
    Not detected                   3                 5             ns
χ² between n Detected          χ² = 1.31        χ² = 0.17
and n Deceive groups              ns               ns

Note. Ss were not given information about the success of detection until after the trial on which these data are based.
A multiple chi-square contingency analysis (Sutcliffe, 1957) was used to analyze the departures from expected frequencies in
the entire table. Neither the chi-square components for each variable alone, nor the interaction between variables, were significant.


Prediction 2. The Ss of the n Detected
group who were told that they were unsuc- 
cessful in deceiving E were detected signifi- 
cantly less often on Trial II than n Detected 
Ss who were told they had not been detected 
(see Table 2). 

Prediction 3. The Ss in the n Deceive and 
n Detected groups who were given the same 
information with regard to their success or 
failure at deceiving on Trial I were detected 
at significantly different rates on Trial II 
(see Table 2). 
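
The pairwise chi-square values reported in Table 2 can be checked from the cell frequencies (16 Ss per subgroup). The sketch below assumes that each comparison is a 2 × 2 chi-square with Yates' continuity correction; the article does not name the correction, but this assumption reproduces the tabled values to within about 0.01. The function name is hypothetical.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square (df = 1) with Yates' correction for the 2 x 2 table

        a  b
        c  d
    """
    n = a + b + c + d
    # The corrected cross-product difference is never taken below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * corrected ** 2 / denominator


# Each call lists (detected, not detected) for one subgroup, then the
# same pair for the subgroup it is compared with (Table 2, Trial II).
comparisons = {
    "n Detected: told detected vs. told not detected":
        yates_chi_square(4, 12, 14, 2),
    "n Deceive: told detected vs. told not detected":
        yates_chi_square(15, 1, 3, 13),
    "Told detected: n Detected vs. n Deceive":
        yates_chi_square(4, 12, 15, 1),
    "Told not detected: n Detected vs. n Deceive":
        yates_chi_square(14, 2, 3, 13),
}

for label, value in comparisons.items():
    print(f"{label}: chi-square = {value:.2f}")
```

The same function applied to the Trial I frequencies in Table 1 gives values near 1.31 and 0.17, matching the nonsignificant comparisons reported there.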

On Trial II, it is interesting to note that
the Ss in the two groups with low-detection
rates showed markedly different overall response
patterns. Often a very flat GSR record was obtained.
Frequently Ss responded only to the first number
(which was a dummy and not included in the
analysis). This led to a number of ties when the
average responses were ranked. The GSR records were
similar to those of the unmotivated group in
a study reported earlier (Gustafson & Orne,
1963).


TABLE 2

NUMBER OF SUCCESSFUL AND UNSUCCESSFUL DETECTIONS ON TRIAL II FOR THE TWO
SUBGROUPS OF THE N DETECTED AND N DECEIVE GROUPS

                                                                  χ² between
Group                       Told detected    Told not detected    Columns 1 and 2

“Need to be Detected”
    Detected                       4                14             χ² = 10.28
    Not detected                  12                 2             p < .005
“Need to Deceive”
    Detected                      15                 3             χ² = 15.36
    Not detected                   1                13             p < .001
χ² between n Detected          χ² = 12.96       χ² = 12.55
and n Deceive groups            p < .001         p < .001

Note. A multiple chi-square contingency analysis here shows that neither information given, nor motivation (n Detected versus
n Deceive) have significant effects by themselves. The relevant chi-square values, calculated from partitioned subtables, are 0.25
(p >.95) and 0.00, respectively (df = 1). However, successful detection does depend significantly on the interaction between
information and motivation (χ² = 30.94; p < .001; df = 1).


DISCUSSION


The results support the hypothesis that
the demand characteristics (S’s perception of
the purpose of the experiment) significantly
affect the rate of detection. The principal
finding is that subtle motivational variables
affect what has often been assumed to be a
relatively mechanical procedure. When S perceives
his role to be one in which he will be detected and
yet is told that he has successfully deceived E, he
becomes relatively easy to detect. Similarly, if S is
motivated to deceive and yet is told that he has been
detected, he also becomes relatively easy to detect.
More generally, if S is given information indicating
to him that he is not performing in a manner
consonant with his role, he is more easily detected
than if he is given information indicating that he is
behaving in accordance with his role.

The results indicate that the differences
between detection rates on Trial II are not
due to differences in Ss prior to the experimental
manipulation which occurred between Trials I and II.
It should be remembered, however, that E knew the
composition of the groups beforehand and could
conceivably have communicated cues to S during or
after the experimental treatment. Indeed, one may
be legitimately concerned with the problems of bias
in these results if, as has been demonstrated,
subtle factors affect the rate of detection.

Previous research (Rosenthal, 1963) has
indicated that E is by no means neutral to
the outcome of his study. Only one E (LAG)
took part in the running of Ss. Because of his
theoretical frame of reference, he had a
considerable investment in the attainment of the
expected outcome of the study. It is possible,
therefore, that in his interaction with the S, E
subtly communicated his expectations. In the
light of the consequences theory of deception,
however, any S who became aware of E’s wish that he
not be detected should, in effect, be detected more
easily. In other words, any subtle communication of
E’s expectations to S might well have militated
against the hypothesis. Furthermore, in a previous
experiment the same E had confidently expected
to replicate the findings of Ellson et al.
(1952) and was surprised when contrary results
were obtained. The findings may thus be ascribed to
the differential effect in motivation introduced by
the tape recordings rather than to artifact or bias
introduced by E. Nonetheless it would seem desirable
to replicate these findings with E unaware as
to which group S belongs.

These results support the view that the 
detection of deception is a subtle phenomenon 
in which psychological variables play a cru- 
cial role. The study appears particularly rele- 
vant since the conclusion which might have 
been drawn on the basis of the work of Ell- 
son et al. (1952) would be that the standard 
technique as used in the field should be re- 
vised, that is, that the proof given to S of 
the polygraph’s effectiveness would decrease 
rather than increase its subsequent utility. 
This conclusion is not justified, however, 
since the psychological parameters of the field 
situation correspond to the n Deceive group. 
In this situation, proof of successful detec- 
tion maximizes subsequent detection. 

It is assumed that the study conducted by 
Ellson et al., where motivation is not speci- 
fied, corresponds to the n Detected group, 
suggesting that the psychological setting in 
which an experiment is performed may inter- 
act in a crucial fashion with experimental 
variables.

The development of a nationwide fallout shelter system has initiated research
on the physiological, psychological, and sociological aspects of group isolation. 
The most austere occupancy tests have been conducted at the University of 
Georgia. Results indicate that healthy men, women, and children can endure 
2 wks.’ isolated confinement under conditions of severe austerity without 
suffering deleterious physiological or psychological effects. 


In the summer of 1962 a realistic program 
for national civil defense was well underway. 
Emphasis in the program was on the develop- 
ment of a nationwide fallout shelter system. 

A major objective of the National Shelter 
Program was to locate and mark suitable 
fallout shelter spaces in existing structures, 
and then to stock them with food, water, 
medical kits, sanitation kits, and radiation 
measuring instruments. More information 
was needed on the problems people might 
encounter in living under austere conditions 
in public fallout shelters. To obtain this in- 
formation, the University of Georgia Psycho- 
logical Laboratories conducted a series of 
shelter occupancy tests for the Department 
of Defense, Office of Civil Defense. 

The general research mission was to ap- 
praise minimal survival conditions in public 
fallout shelters as presently equipped and 
stocked with emergency supplies. Specifically, 
the project was to evaluate the interactive 
effects of such variables as overloading, lim- 
ited bunks and bedding, emergency sanita- 
tion equipment, marginal ventilation condi- 
tions, and minimal food and water supplies. 
A film depicting results was also part of the 
research mission. 

Previous fallout shelter occupancy research (Altman, 1960; Strope, 1960,
1961; United States Naval Research Laboratory, 1962; Vernon, 1959)
involved the use of healthy men in the military service, or civilians
under conditions of relative comfort. The studies conducted at the
University of Georgia, in contrast, surpassed in austerity all previous
shelter research using civilians.

1 This research was performed under Office of Civil Defense, Department
of Defense Contract No. OCD-OS-62-226. Principal investigators were
J. A. Hammes and R. T. Osborne.


EXPERIMENTAL STUDIES 


An outline of the four experimental studies 
is presented in Tables 1, 2, and 3. A 4-day 
study (Experimental Study I), two 2-week 
studies (Experimental Studies II and III), 
and a 1-week study (Experimental Study IV) 
were conducted. 

Experimental Study I, the 4-day study, 
was the most austere. The shelterers subsisted 
on fewer calories (315 calories per person per 
day) and slept on a concrete floor. The eight 
defections occurring during Experimental 
Study I indicated that less austere conditions 
might persuade a greater number of shelterers 
to endure the longer stay of the other planned 
studies. 

Experimental Studies II and III, therefore, 
provided %4¢-inch thickness of corrugated 
fiberboard as floor covering and additional 
food and water. The age ranges were also 
extended in these studies (7-67 years). 

Experimental Study IV was designed as 
a 1-week elementary school occupancy study. 
This study was the first of its kind conducted 
in the United States. Two adults, a shelter 
manager and a nurse, accompanied 28 chil- 
dren. The Office of Civil Defense (OCD) 
carbohydrate supplement was also tested in 
this study. Space was reduced to 6 square 
feet per occupant, to evaluate greater over- 
crowding, and in consideration of smaller 
body size. 

Shelter supplies evaluated in the four studies were standard OCD issue
as presently stocked. The four types of food ration, viz., the Bulgur
wheat wafer, the Nabisco wheat-flour biscuit, the Nebraska
wheat-corn-flour cracker, and the carbohydrate supplement were also
investigated. Other supplies studied included sanitation kits, medical
kits, and various kinds of commode chemicals. Minimal living space and
limited conditions of ventilation were investigated, as well as
inshelter activity programs, and psychological and sociological patterns
of behavior.


TABLE 1

EXPERIMENTAL STUDY CHARACTERISTICS

                        Shelterer
Experimental
study         Duration  N    Sex                    Age     Defection
I             4 days    30   Men, women, children   15-50   8
II            2 weeks   30   Men, women, children   9-67    5
III           2 weeks   30   Men, women, children   7-06    D)
IV            1 week    30   Children, two adults   7-12    11


RESULTS 


Water. In all four studies there were minor complaints about the taste
of the water, due to the iodine purification tablets. However, the taste
seemed to “improve” as time progressed, presumably because of
adaptation. In both studies using the Bulgur wafer (Experimental Studies
I and II), there seemed to be more water required to reduce thirst than
in studies using the Nabisco biscuit or Nebraska cracker. Water
consumption data may be found in Table 3.

Food. The Bulgur wheat wafer provided the survival ration in
Experimental Studies I and II. The Nabisco wheat-flour biscuit was used
in Experimental Study III, and the Nebraska wheat-corn-flour cracker in
Experimental Study IV. No adjuncts were supplied in Experimental Studies
I, II, or III. However, in Experimental Study IV, the newly prescribed
carbohydrate supplement was added to the cracker diet. Consumption data
are presented in Table 3.


TABLE 2

SHELTER ENVIRONMENT

              Net space       Ventilation
              per person      (cubic feet per minute      ET(a)
Experimental  Square  Cubic   per person)                 Mean    Range
study         feet    feet
I             8       52      15 (20% fresh air)          78°     74-80°
II            8       52      Day: 40 (20% fresh air)     ee      69-83°
                              Night: 15 (20% fresh air)
III           8       52      Day: 40 (20% fresh air)     74°     70-79°
                              Night: 15 (20% fresh air)
IV            6       39      Day: 40 (20% fresh air)     ide     73-76°
                              Night: 15 (20% fresh air)

(a) Effective Temperature, Fahrenheit.


JOHN A. HAMMES AND R. TRAVIS OSBORNE

TABLE 3

SHELTER SUPPLIES

              Water(a)    Food(a)                        Bunks,    Bath                                   Recreational
Experimental  (quarts)    (calories)    Sanitation       blankets  water  Coffee  Cigarettes      supplies
study
I             1.3         315           Chemical toilet  None      None   None    None            None
II            1.4         787           Chemical toilet  None      None   None    1 pk.           None
III           1.0         814           Chemical toilet  None      None   None    1 pk.           None
IV            1.0         848           Chemical toilet  None      None   None    1 pk. (adults)  Paper and pencils

(a) Amount consumed per person per day. The subjects were requested to
consume as few rations as possible.

A nausea reaction was present in all four 
studies. This condition probably should not 
be attributed to the food alone, nor the water, 
but perhaps to a complex of environmental 
variables involving adjustment to a new en- 
vironment heretofore unexperienced. Lack of 
sleep, cramped quarters, new social adjust- 
ments—these and other factors could con- 
tribute to disturbed gastric processes. 

A primary complaint about food was the 
lack of variety. A complaint specific to the 
Bulgur wafer was the presence of colonic 
flatus. 

In the overall complaint picture, however, 
food was placed relatively far down the list. 
In terms of survival, the OCD food rations 
appeared quite adequate. The carbohydrate 
supplement in Experimental Study IV was 
especially welcomed by the children. 

Sanitation kit. A primary complaint in 
Experimental Studies I, II, and III was the 
ineffectiveness of the commode chemical de- 
odorant, Weladyne-F53. In Experimental 
Studies III and IV a series of other commode 
chemicals were tested and sodium nitrate was 
found to be satisfactory as an odor neu- 
tralizer. 

Medical kit. The most consumed item in the medical kit was aspirin.
Items suggested by inshelter medics and shelterers for possible
additional inclusion were: an intravenous or intramuscular sedation for
hysteria, a stimulant for depression, an antinausea medication,
Band-Aids for simple dressings, radiation burn medication (in addition
to the petroleum jelly and sodium bicarbonate), and antibiotics.

Sleep conditions. Shelterers of Experimental 
Study I slept 4 nights on a concrete floor. 
Because of primary complaints of hardness 
of the floor and bodily aches, a %4,-inch cor- 
rugated fiberboard pallet was introduced in 
Experimental Studies II, III, and IV. The 
complaints continued, but the shelterers
nevertheless endured 2-week confinement. It
would seem, therefore, that bunks are not
an absolute necessity.

Shelterers complained also of room tem- 
perature changes, and at times individual 
complaints were contradictory, that is, what 
was too hot for one person was too cold for 
another. Blankets were suggested as a substi- 
tute item for corrugated fiberboard, providing 
both a sleeping surface and a solution for 
individual reactions to room temperature. 

Recreational supplies. No recreational sup- 
plies were intended to be included in Experi- 
mental Studies I, II, and III, although chil- 
dren’s literature texts, necessary for schooling 
purposes, were used as recreational items. 
No books were allowed in Experimental Study 
IV, conducted during the summer, but paper 
and pencils were purposely included for rec- 
reational use. The shelterers in all studies 
usually improvised playing cards, bingo cards, 
and checkerboards from the cardboard pallets, 
and used the wall for sketching. However, 
they suggested the following items to be in- 
cluded as part of standard shelter supplies: 
a Bible, song book, cards, checkerboards, an 
exercise manual, paper, pencils, and a few
good books to read. For cleaning the floor,
a small broom was suggested.


CONCLUSIONS 


Under the conditions of the experimental
design, the following conclusions are indicated:

Space. Eight square feet per person, exclusive of storage, although
uncomfortable, would appear to be adequate for the community fallout
shelter, and 6 square feet per person for children in the elementary
school fallout shelter. These conclusions are restricted to optimal
temperature and adequate ventilation conditions.

Water. Under optimal temperature conditions, 1 quart per person per day
of water is adequate for drinking purposes with the Nabisco wheat-flour
biscuit and the Nebraska wheat-corn-flour cracker, when no other liquid
adjuncts are provided. For the Bulgur wheat wafer, 1.5 quarts per person
per day seems to be adequate.

Food. Under optimal temperature conditions, it appears that 814 calories
per person per day of OCD survival rations without adjuncts are adequate
in maintaining good physiological condition over a 2-week period.

Sleeping conditions. In the present studies, 
4¢-inch thick corrugated fiberboard pallets 
provided an uncomfortable but adequate sleeping surface. To remove sleep
conditions as a primary discomfort, however, blankets could be used.
Blankets would have additional advantages of protection against cold
temperature conditions, for example, winter occupancy. There seems to be
no need for bunks, unless utilization of vertical space for sleeping is
desired.

Sanitation. The commode chemical in the 
present OCD Sanitation Kit is inadequate in 
removing commode odor as a primary com- 
plaint, but is satisfactory when sodium nitrate 
is added. 

Medical kit. Since subjects were free to 
leave on the basis of medical complaints, the 
adequacy of the medical kit could not be 
fully evaluated. The use of initially healthy 
shelterers also minimized the use of the medi- 
cal kit. However, shelterers were asked for 
suggestions, resulting in the items already 
discussed. 

Recreational supplies. These are not essen- 
tial for survival, and can be improvised when 
needed. However, shelterers did suggest helpful
items, already discussed.

An experimental (reversal) group took both the MMPI and a reversed form
of the MMPI. A control (reliability) group took the MMPI twice. All tests 
were scored on 67 scales, with keying reversed for the reversed MMPI. Only 
trivial proportions of the response variance were found to be attributable to 
acquiescence for any of the uniformly keyed scales employed. Included were 
the scales most often suggested as measures of acquiescence: A, R, B, Bn, Rb, 
Acq, At, Dy-3, Deviant True, Deviant False, and Total True. Conclusion: 
acquiescence is an unimportant determinant of MMPI responses, including 
responses to items on “acquiescence” scales. 


The existence of acquiescence response 
style is today taken for granted. Recent stud- 
ies have attempted to find its behavioral cor- 
relates and estimate its contribution to the 
total response variance of various personality, 
attitude, and interest inventories (e.g., Chris- 
tie & Lindauer, 1963; Jackson & Messick, 
1958; Jackson & Messick, 1961; Loevinger, 
1959; McGee, 1962b; Messick, 1961a; Mes- 
sick & Jackson, 1961; Wiggins, 1962). Find- 
ings to date have been interpreted as indi- 
cating that, although acquiescence response 
style has no known behavioral, that is, non- 
test, correlates (e.g., McGee, 1962b), it, nev- 
ertheless, accounts for a large proportion of 
test response variance. Four principal meth- 
ods have been employed in arriving at this 
latter conclusion: 

Acquiescence scales. The rationale of this 
method is analogous to that underlying the 
use of the MMPI K scale. A respondent’s 
score on an “acquiescence” scale is used to 
partial the acquiescence component out of 
other scales to which he has responded (e.g., 
Webster, 1958; Webster, 1959). The problem 
lies in finding an acceptable acquiescence 
scale. If the acquiescence scale is composed 
of verbal items, then the acquiescence score 
itself inevitably confounds stylistic and con- 
tent components. If, on the other hand, the 
scale is relatively content free, all the evidence to date indicates
that it will correlate neither with verbal acquiescence scales nor other
nonverbal scales which might be proposed as alternative acquiescence
measures (e.g., McGee, 1962b; Rorer, 1965).

1 This study was supported by National Science Foundation Grant G-25123
to Oregon Research Institute under the direction of the second author.
Much of the data analysis was carried out through the facilities of
Western Data Processing Center at the University of California at Los
Angeles.

Factor analyses. Investigators attempting 
to estimate stylistic components of variance 
on the basis of factor analytic results have 
tended to focus their attention on the MMPI 
(e.g., Jackson & Messick, 1961; Jackson & 
Messick, 1962; Messick & Jackson, 1961; 
Wiggins, 1962), though other inventories 
have also been used (e.g., Messick, 1962). 
If a factor could be clearly attributed to 
stylistic response tendencies, then the load- 
ings of the various scales on that factor could 
be used to estimate the proportions of their 
variance attributable to style. However, the 
decision to relate a factor to a response style 
rather than other personality variables is 
necessarily an arbitrary one. The decision 
would be justified if a stylistic marker varia- 
ble could be shown to have a high loading on 
the factor, but, as has been pointed out, no 
such scale exists today. While the results of 
factor analytic studies to date may be de- 
scribed as consistent with response style in- 
terpretations, they in no way provide evi- 
dence for the existence of such response styles. 

Direct statistical estimation. Helmstadter (1957) has suggested a number
of methods by which separate set and content scores may be estimated for
any scale for which the correct answers to the items in the scale are
known. While his models are applicable only to aptitude or achievement
tests, they have been applied in other situations, apparently on the
basis of the assumption that item keying on personality, interest, or
attitude inventories is the same as item keying on aptitude or
achievement tests (Clayton & Jackson, 1961; Fredericksen & Messick,
1959; Messick, 1961b; Messick, 1962). Such is obviously not the case. On
aptitude and achievement tests a correct answer exists for all
individuals, and that answer is known to the test administrator. On
psychological inventories this condition does not hold. Even if an
individual were, say, a pure “psychasthenic,” it would not be expected
that he would respond in the keyed direction to all of the items on the
MMPI Pt scale. The keyed alternatives have only a probabilistic
relationship to even an ideal criterion individual. It can never be
known if a particular response is attributable to content or style.
Admittedly, the same situation holds to some extent in the case of
achievement examinations, where erroneous responses may be attributable
to positive misinformation rather than stylistic tendencies. However, if
it is assumed that a student’s misinformation is randomly distributed
with respect to item keying, then estimates of his stylistic tendencies
may still be obtainable. The point to be made is that, while such
procedures may or may not be applicable to examinations, they are most
certainly not applicable to psychological inventories.

Item reversals. If an item and its logical contradictory are either both
endorsed or both rejected by a respondent, then that respondent has
given logically inconsistent responses to the item content, and it is
inferred that this lack of consistency may be attributed to stylistic
response tendencies. Chapman and Bock (1958) developed a model to
provide an estimate of the proportion of response variance attributable
to acquiescence and content, respectively, for any scale for which
adequate item reversals are available. They applied their model to all
studies in which both the California F Scale and a reversed F Scale had
been administered to the same group of subjects. Their results were
interpreted as indicating that in no case was acquiescence unimportant
as compared to content, and that in some cases acquiescence appeared to
be even more important than content in accounting for F Scale responses.

In summary, of the four techniques which
have been utilized, only the last, that based
on item reversals, provides an estimate of 
stylistic variance which can be defended on 
logical grounds, and that only if the assump- 
tion of adequate item reversals is met. Rorer 
(1965) has challenged Chapman and Bock’s 
results on the grounds that inadequate item 
reversals were utilized in the studies to which 
they applied their model. If Rorer is correct, 
there is no valid estimate of the importance 
of acquiescence in the literature today. 
Therefore, the present study utilized the 
reversed-content design and an extension of 
the Chapman and Bock model to estimate 
the contribution of acquiescence response 
style to response variance on the MMPI. 


METHOD 


An experimental (reversal) group composed of 96 
male and 125 female sophomores, juniors, and 
seniors from four psychology classes at the
University of Minnesota was given the original MMPI
and a reversed form of the MMPI² 2 weeks apart.
A control (reliability) group composed of 95 male 
and 108 female students from an introductory psy- 
chology class at the University of Oregon was given 
the regular MMPI twice under similar conditions. 

All inventories were scored on 67 scales. Keying 
was reversed for the reversed MMPI. For the con- 
trol group, correlations between scores on adminis- 
trations one and two provided standard test-retest 
reliability coefficients for each of the scales. For the 
experimental group, correlations between scores on 
the original and reversed forms provided estimates 
of the extent to which individuals responded con- 
sistently to the item content on the two adminis- 
trations. If the reversed form is equivalent to the 
original, and if the Ss responded consistently to 
item content (i.e., if stylistic variables are
unimportant), then the correlations should, in the long
run, be identical for the two groups. If acquiescence 
response style is important, then the correlations for 
the experimental group should be lower than those 
for the control group. 

However, there are two factors which should tend to decrease the
test-retest coefficient for the reversal group in relation to the
reliability group, thereby artifactually inflating the estimates of the
variance attributable to acquiescence. First, the item reversals are not
perfect (Rorer & Goldberg, in press). Second, the test-retest
reliability of the reversed form is almost certainly lower than that of
the original MMPI. This inference is derived from the facts that (a) the
reversed items are longer and more confusing than the original items,
and (b) responses to the 16 repeated items are less consistent on the
reversed form than on the original MMPI.

2 A copy of the reversed form of the MMPI may be obtained from the
authors without charge or from the American Documentation Institute.
Order Document No. 8502 from ADI Auxiliary Publications Project,
Photoduplication Service, Library of Congress, Washington, D. C. 20540.
Remit in advance $2.00 for microfilm or $3.75 for photocopies and make
checks payable to: Chief, Photoduplication Service, Library of Congress.


TABLE 1

TEST-RETEST CORRELATIONS FOR STANDARD MMPI
SCALES FOR RELIABILITY AND REVERSAL GROUPS

         Male                                 Female
         Reliability  Reversal                Reliability  Reversal
         group        group                   group        group
Scale    N = 95       N = 96      Difference  N = 108      N = 125     Difference
L        79           63          16*         74           65          09
F        59           64          -05         79           61          18**
K        86           72          14**        77           78          -01
Hs       79           71          08          75           76          -01
D        81           75          06          77           76          01
Hy       70           59          11          73           70          03
Pd       74           73          01          66           67          -01
Mf       79           79          00          79           82          -03
Pa       55           59          -04         72           48          24**
Pt       88           78          10*         85           87          -02
Sc       76           75          01          87           80          07
Ma       76           65          11          80           66          14*
Si       91           84          07*         89           88          01

Note.—Decimal points omitted.
* p < .05.
** p < .01.


RESULTS 


The test-retest correlations for the 13 
standard MMPI scales are presented in Ta- 
ble 1. If acquiescence has any effect whatso- 
ever, it would be expected that the scale reli- 
abilities would exceed the original-reversal 
correlations; therefore, difference scores have 
been obtained by subtracting the latter val- 
ues from the former. Obviously, if the forms 
were equivalent, obtained differences would 
all be attributable to random error and, on 
the average, as many would be negative as 
would be positive. For no scale was there a 
statistically significant (p < .05) difference 
between the reliability and reversal coeffi- 
cients for both males and females. Further- 
more, 4 of the 7 statistically significant dif- 
ferences were matched by a negative differ- 
ence in the other sex group. Overall, 7 of the 
26 differences were negative; that is, in 7 
cases out of 26 the correlations between the 
original and reversed scales exceeded the test-retest
correlations for the original MMPI scales. It seems
safe to conclude from these results that acquiescence
response style has no practical effect on the standard
MMPI scales.
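The article does not say which significance test produced the starred differences in Table 1. One standard choice for comparing correlations obtained from two independent groups is Fisher's r-to-z transformation, and the sketch below assumes that test. The function name is hypothetical, and the example values are the male L scale coefficients from Table 1 (.79 with N = 95 for the reliability group, .63 with N = 96 for the reversal group).

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two correlations
    that come from independent samples (an assumed test, not
    necessarily the one the authors used)."""
    # Transform each correlation to Fisher's z (inverse hyperbolic tangent).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # Standard error of the difference between two independent z values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / se
    # Two-tailed p value from the standard normal distribution.
    p_two_tailed = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_two_tailed


# Male L scale: reliability group r = .79 (N = 95), reversal group r = .63 (N = 96).
z_stat, p_value = compare_independent_correlations(0.79, 95, 0.63, 96)
print(f"z = {z_stat:.2f}, two-tailed p = {p_value:.3f}")
```

Under this assumed test the male L scale difference falls between the .05 and .01 levels, which agrees with the single asterisk printed for that entry.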

While proponents of stylistic interpretations of the MMPI might find
these results somewhat disturbing, they would be quick to point out that
they are inconclusive for at least four reasons, all based on the
premise that items are not uniformly susceptible to acquiescence. First
of all, obvious items are thought to be little affected by acquiescent
tendencies (e.g., McGee, 1962a). Thus, their inclusion would tend to
diminish the significance of the differences. Second, items of extreme
desirability or undesirability are presumably little influenced by
acquiescence (e.g., Jackson & Messick, 1962; Messick & Jackson, 1961;
Wiggins, 1962), and their inclusion would also tend to diminish the
significance of the differences. Third, acquiescence is purported to be
found only on items of high controversiality (median difficulty) (e.g.,
Wiggins, 1962). Since mean desirability and probability of endorsement
are highly correlated, the second and third reasons are nearly
equivalent. Fourth, many, though not all, of the scales include items
keyed in both directions; thus, stylistic effects on true-keyed and
false-keyed parts of the scale might cancel each other out. These
objections will be considered in turn.


TABLE 2

TEST-RETEST CORRELATIONS FOR SUBTLE AND OBVIOUS
MMPI SCALES FOR RELIABILITY
AND REVERSAL GROUPS

          Male                                 Female
          Reliability  Reversal                Reliability  Reversal
          group        group                   group        group
Scales(a) N = 95       N = 96      Difference  N = 108      N = 125     Difference
D-S       86           61          25**        77           64          13*
D-O       84           77          07          83           77          06
Hy-S      79           66          13          74           76          -02
Hy-O      84           76          08          81           73          08
Pd-S      74           48          26**        65           61          04
Pd-O      80           83          -03         66           68          -02
Pa-S      64           66          -02         71           64          07
Pa-O      62           63          -01         69           60          09
Ma-S      66           63          03          72           61          11
Ma-O      77           58          19*         75           64          11

Note.—Decimal points omitted.
* p < .05.
** p < .01.
(a) Scale designation as given in Dahlstrom and Welsh (1960).

The results for the 10 subtle and obvious scales (Wiener, 1948) are
presented in Table 2. In exactly half the cases the difference between
the two groups was greater for the subtle than for the obvious portion
of the scale. Obviously, subtlety alone is an insufficient requirement
for the elicitation of stylistic response tendencies. However, it should
be noted that in the case of D-S significant differences were found for
both males and females.

The widely held belief, based on factor analytic studies, that
acquiescence response style is moderated by item desirability (Jackson &
Messick, 1962; Messick & Jackson, 1961; Wiggins, 1962) may be partially
assessed by examining Table 3 and the last five scales in Table 4. The
two significant differences in Table 3 involved a scale to measure the
tendency to fake good (Sd) and a scale to measure the tendency to fake
bad (+). The last five true scales in Table 4 were constructed so as to
vary item desirability systematically (Jackson & Messick, 1961). Dy-1
contains items of extreme desirability, Dy-5 items of extreme
undesirability, and Dy-3 neutral items. The latter is presumably a
measure of acquiescence. As can be seen, there were three significant
differences out of five for males: one for extreme, one for moderate,
and one for neutral items. For females, the one significant difference
was associated with extreme items. If acquiescent response style is
differentially measured by items of varying desirability, it is not
apparent from these results.

Data are presented in Table 4 for true-keyed and false-keyed scales. A
subscript indicates that a scale has been divided; for example, F_t is
composed of those items from the F Scale which are normally keyed true;
F_f is composed of those items from the F Scale which are normally keyed
false, etc. When no subscript is employed, all the items in that scale
are normally keyed in the direction indicated in the table. Significant
differences were found in both groups for only one part of one standard
MMPI scale: Ma;. A significant difference was also found for both males
and females for Es,.


TABLE 3

TEST-RETEST CORRELATIONS FOR SOCIAL DESIRABILITY
AND ROLE PLAYING MMPI SCALES FOR
RELIABILITY AND REVERSAL GROUPS

          Male                                 Female
          Reliability  Reversal                Reliability  Reversal
Scales(a) group        group       Difference  group        group       Difference
So        84           76          08          89           87          02
So-r      87           78          09          87           86          01
Tt        69           56          13          60           71          -11
Sx(b)     60           47          13          57           62          -05
ESD(b)    66           65          01          68           56          12
Mp        73           76          -03         74           78          -04
Ds-r      72           68          04          82           75          07
+         90           74          16**        82           84          -02
Sd(b)     78           58          20**        65           58          07

Note.—Decimal points omitted.
* p < .05.
** p < .01.
(a) Scale designation taken from Dahlstrom and Welsh (1960) unless
otherwise indicated.
(b) Scale designation taken from Wiggins (1962).

With the exception of G and ESD, the 
remaining scales in Table 4 are those which 
have been specifically constructed as acqui- 
escence scales. In his influential review, Wig- 
gins (1962) lists B, Bn, Rb, Acq, and At as 
acquiescence scales. These five scales are all 
composed of items of high controversiality, 
that is, 40% to 60% endorsement percent- 
ages, and have been further screened on the 
basis of either desirability ratings (Bn, Rb, 
At) or nondiscriminability between adjusted 
and maladjusted Ss (Acq). Five other scales, 
each of which has been designated a measure 
of acquiescence by other writers, have been 
added to Wiggins’ list: A (Welsh’s first factor 
scale), Dev_t and Dev_f (those items for which
a true or false response, respectively, is in 
the deviant direction), AT (the total number 
of items out of 566 that are answered true by 
a subject taking the MMPI), and Dy-3, 
which has already been discussed. Wiggins 
(1962) argues that the Deviant True Scale 
is a joint function of acquiescence and non- 
communality, and says of AT that it “is not 
proposed as a meaningful acquiescence meas- 
ure [p. 235].” R, Welsh’s second factor scale, 
which has frequently been proposed as a 
measure of naysaying, orthogonal to that of
yeasaying, is also included at the end of Table
4.

For these acquiescence scales there seems 
to be an inverse relationship between the 
significance of the differences found and the 
care with which the scale was constructed. 
Only for the Deviant True, Deviant False, 
and All True scales were significant results 
found for both males and females. Particularly
striking were the findings for Bn and
Rb, both of which include only items of high
controversiality and median desirability. The 
practical significance of those differences 
which were statistically significant will be 
further considered in the next section. 


A COMPONENTS OF VARIANCE MODEL FOR 
MEASURING ACQUIESCENCE 


The following model is similar to one pro- 
posed by Chapman and Bock (1958), but is 
‘more general in that it eliminates some of 
their assumptions. For a test with a true- 
false format, stylistic responding can range 
from yeasaying (acquiescence) to naysaying 
(criticalness, cautiousness). It may be repre- 
sented by a variable having positive values 
for acquiescence, negative values for criticalness, 
and zero value for no response style 
whatsoever. The problem is simplified if only 
scales composed of items keyed in but one 
direction are considered (Chapman & Bock, 
1958; Jackson & Messick, 1961). Obviously, 
for scales whose items are keyed “true,” ac- 
quiescence, as here defined, can only add to 
the score; for scales whose items are keyed 
“false,” acquiescence can only subtract from 
the score. This is an analytical result. No as- 
sumptions concerning the relationship of ac- 
quiescence to scale content are involved. That 
relationship will be estimated by the model, 
which assumes only the existence of adequate 
item reversals. Let $X_{ij}$ = the score of the ith 
respondent on the jth replication of the true-keyed 
form of the scale; $Y_{ij}$ = the score of 
the ith respondent on the jth replication of 
the false-keyed form of the scale; $\alpha_i$ = the 
component of score for acquiescence; $\gamma_i$ = 
the component of score for content; and $\delta_{ij}$ 
and $\epsilon_{ij}$ = components of error, with zero 
mean, random over replications and individuals, 
independent of each other and the other 
components, and similarly distributed for all 
individuals. (This replication error cannot be 
directly measured for any present personality, 
attitude, or interest inventories. It can be 
estimated for any given form by the test-retest 
reliability, which will be somewhat high because 
it does not include a component of 
variance for test form, or by KR-20, which 
will be low because it includes a component 
for item heterogeneity.) 
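From the above definitions, 

$$X_{ij} = \gamma_i + \alpha_i + \delta_{ij} \qquad [1]$$

and 

$$Y_{ij} = \gamma_i - \alpha_i + \epsilon_{ij}. \qquad [2]$$

Note that $\alpha$ can be either positive or negative, 
so that an individual with a yeasaying tendency 
would increase his score on the true-keyed 
form, whereas one with a naysaying 
tendency would increase his score on the 
false-keyed form. The score variances are 

$$\sigma_x^2 = \sigma_\gamma^2 + \sigma_\alpha^2 + 2\sigma_{\gamma\alpha} + \sigma_\delta^2 \qquad [3]$$

and 

$$\sigma_y^2 = \sigma_\gamma^2 + \sigma_\alpha^2 - 2\sigma_{\gamma\alpha} + \sigma_\epsilon^2. \qquad [4]$$

Subtracting [3] from [4], 

$$\sigma_y^2 - \sigma_x^2 = \sigma_\epsilon^2 - \sigma_\delta^2 - 4\sigma_{\gamma\alpha}.$$

From the identities, $\sigma_\delta^2 = (1 - \rho_{xx})\sigma_x^2$ and 
$\sigma_\epsilon^2 = (1 - \rho_{yy})\sigma_y^2$, it may be shown that an 
estimate of $\sigma_{\gamma\alpha}$, the covariance between content 
and acquiescence, is given by 

$$\hat{\sigma}_{\gamma\alpha} = \tfrac{1}{4}\left(r_{xx}S_x^2 - r_{yy}S_y^2\right). \qquad [5]$$

The covariance of the true- and false-keyed 
forms is given by 

$$\sigma_{xy} = \sigma_\gamma^2 - \sigma_\alpha^2. \qquad [6]$$

If [6] is subtracted from either [3] or [4], 
and appropriate substitutions are made, an 
estimate of $\sigma_\alpha^2$, the variance attributable to 
acquiescence, can be obtained: 

$$\hat{\sigma}_\alpha^2 = \tfrac{1}{4}\left(r_{xx}S_x^2 + r_{yy}S_y^2 - 2r_{xy}S_xS_y\right). \qquad [7]$$

Finally, if either [3] or [4] is added to [6], 
and appropriate substitutions are made, an 
estimate of $\sigma_\gamma^2$, the variance attributable to 
content, is obtained: 

$$\hat{\sigma}_\gamma^2 = \tfrac{1}{4}\left(r_{xx}S_x^2 + r_{yy}S_y^2 + 2r_{xy}S_xS_y\right). \qquad [8]$$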
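
As a rough illustration of the arithmetic in [5], [7], and [8], the three estimates can be computed directly from the two reliabilities, the reversal correlation, and the two standard deviations. The Python sketch below is not part of the original analysis: the function name and every input value are hypothetical, and the factor of one fourth is the one that follows from the model as stated above.

```python
def variance_components(r_xx, r_yy, r_xy, s_x, s_y):
    """Return (content-acquiescence covariance, acquiescence variance,
    content variance) for one true-keyed / false-keyed scale pair.

    r_xx, r_yy : reliabilities of the true- and false-keyed forms
    r_xy       : correlation between the two forms
    s_x, s_y   : standard deviations of the two forms
    """
    reliable_true = r_xx * s_x ** 2          # reliable variance, true-keyed form
    reliable_false = r_yy * s_y ** 2         # reliable variance, false-keyed form
    twice_covariance = 2.0 * r_xy * s_x * s_y

    cov_content_acq = (reliable_true - reliable_false) / 4.0
    var_acq = (reliable_true + reliable_false - twice_covariance) / 4.0
    var_content = (reliable_true + reliable_false + twice_covariance) / 4.0
    return cov_content_acq, var_acq, var_content


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
cov_ga, var_acq, var_content = variance_components(
    r_xx=0.80, r_yy=0.80, r_xy=0.70, s_x=3.0, s_y=3.0
)

# Expressed as percentages of the true-keyed scale variance.
pct_acq = 100.0 * var_acq / 3.0 ** 2
pct_content = 100.0 * var_content / 3.0 ** 2
```

With these made-up values the acquiescence estimate is about 5% and the content estimate about 75% of the true-keyed scale variance; the remainder is error variance.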
TABLE 5 

PERCENTAGE OF VARIANCE ATTRIBUTABLE TO ACQUIESCENCE AND CONTENT FOR SELECTED MMPI SCALES 

True scales 

                     Male               Female 
Scale (a)       Acq. %   Con. %    Acq. %   Con. % 

F_T               13       66        19      102 
Hs_T              02       54        00       61 
D_T               04       79        04       73 
Hy_T              01       70        04       69 
Pd_T              02       79        01       69 
Pa_T              04       66        05       72 
Pt_T              05       78        01       79 
Sc_T              02       83        03       81 
Ma_T              09       61        05       70 
Si_T              04       84        01       83 
Es_T              12       60        03       57 
G_T               06       84        01       80 
ESD_T (b)         08       78        10       66 
Dev_T (b)         06       86        04       84 
AT (b)            08       77        06       76 
A (e)             04       85        01       85 
B                 07       71        06       61 
Bn                02       78        03       64 
Rb (b)            01       73        05       69 
Acq               08       72        02       71 
At (b)            07       56        03       55 
Dy-1 (c)          08       79        04       78 
Dy-2 (c)          02       70        03        1 
Dy-3 (c)          10       75        02       69 
Dy-4 (c)          05       83        03       87 
Dy-5 (c)          04       86        05       79 

False scales 

                     Male               Female 
Scale (a)       Acq. %   Con. %    Acq. %   Con. % 

F_F               08       89        05       61 
Hs_F              05       67        01       66 
D_F               12       75        04       72 
Hy_F              08       54        00       88 
Pd_F              11       70        00       62 
Pa_F              00       77        01       72 
Pt_F              09       66        01       65 
Sc_F              03       63        00       54 
Ma_F              04       79        00       76 
Si_F              05       87        03       81 
Es_F              08       67        07       79 
G_F               01       64        00       70 
ESD_F (b)         06       72        08       86 
Dev_F (b)         03       47        00       40 
L                 07       63        04       60 
K (d)             06       76        00       81 
R                 05       69        06       80 

(a) Scale designation from Dahlstrom and Welsh (1960) unless otherwise indicated. 
(b) Scale designation from Wiggins (1962). 
(c) Scale designation as indicated in text. 
(d) Contains one true-keyed item. 
(e) Contains one false-keyed item. 


These two estimates of variance³ were 
computed for all of the scales in Table 4. The 
test-retest reliabilities obtained for the orig- 
inal MMPI scales were also used as estimates 
of the reversed form reliabilities. While some 
question may be raised concerning the ap- 
propriateness of these analyses for those 
cases where the reliability and reversal cor- 
relations were insignificantly different, the 
exercise should prove a salutary antidote to 
those reckless estimates that have been de- 
rived on the basis of other procedures. 

The results, expressed as a percentage of 
the variance of the original MMPI scale, are 
presented in Table 5. In general, the scale 
variances for the two control and two experimental 
administrations were very similar.⁴ 
In order to eliminate between-group differences, 
the variances for the experimental 
group were used in both the numerator and 
denominator in computing the percentages 
in Table 5. In one case, that of F_T for females, 
this resulted in the use of an atypical estimate 
of the true scale variance, and the acquiescence 
and content percentages for this 
group for this scale are inflated. (For control 
administrations one and two and experimental 
administrations one and two, the standard 
deviations of the F_T Scale were, respectively, 
2.25, 2.27, 1.67, 2.34. It is the use of the 1.67 
estimate that results in the inflated percentages 
for this scale for females in Table 5.) 


³ When $S_x^2 = S_y^2$ and $r_{xx} = r_{yy}$, these estimates 
are the same as those of Chapman and Bock (1958). 
When these equalities do not hold, the present estimates 
should be more accurate. 

⁴ A list of the standard deviations of all scales in 
Table 5 may be obtained from the authors without 
charge, or from the American Documentation Institute. 
Order Document No. 8502 from ADI Auxiliary 
Publications Project, Photoduplication Service, Library 
of Congress, Washington, D. C. 20540. Remit 
in advance $2.00 for microfilm or $3.75 for photocopies 
and make checks payable to: Chief, Photoduplication 
Service, Library of Congress. 

The results in Table 5 are consistent and 
straightforward. Even for those scales that 
are presumably pure measures of acquies- 
cence it can be seen that no more than 7% 
of the total scale variance is so attributable. 
Bearing in mind that there are a number of 
factors inherent in the experimental design 
which combine to inflate the acquiescence es- 
timates, it seems doubtful that acquiescence 
accounts for even a trivial proportion of the 
response variance on the MMPI. 


DISCUSSION 


On the basis of an extensive review, Rorer 
(1965) has previously concluded that the 
literature contains no evidence unequivocally 
showing acquiescence response style to be of 
importance in any personality, interest, or 
attitude inventory. However, in spite of this 
lack of evidence, the importance of acquiescence 
is so widely accepted today that it has 
become necessary to demonstrate its non- 
existence (rather than its existence, as would 
more appropriately be the case). Rorer and 
Goldberg (in press) utilized an item by item 
response tabulation to show that the proportion 
of content inconsistent responses was not 
significantly greater for the experimental than 
for the control groups described here. How- 
ever, these findings left open the possibility 
that the small discrepancies that they found 
might be accounted for by a concentration of 
items on some one scale, in particular, some 
acquiescence scale. The results presented here 
effectively discount that possibility. 

In estimating the generality of the present 
results, it should be noted that almost all 
studies of acquiescence have utilized college 
student samples, as did the present study. 
Investigators who have addressed themselves 
to the question of the generality of acquiescence 
(e.g., Jackson & Messick, 1962) have 
argued strongly for the similarity of results 
among psychiatric, prison, and normal samples. 
While it would not be appropriate to 
contend that the present results demonstrate 
the unimportance of acquiescence response 
style in other groups, it does seem appropri- 
ate to argue that the results do shift the bur- 
den of proof to those who wish to argue 
for the importance of acquiescence in other 
groups or other situations. 

The finding that none of the MMPI acqui- 
escence scales measures acquiescence has a 
number of ramifications. First, use of these 
scales for this purpose should be discontinued. 
Second, all studies in which it has been 
concluded on the basis of correlations with 
these scales that acquiescence is important in 
some other scale or some other instrument 
should be discounted. Third, inferences de- 
rived from factor analytic results which have 
used these scales as marker variables for 
identifying acquiescence factors should be 
similarly discounted. Fourth, the contention 
that present psychological inventories may be 
improved by controlling for or correcting for 
stylistic variables must be questioned. 
PATTERNED SCALED EXPECTATION INTERVIEW: 


RELIABILITY STUDIES ON A NEW TECHNIQUE 


JAMES B. MAAS 1 
A proposed interview procedure has ratings based on scaled examples of on-the-job 
behavior. Traits necessary were determined, and examples were written 
of behaviors related to these traits. Examples were checked for agreement as 
to trait category, and scaled as to degree of the trait exhibited. Interviewers 
rated each candidate by making analogies from the candidate’s responses to 
job behavior that might be expected of the candidate. Interviews using 3 
raters to judge 1 candidate simultaneously and using 2 different interviewers 
to judge the same candidate 1 at a time, indicate the technique’s high reli- 
ability. Interrater reliability was significantly higher (p<.01) using the 
scaled expectation rating method than when using a traditional adjective 
rating scale. (Sources of variance in interviews are specified.) 


The interview has been considered an in- 
dispensable instrument ever since man began 
face-to-face conversation directed at apprais- 
ing and selecting people. Though the inter- 
view, more than any other selection device or 
procedure, affects the businessman’s decision 
to hire or reject, frequency of use is not a 
suitable criterion for assessing its reliability 
or predictive validity. 

The literature is replete with studies on 
interviewing for normal personality and ability 
traits, but with the exception of restrictive 
questionnaire-type interviews, few methods 
produce high reliability or validity. Symonds 
(1931) investigated carefully the value of the 
interview and reached the conclusion that 
“statistical evidence of the reliability of interviews 
is almost non-existent [p. 477].” In 
citing the causes of low reliability, Bingham 
(1959) mentioned rater bias, lack of rater 
training, poorly worded questions, lack of 
clearly stated criteria, and inadequately defined 
rating scales. 

The series of studies to be reported herein 
are concerned with the patterned interview. 
Using McMurry’s (1947) definition, the patterned 
interview is one with well-defined criteria 
for selection, an interview guide based 
on questions that might predict how well the 
candidate rates on these criteria, well-trained 
interviewers, and a suitable rating scale for assessing 
information obtained from candidates. 


1 The writer is indebted to Patricia C. Smith for 
her help in all stages of this research. 

Specifically, the experimenter attempted to 
raise the reliability of the patterned interview 
by developing a meaningful, consistent rating 
scale. Ratings of candidates by different raters 
are almost always treated as equivalent; yet, 
with traditional adjective or numerical rating 
scales, there is often little agreement between 
raters as to the exact definitions of the traits 
being rated, and as to the meanings of the 
trait levels. What the term “good” signifies 
for Rater A might well be equivalent to “very 
good” in the mind of Rater B. A “5” rating 
by Rater A might be equivalent to a “4” or 
“3” rating by Rater B. 

The studies compared the reliability of the 
traditional adjective rating scale with a scaled 
expectation rating method first developed by 
Smith and Kendall (1963) for use in rating 
on-the-job behaviors of nurses. The Smith- 
Kendall procedure seemed to combat rating 
errors associated with lack of clearly defined 
traits and levels by establishing a series of 
continuous graphic rating scales which were 
anchored both by general definitions and by 
descriptions of actual on-the-job behaviors. 
A continuous scale was used as no prior evi- 
dence was available concerning the fineness 
of discriminations possible. The Smith-Kendall 
procedure is not unlike Flanagan’s critical 
incident technique; however, use of critical 
incidents was avoided since there are too 
many variations in the interview conversation, 
and the specific instances mentioned might not 
occur. 

The procedure for establishing the expectation 
scales was as follows: First, the traits to 
be evaluated were established by a com- 
mittee of interviewers who were familiar with 
the job to be performed. Secondly, examples 
of on-the-job behaviors were written to illus- 
trate three levels of each trait—a high degree 
of the trait, an average degree of the trait, 
and a low degree of the trait. Thirdly, in- 
dependent judges, not knowing which ex- 
amples were written for which traits and 
levels, reallocated the examples back into 
traits and levels. 

Fourthly, only examples with complete 
agreement as to trait and level were retained. 
Lastly, these examples were arranged on a 
continuous vertical graphic rating scale, 
similar to the Fels scales, putting each ex- 
ample at its proper scaled level for the trait. 

On a typical trait page in the interview 
guide, questions appeared on the left-hand 
side, and the rating scale on the right-hand 
side of the page. Interviewers rated each 
candidate on each trait by making analogies 
from the candidate’s responses, to behavior 
that might be expected of the candidate, 
were he actually on the job. After making an 
assessment, the interviewer checked the ex- 
pected behavior level on the vertical trait 
scale. The interview guide contained a page 
for each trait, a piece of carbon paper, and a 
score sheet at the back of the trait pages. 
Weighted numerical values for each trait 
level for each trait were listed on the score 
sheet only, so as not to confuse or bias the 
rater. The check marks from the scales were 
transferred to the score sheet through the 
carbon paper. The score sheet contained the 
individual trait scores, the sum of these im- 
portance-weighted scores, an 11-point overall 
subjective rating scale, and a grand total 
score, which consisted of the trait scores 
and the overall rating taken together. 

The subjects for the first study were 360 
Cornell University undergraduates, who were 
being interviewed for highly selected posi- 
tions as Orientation Counselors. Twelve 
undergraduates, serving on the Orientation 
Selection Committee, were trained in inter- 
viewing techniques. Traits for success as a 
counselor were defined, and a patterned inter- 
view guide was written. An adjective rating 
scale was used in this program, as the best 
available at the time. The rating scale for 
each trait consisted of five categories: Very 
Good, Good, Average, Poor, and Very Poor. 
Each of the candidates was to have two 15- 
minute interviews, and the purpose of the 
study, in addition to selecting counselors, was 
to ascertain the reliability of the two inter- 
viewers’ ratings of the same candidates. Can- 
didates were rated during the course of the 
interview on six normal personality traits (i.e., 
leadership, enthusiasm for the program, re- 
sponsibility, empathy, friendliness, and ma- 
turity); the two interviews varied only in the 
fact that slightly different questions were 
asked by the two different interviewers. 

For each of two interviews, each candidate 
received trait ratings, an overall subjective 
rating, and a grand total of the trait rating 
plus the overall rating. Correlation coefficients 
were used as a measure of inter-interview 
reliability for these ratings. Inter-interview 
reliability, as defined here, consists of both 
inter-interviewer reliability and candidate 
reliability—the questions in the two inter- 
views differed slightly and the candidate 
could not be expected to react exactly the 
same with both interviewers. The results of 
the first study show rather low correlations; 
the median intercorrelation of the trait 
scores was .35, of the overall ratings .34, and 
of the grand total scores, .34. All correlations 
were significant at the .01 level. 

The following year 500 candidates were 
interviewed for the Orientation program—only 
this time the scaled expectation rating tech- 
nique was substituted for the adjective scales. 
The median intercorrelations, .58, .47, .55, 
show a significant improvement over the inter- 
interviewer reliability correlations of the 
previous year, the two sets of results being 
significantly different from one another at the 
.01 level. 
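
The report does not name the test used to compare the two years' coefficients. One common way to compare two correlations from independent samples is Fisher's r-to-z transformation, sketched below in Python purely as an illustration. It rests on loud assumptions that are not in the original: the median trait intercorrelations (.35 and .58) are treated as if they were single correlations, and the sample sizes are taken to be the numbers of candidates (360 and 500).

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations,
    using Fisher's r-to-z transformation (atanh)."""
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z2 - z1) / standard_error


# Assumed inputs: median trait correlations and candidate counts from the
# two orientation programs (adjective scale vs. scaled expectation scale).
z = fisher_z_difference(0.35, 360, 0.58, 500)

# 2.576 is the two-tailed critical value of the normal curve at the .01 level.
exceeds_01_level = abs(z) > 2.576
```

Under these assumptions the statistic falls well beyond the .01 critical value, which agrees in direction with the reported result; it should not be read as a reconstruction of the author's own computation.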

The scaled expectation technique was again 
used on 188 candidates for the position of 
Women’s Dorm Counselor; the results were 
similar to those of the second orientation 
program, and again reliability coefficients 
were significantly higher than those achieved 
with the traditional adjective scales. 

Using the scaled expectation techniques for 
a Male Dorm Counselor selection program 
permitted a check on inter-interviewer reliability, 
interview-guide and candidate reliability 
being held constant. In this program, 
each of 172 candidates was given an inter- 
view by three interviewers simultaneously. 
The interviewers took turns asking questions, 
but each interviewer rated each candidate 
independently on all traits. Inter-interviewer 
reliability was .69 for the trait scores, .65 for 
the overall ratings, and .72 for the grand total 
scores. These reliability measures are significantly 
higher (p < .01) than found in 
either of the previous studies. Interview-guide 
and candidate inconsistency had contributed 
significantly to the variance of inter-inter- 
view reliability. 

To summarize, a total of 2,268 interviews 
were conducted to study interview reliability 
using two different rating scales. A patterned 
interview program using adjective rating scales 
was compared with patterned interviews using 
the scaled expectation technique, and the 
latter proved significantly more reliable. When 
interview-guide and candidate inconsistencies 
were controlled, the scaled expectation tech- 
nique showed further improvement in produc- 
ing agreement between raters. 
This paper reports the follow-up phase of a study of peer nominations begun 
in 1955 at the Naval OCS in Newport, Rhode Island. Over 700 trainees com- 
pleted several peer nomination forms at various stages of training, 1 in par- 
ticular on “success as a future Naval Officer” (FO). Subsequently, 639 trainees 
were identified who had gone on to duty as officers for about 3 yrs. The 
average grade they secured on a key portion of the fitness report ratings 
assigned by their direct superior officers was used as a performance criterion; 
it had a split-half reliability of .90. In the prediction of this criterion, the 
FO peer nomination score from the 3rd week of training gave a validity of 
.40 which was as high as that for later FO scores and which was only slightly 
diminished after academic grades and popularity were partialed. The findings 
support the use of early peer nominations as a valid supplemental measure in 
predicting performance after training. 


Among the more consistent findings con- 
cerning peer nominations, and related modes 
of peer rating, is the significant validity these 
afford in predicting later performance (cf. 
Hollander, 1954). Since one persisting prob- 
lem in the assessment of trainees is the early 
identification of those who are not likely to 
meet later performance demands, this con- 
tribution can be of substantial value. Many 
studies conducted in military settings indicate 
that peer evaluations during officer training 
successfully predict later criteria of perform- 
ance (e.g., McClure, Tupes & Dailey, 1951; 
Ricciuti, 1954; the West Point Follow-up 
Studies of 1948, 1949, and 1952; Williams & 
Leavitt, 1947; and Wollack & Guttman, 
1961). There is also evidence of their utility 
from other spheres of activity; as one illus- 
tration, Weitz (1958) has found a relationship 
of .40 between peer nominations and later 
ratings of life insurance agents in a super- 
visory position. There appears to be a rea- 
sonable basis therefore to contend that peer 


1 This research was supported initially by ONR 
Contract 760(06); the phase reported here was com- 
pleted as part of ONR Contract 816(12). The author 
is indebted to the officers and trainees of the OCS 
for providing a cooperative setting for the original 
study, and to Sidney Friedman, Victor Fields, and 
Joseph Cowan of BuPers, as well as Luigi Petrullo 
and Abraham Levine of ONR, for considerably fa- 
cilitating the follow-up phase of this work. Leonore 
Ganschow and Karol Anderson were of inestimable 
aid in helping with the analyses. 
nominations do provide distinctive prediction 
of performance at a considerably later time. 

One of the problems with the peer nomi- 
nation technique, however, has been its very 
simplicity. When employed with discrimina- 
tion, it can provide a unique contribution to 
evaluation. But abuses of procedure are com- 
mon and guidelines for improvement are not 
readily available apart from the essential point 
that nominations should be made within the 
framework of an explicitly stated standard 
(cf. Hollander, 1964, Ch. 8). 

To answer several questions about the opti- 
mum utilization of peer nomination proce- 
dures, the author undertook a basic study 
in 1955 at the Naval Officer Candidate 
School at Newport, Rhode Island (Hollander, 
1956a). An entire OCS class, composed of 
23 trainee sections, was made available for 
this investigation. Its aim was to assess the 
effects on the reliability and validity of peer 
nomination scores arising from four factors: 
the nature of the quality to be rated; the 
length of time the members of the section 
had been together; the use of a “research” 
set versus a “real” set on the nomination 
forms; and the effects of friendship choice. 

Four forms, each setting out different quali- 
ties to be rated, were utilized; these dealt 
with leadership, motivation for naval service, 
probability of success in OCS training, and 
success as a future officer. All were adminis- 
tered at least three times during the training 
period of 16 weeks’ duration, usually at the 
time of initial orientation, and again at the 
third and sixth weeks of training. In addition, 
at each stage trainees indicated five section 
members whom they considered to be friends. 

Calculating the reliability of the forms at 
each time of administration by the split-half 
method, it was found that even the earliest 
nomination scores, derived from forms admin- 
istered after several days of contact during 
orientation week (“O”), yielded reliability 
coefficients above .90. The increase in re- 
liability was not appreciable for the later 
scores. Furthermore, a high repeat reliability 
was found for the same forms administered at 
different times (cf. Hollander, 1957). 

Validity was initially determined from cri- 
teria accessible in the training program, that 
is, pass-fail, final academic average, and 
final military aptitude grade assigned by su- 
periors. For these criteria, the peer nomination 
scores gave significant, and differentially dis- 
criminating, validity coefficients. Thus, nomi- 
nations for probability of success in the 
school, secured before the onset of formal 
classes, were significantly correlated with the 
ultimate pass-fail and academic criteria. Fur- 
thermore, scores obtained from nominations 
made very early in training, and certainly by 
the third week, provided completely compa- 
rable validity to that of later nominations 
(Hollander, 1956c). 

Consistent with the findings previously re- 
ported by others (e.g., Wherry & Fryer, 
1949), the popularity dimension represented 
in broad choice as a friend was not found 
to have a major intrusive effect on the valid- 
ity of these nominations (Hollander, 1956b). 
A “friendship score” based upon the number 
of friendship choices received by a section 
member was not found to be significantly 
correlated with the criterion of academic per- 
formance, though it did account for varying 
parts of the variance in the other nomina- 
tions. However, fewer than half the section 
members named as friends were nominated 
as “high” on FO, at any time, thus cor- 
roborating the separable discrimination re- 
ported earlier with a different population by 
Hollander and Webb (1955). While “. . . it 
would not do to suggest that this reveals 
no bias in favor of friends . . . it remains 
to be determined whether those friends who 
are nominated ‘high’ might not be deserving 
of this status” (Hollander, 1956b, p. 440). 
Most pertinently in this regard, when valid- 
ity coefficients for peer nomination scores 
were corrected by partialing friendship scores, 
their value was retained and friendship there- 
fore could not be said to impair validity in 
the aggregate. 
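
For readers unfamiliar with partialing, the standard first-order partial correlation removes from the predictor-criterion correlation the part that both variables share with a third variable. The Python sketch below illustrates that formula only; the function name and the three input correlations are hypothetical and are not values reported in the study.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialed out of both."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


# Hypothetical values: x = peer nomination score, y = criterion,
# z = friendship score.
r_partial = partial_correlation(r_xy=0.40, r_xz=0.30, r_yz=0.10)
```

When the third variable is only weakly related to the criterion, as in this made-up case, the partialed coefficient stays close to the original one, which is the pattern the text describes for friendship scores.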

Finally, though different forms gave signifi- 
cant differences in validity against various 
criteria, there was no general disparity be- 
tween the validity of forms administered 
under a “research” as against a “real” set. 

While these findings did provide several 
procedural implications, it was nevertheless 
considered essential that additional data be 
obtained for the construction of a posttraining 
criterion against which the validity of these 
peer nomination scores might be further 
studied. This paper presents the findings of 
follow-up research directed toward that end. 


PROCEDURE 


In the OCS phase of this study, all of the more 
than 700 trainees were given a primary form calling 
for nominations on “success as a future Naval Offi- 
cer” (FO). This was seen to be of particular per- 
tinence for the prediction of more distant, officer 
performance criteria. In addition to this primary 
form, each section received one of three other forms, 
that is, “leadership qualities” (LQ), “interest in 
and enthusiasm for the Naval Service” (IE), and 
“probability of success in OCS” (OC). All of these 
forms were administered three times during training 
(the Orientation Week, Third Week, and Sixth 
Week) and, in the case of the FO form, a fourth 
administration was conducted at the Twelfth Week 
as well. 

Cutting across this design, approximately half the 
sections received a “research” set (RO) with the 
explicit point, appearing on the forms, that the 
results of the ratings were to be used for research 
purposes only. The other sections were given a 
“real” set with equally explicit instructions that 
the results might be used administratively (AU). 
This split in treatment, designed to provide data 
on differential reliability and validity, gave only 
minor differences mainly in terms of form-set inter- 
actions. 

The rating form required five “high” and five 
“low” nominations in order of preference. Each of 
the subjects (Ss) was provided with a complete 
alphabetical roster of his section mates every time 
he was required to complete a form. The author 
was the sole administrator of the forms for all sec- 
tions at all times. 
A direct weighting procedure was applied to derive 
peer nomination scores. The highest nominee 
was awarded a +5, the next highest, a +4, and 
so on through the 5 “highs”; similarly, the lowest 
nominee was assigned a —5, the next lowest, a 
—4, and so on. An algebraic sum was then obtained 
for each S and divided by the N of the group 
minus 1, since no S could nominate himself. This 
resulted in an average score ranging on a continuum 
from +5 to —5. To remove the minus sign, a con- 
stant of 5 was added to this score and the resultant 
value was then multiplied by 10 in order to permit 
the use of a 2-digit score without the intervening 
decimal point. The distribution arising from this 
procedure has normal characteristics with a mean 
of about 50 and a standard deviation approximating 
10 for the total population of the study. Though 
this score may be seen to have certain features of 
the standard score, it neither obscures section dif- 
ferences, as does the standard score, nor does it 
presume homogeneous characteristics from section 
to section. 
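
The scoring steps just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's own procedure: the function names, the ordering convention for the “low” list (lowest nominee listed first), and the example section of 21 members are all hypothetical.

```python
def ballot_weights(highs, lows):
    """Weights assigned by one rater's ballot.

    highs: five names, highest nominee first (+5, +4, ..., +1).
    lows:  five names, lowest nominee first (-5, -4, ..., -1).
    """
    weights = {}
    for position, name in enumerate(highs):
        weights[name] = 5 - position
    for position, name in enumerate(lows):
        weights[name] = -(5 - position)
    return weights


def peer_nomination_score(weights_received, group_size):
    """Score for one trainee from the weights he received.

    The algebraic sum is divided by the group size minus 1 (no one
    nominates himself), shifted by 5 to remove the minus sign, and
    multiplied by 10 to avoid a decimal point.
    """
    average = sum(weights_received) / (group_size - 1)
    return (average + 5) * 10


# One hypothetical ballot.
weights = ballot_weights(["A", "B", "C", "D", "E"], ["V", "W", "X", "Y", "Z"])

# A hypothetical trainee in a section of 21 who was named by seven raters.
example_score = peer_nomination_score([5, 5, 5, 5, 3, 3, -2], group_size=21)

# A trainee named by no one lands at the midpoint of the scale.
unnamed_score = peer_nomination_score([], group_size=21)
```

In this example the named trainee scores about 62 and the unnamed trainee scores 50, which matches the statement that the distribution centers near 50.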

Other measures available from the training phase 
were scores on the five scales of the Officer Classi- 
fication Battery (OCB), an ability test; a final 
academic average for courses in OCS; and a final 
military grade assigned by the officers in charge 
of cadet sections. 

The criterion of officer performance applied in 
the follow-up phase of this study was derived from 
Question 16a of the Standard Report on the Fitness 
of Naval Officers (NAVPERS-310, revised 3-54). 
This question is given below with the weights used 
in this study to calculate an average score: 


In comparison with other officers of his grade and 
approximate length of service, how would you 
designate this officer? 
5 One of the few highly outstanding officers I know 
4 A very fine officer of great value to the service 
3 A dependable and typically effective officer 
2 An acceptable officer 
1 Unsatisfactory (adverse) 


The 639 officers in the follow-up study had been 
in active service for an average of no less than 
3 years during which time they were usually evalu- 
ated twice yearly. The distribution of average scores 
received on this item of the “fitness report” was 
unimodal with a mean of 3.6 and a standard devia- 
tion of .5. A split-half reliability analysis of this 
rating yielded an uncorrected score of .81 which 
reached .90 when corrected by the Spearman-Brown 
formula. This rating also has been found to have 
a high intercorrelation with other scales on this 
fitness report (see King & Wollack, 1960, p. C4), 
especially with the assessment of various qualities, 
notably leadership, in Question 19. 
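
The step from the uncorrected split-half coefficient of .81 to the corrected value of .90 can be checked with the Spearman-Brown formula for a test of doubled length. The short Python sketch below is offered only as a check on the arithmetic; the function name and the `factor` parameter are illustrative.

```python
def spearman_brown(r_half, factor=2.0):
    """Predicted reliability when test length is multiplied by `factor`."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Uncorrected split-half coefficient reported for the fitness report rating.
corrected = spearman_brown(0.81)
corrected_rounded = round(corrected, 2)
```

Here 2(.81) / (1 + .81) = 1.62 / 1.81, which rounds to the reported .90.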


RESULTS 


Table 1 summarizes the validity coefficients 
obtained for the four peer nominations admin- 
istered at three time periods. As will be seen 
there, “future officer” peer nomination is the 
best predictor among them of the criterion 
of officer performance obtained from fitness 
reports over 3 years or more. However, the 
other peer nomination forms quite generally 
give substantial and significant prediction, 
beginning with the third week, for both the 
intraining and the posttraining criteria. In 
view of the high average correlation (.90) 
between the FO and LQ forms, it is no sur- 
prise that LQ should so closely approximate 
the validity of FO. 


TABLE 1


AVERAGE VALIDITY COEFFICIENTS AGAINST THREE PERFORMANCE CRITERIA
FOR FOUR PEER NOMINATION SCORES FROM THREE STAGES OF TRAINING


                                                      Criteria
                                   Final OCS        Final OCS         Officer
Peer nominations                 military grade   academic grade   fitness report

Week completed                    0    3    6      0    3    6      0    3    6
“Future Officer” (FO)            .37  .45  .46    .14  .42  .45    .24  .40  .39
  N = 639
“Interest and Enthusiasm” (IE)   .40  .45  .40    .17  .32  .23    .19  .38  .33
  N = 228
“Success in OCS” (OC)            noo  .38  o2     ol   ie   Rud:   .26  .36  oo
  N = 182
“Leadership Qualities” (LQ)      Al   .47  1      .19  .33  .39    .26  .36  A
  N = 229
TABLE 2

MATRIX OF INTERCORRELATIONS OF OCS PREDICTORS AND OFFICER PERFORMANCE CRITERION

                                      2 3 4 5 6 7 8 9 10 11 12

 1 FO Peer Nomination “0” Week        iS 165 .56 03 07 -.06 mut 08 37 14 24
 2 FO Peer Nomination “3” Week        .92 81 a2), 21 18 RS ah 7 45 42 40
 3 FO Peer Nomination “6” Week        .89 Pl aol a) 29 Bi 46 45 39
 4 FO Peer Nomination “12” Week       16 ee, aL 28 .16 45 43 40
 5 OCB Verbal Reasoning               29 34 28 #23 .08 Al aS;
 6 OCB Mechanical Comprehension
 7 OCB Mathematics
 8 OCB Relative Movement
 9 OCB Spatial Relations
10 Final OCS Military Grade
11 Final OCS Academic Grade
12 Officer Fitness Report Criterion

Note. N = 639.


Taking account of the correlation of .40 
between the third week FO peer nomination 
and the officer performance criterion, it is 
evident that even at this early level of ex- 
posure to one another, prediction of the more 
distant criterion is substantial. Table 2 pro- 
vides a matrix of intercorrelations for the 
FO peer nominations, other OCS predictors, 
and the officer performance criterion. It is 
noteworthy that the third week FO validity 
is as high as that obtained in the adminis- 
tration at the twelfth week, and furthermore 
that the only other variable reaching this level 
of validity is the final OCS academic grade, 
which correlates .41 with the officer perform- 
ance criterion. In view of the fact that the 
OCS academic grade and the third week FO 
score correlate .42, a partial r was computed 
holding academic performance constant; the 
resulting validity of the third week FO score 
in predicting the officer performance criterion, 
with academics partialed, was .28. 
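
The partial r reported above can be reproduced from the three zero-order correlations given in the text (.40, .41, and .42). The following is a minimal sketch of the standard first-order partial correlation formula; the function and argument names are illustrative assumptions.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = third week FO score, y = officer performance criterion,
# z = final OCS academic grade (values as reported in the text).
fo_validity = partial_r(r_xy=0.40, r_xz=0.42, r_yz=0.41)
print(round(fo_validity, 2))  # 0.28
```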

In another line of analysis, the third week 
FO score, the final academic grade, and the
OCB Mathematics score were combined into 
a multiple correlation with the officer per- 
formance criterion. This procedure gave an 
R of .51, with the beta weights for FO and 
final academic grade being high and of essen- 
tially the same magnitude, .29. 

Still another way of viewing the predictive 
effectiveness of the third week FO peer nomi- 
nation is to compare trainees who are in the 
upper segment of the distribution on FO with
those in the middle and in the lower part of
the distribution. In pursuing this, six sections
of the entire sample, a total N of 174, were
divided into those who had reached a score 
of 56 or over (.6 sigma above the mean) on 
the third week FO, those who had scored 
between 55 and 45 (.5 sigma above to .5 
sigma below) on that measure, and a bottom 
segment scoring 44 or under (.6 sigma below). 
Those in the upper group (28%) had an 
average fleet performance score of 3.90, those 
in the middle (46%) a score of 3.63, and 
those in the lower group (26%) a score of 
3.36; all of these differences are significant 
at or beyond the .05 level. This reflects the 
early discriminatory power the FO form has 
in predicting thresholds of later officer per- 
formance. 

An aggregate “friendship score,” made up 
of the total number of friendship choices re- 
ceived by a trainee, had previously been 
found not to correlate with the OCS academic 
criterion. However, in this follow-up the third 
week friendship score was found to correlate 
with the posttraining performance criterion 
at a level of .22. The repeat reliability of the 
third week friendship score was .82. When 
the validity coefficient of .40 for the FO 
third week is corrected by a partial r, taking
account of this friendship score’s correlation 
of .58 with FO, this gives a corrected validity 
coefficient of .33. 

Unlike the study by Wollack and Guttman 
(1961) no initial distinction was made in this 
study between officer assignments to shore
and fleet billets. Nonetheless, the correlation 
we obtained of .40 accords well with the 
validity coefficient of .33 they secured for 
peer ratings at the eleventh week of OCS 
correlated with the same question from the 
fitness report for officers assigned to the fleet. 
To determine the comparable predictability 
of the third week FO score for fleet assign- 
ments, a subsample of 135 line officers in the 
present study was drawn at random. This 
analysis yielded an average validity coeffi- 
cient of .54 which was substantially greater 
than the validity of .40 obtained with the 
criterion for the entire sample. 

As a final point, we found that in almost 
every case, following graduation, the new 
officers had gone on to duty under instruction. 
Modally, they had received one rating each 
while in this status. An analysis to see the 
effect of this “duty under instruction” rating 
upon the overall officer performance rating 
indicated that it had no significant effect 
either upon the overall reliability of the offi- 
cer performance score or upon the validity 
of the FO peer nomination in predicting this 
criterion. 

The evidence presented uniformly supports 
the conclusion that peer nominations used 
early in training can make a distinctive con- 
tribution to the prediction of a long-range 
criterion of performance after training. A 
direct implication of this is to encourage the 
judicious use of such early evaluations as 
a supplement to other predictive measures. 
ITEM POPULARITY AND SOCIAL DESIRABILITY
IN THE MMPI* 


H. J. WAHLER 
Ohio State University 


While the popularity of an item is partially a function of its judged social 
desirability (SD), reliable item preferences also occur which are independent 
of a general SD variable and which in some cases may have greater predictive 
power. Four analyses showed: (a) The proportion of true responses to MMPI 
items obtained from 10 disparate groups contained between 40% and 52%
common variance on the average across items classed as very desirable, 
desirable, neutral, undesirable, and very undesirable. (b) With SD controlled, 
intergroup partial rs were all significant (<.001) and averaged .63. When 
item preferences were held constant, correlation of group profiles with SD 
averaged —.40. (c) Squared beta weights from multiple-regression equations 
employing group profiles and SD values as predictors of other group profiles 
were significantly larger for the item preference variable than for SD in all 
but 1 instance. (d) Residuals resulting from subtraction of variance attributa- 
ble to item preferences from appropriate multiple Rs were significantly smaller 
than those obtained by removal of variance attributable to SD in all but 1 case. 


Edwards’ propositions regarding the social
desirability variable (SD) are by this time
well known. In a recent article Edwards,
Walsh, and Diers (1963) specified that “the
probability of a True response to a personality
item is a linear increasing monotonic
function of the social desirability scale value
of the item [p. 255]” and documented the
fact that a relationship of this type has been
found in a number of studies. In general,
correlations in the vicinity of .87 between
proportions of true responses and social
desirability values of items have been obtained
with college samples.

When scatter plots of the proportions of
true (T) responses to MMPI items relative
to item SD values are examined, it becomes
graphically clear that the proportions of T
responses may differ markedly among items with
exactly the same SD values. This fact suggests
the possibility that variations in item popularities
may exist which are independent of the
SD variable and which constitute an important
component of response variance. An essential
question regarding such differential response
preferences for items is whether they occur
with consistency among various groups or
whether they merely reflect chance fluctuations.


1 The author wishes to express his gratitude to
Youngki Hahn for his tireless help with the computations.
Leonard Goodstein, Richard Clampitt, and
Paul Correll kindly provided MMPI data for which
the author extends his most sincere thanks.

From a common-sense position, it is plausi- 
ble that different item contents could have 
different effects in eliciting T responses from 
the subjects (Ss) and that these effects may 
not be determined by stylistic considerations 
alone. Jackson and Messick (1961) feel that 
“The initial defining property of content as- 
sessment is some form of response consist- 
ency [p. 4].” The demonstration of prefer- 
ential consistencies independent of item SD 
values across disparate groups responding to 
a pool of heterogeneous items would certainly 
be one basic requirement in support of a 
proposition that the subjects (Ss) respond dif- 
ferentially to item content and that the rela- 
tive popularity of items is an important, gen- 
eral source of variance. The terms “item 
preference” and “item popularities” will be 
used interchangeably throughout this article 
and will be symbolized IP. 

The purpose of this study is to demonstrate 
that consistent item preferences occur in the 
responses to the MMPI of heterogeneous 
groups. While the popularity of an item is 
partially a function of its judged desirability, 
it is proposed that reliable item preferences 
also occur which are independent of a general 
SD variable and which in some cases may 
have greater predictive power. Comparative
analyses will be presented in support of this 
proposition. 


METHOD 


Subjects. The MMPI protocols of 10 different 
groups were used for the analyses. Groups 1 and 
6 consisted of 188 male and 322 female patients 
who were hospitalized at a Psychiatric Institute 
and Hospital.2 Groups 2 and 7 were comprised of 
50 male and 37 female students who sought coun- 
seling for personal problems at a Student Counseling 
Center in Ohio. Their mean ages were, respectively 
20.7 and 19.0 with corresponding standard devia- 
tions of 3.5 and 1.4. Groups 3 and 8 were composed 
of 50 male and 50 female sophomore students at- 
tending the State University of Iowa. Their answer 
sheets were taken in order of occurrence from those 
of a large sample of students. These Ss represent 
unselected students who were required to complete 
the MMPI on a routine basis. Their mean ages 
were 19.2 and 18.5, respectively with standard de- 
viations of 2.2 and 1.4. Groups 4 and 9 were the 
Minnesota adult male and female groups whose 
response proportions were reported in Dahlstrom 
and Welsh (1960). There were 226 males with a 
mean age of 33.1 and a standard deviation of 
10.9 and 315 females with a mean age of 33.9 and 
a standard deviation of 12.0. Groups 5 and 10 
were the Minnesota male and female college samples 
reported in the above source. There were 152 males 
and 113 females; age data were not reported for 
these groups in the reference cited by Dahlstrom 
and Welsh. Altogether, there were two groups (1 
and 6) composed of individuals who were sufficiently 
disturbed to require hospitalization; two groups 
(2 and 7) whose members were seeking help on an 
outpatient basis with personal problems; and six 
groups (3, 4, 5, 8, 9, and 10) with Ss of different 
ages and socioeconomic circumstances whose mem- 
bers were unselected on the basis of psychopathology. 

Procedure. The proportion of T responses to each 
item of the MMPI was computed for each group. 
Data given by Dahlstrom and Welsh were used as 
presented in Appendix E. For all other groups the 
proportion of T responses per item was computed 
on the basis of the number of Ss responding to the 
item, that is, if one S did not respond either T or 
F, the total N of that sample was reduced by one. 
Items which are repeated in the booklet form of the 
MMPI were included only once. To maintain uni- 
formity in the number of items responded to by 
each group, all items for which no response data
were presented in the Dahlstrom and Welsh tables
were dropped from the analyses. This left a total
of 496 items. Heineman’s “favorability ratings” obtained
from 108 students and reproduced in Dahlstrom
and Welsh (1960) constituted the SD weights
employed throughout this study.


2 The exact mean ages of these groups were not
available. An unbiased sample was taken from the
same large samples which included these subjects.
With the same number of male and female subjects
indicated above, the mean age of males was 39.1
years with a standard deviation of 13.5 and for
females the mean was 38.4 with a standard deviation
of 12.9.
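
The rule given in the Procedure for computing the proportion of T responses, in which an S who marks neither T nor F is dropped from that item's N, can be sketched as follows. The data layout (one list of responses per item, with `None` for an omitted response) and the function name are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def proportion_true(responses: Sequence[Optional[str]]) -> Optional[float]:
    """Proportion of T responses to one item among Ss who answered T or F.

    Omitted responses (anything other than "T" or "F") reduce the N
    for this item rather than counting as F.
    """
    answered = [r for r in responses if r in ("T", "F")]
    if not answered:
        return None  # no S responded to this item
    return sum(r == "T" for r in answered) / len(answered)


# Hypothetical item answered by five Ss, one of whom omitted it.
item_responses = ["T", "F", None, "T", "T"]
print(proportion_true(item_responses))  # 0.75 (3 of the 4 Ss who responded)
```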

Analyses. Jackson and Messick (1961) observed 
that “The separation of response style from content 
can be made more clearly at the conceptual level 
than at the level of data ... [p. 5].” One inherent 
problem they stressed is the fact that both a con- 
tent related preference and a stylistic determinant 
may operate conjointly for any given response. The 
problem, in part, becomes one of demonstrating 
that the responses to personality items of hetero- 
geneous groups contain consistent, differential pref- 
erences and while such preferences may covary with 
a general stylistic factor such as Edwards’ social 
desirability variable, they also constitute an addi- 
tional source of independent variation. 

Since conventional methodological approaches to 
this problem have not been established in the litera- 
ture, four different but not entirely independent 
analyses are presented. The first and second were 
selected to provide indices of the degree of consist- 
ency in the item preferences (or item popularities) 
exhibited by disparate groups after SD effects were 
essentially eliminated by item selection or by statistical
control. The third analysis was designed to show
the relative contributions of item preferences and
SD values as predictors of group profiles. The fourth
was to show that a significant, common source of
variance occurs in the profiles of groups in addition
to the variance attributable to the SD variable.

The first method consisted of classifying all items 
into five levels of social desirability which could be 
designated highly desirable, desirable, neutral, un- 
desirable, and highly undesirable. The proportions of 
T responses obtained from each of the 10 groups 
to items in each classification were intercorrelated 3 
yielding five matrices with 45 rs each (all possible 
combinations of 10 profiles taken two at a time). 
Under these conditions, SD effects were negligible 
because of the severe curtailment of range imposed 
by establishing levels on the basis of SD values. 

The second approach entailed the use of partial 
correlation coefficients. Intergroup correlations of 
proportions of T responses across all items (group 
profiles) were computed for all combinations of the 
10 groups with SD effects statistically controlled. 
Partial correlations between groups’ profiles and SD 
values with common variance in profiles controlled 
were also computed. These two sets of partial rs 
were averaged and compared. 

The third analysis was based on comparisons of
beta weights derived from multiple-regression equations.
In this case, all possible combinations of group
profiles and SD values were used as predictor variables
with profiles of other groups serving as criterion
variables. For example, the profile of Group 1 and
SD served as predictors of Group 2 profile; Group 2
profile and SD were predictors for the profile of
Group 1 and so forth through all possible combinations
which with 10 groups amount to 90. Squared
beta weights for the SD and item preferences variables
were used as indices of the relative contributions
of these variables to prediction of group profiles.
(The betas for each predictor variable are
independent and any common variance between
the two is statistically controlled.) The relative magnitudes
of the squared beta weights for SD and
for item preferences were compared by analysis of
variance.


3 All correlation coefficients are Pearson product
moment rs (linear hypothesis).

The fourth analysis is based on the statistical test
of predictive efficiency, McNemar (1962):


$$F = \frac{(R_k^2 - R_j^2)/(k - j)}{(1 - R_k^2)/(N - k - 1)}$$


where: N = number of observations, k = number of
predictors, and j = reduced number of predictors.
For purposes of this study, the numerator term only
was used as an index of the effect of deleting variance
attributable to IP and SD as predictor variables.
As in the case of analysis three, all possible
combinations of group profiles in conjunction with
SD values were used as predictors of all other group
profiles as criterion variables. The mean residuals
associated with the IP and SD variables were compared
by analysis of variance.


RESULTS 


The proportions of T responses given to
items classed as very desirable (weights
ranging from 1.16 to 2.07) by Groups 1, 2, 3,
etc., were intercorrelated in all possible combinations
taking two group profiles at a time;
this yielded a matrix of 45 rs. The average r
for this level of item desirability and each
of the remaining four levels of desirability
are presented in Table 1 with the corresponding
mean r² values, ranges of rs and number
of items per matrix. The proportions of T
responses given by each of the 10 groups were
correlated with SD values of each item within
the designated range; the mean r and r² values
along with ranges are shown in Table 1. In
computing mean values for correlation coefficients,
all rs were first converted to zs;
these values were then averaged and converted
to rs.
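
The averaging procedure just described (convert each r to z, average the zs, convert back to r) corresponds to Fisher's z transformation. A minimal sketch follows; the function name and the sample values are illustrative assumptions and are not data from the study.

```python
import math
from typing import Sequence


def mean_r_via_fisher_z(rs: Sequence[float]) -> float:
    """Average correlations by way of Fisher's z: z = atanh(r), r = tanh(z)."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical correlations; the z-based mean exceeds the plain
# arithmetic mean (0.50) because large rs are stretched on the z scale.
print(round(mean_r_via_fisher_z([0.20, 0.80]), 2))  # 0.57
```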

In Table 1 it may be seen that at each 
level of item desirability there was consider- 
able common variance among the group pro- 
files. On the average, this communality ranged 
from about 40% to 50% of the variance over 
the five levels of SD values. In contrast, the 
proportion of variance attributable to SD 
effects was slight, ranging from about 4% 
to 16%. This was due to the fact that the 
ranges of SD values were appreciably cur- 
tailed by using the SD variable as a basis 
for classifying items into levels and was not 
cited for comparison. It does indicate, how- 
ever, that SD effects at each of the five levels 
were negligible and could not account for 
intergroup covariations of the magnitude 
obtained. 

These findings showed that when the ma- 
jority of items from the MMPI were classed 
as very desirable, desirable, neutral, undesira- 
ble, and very undesirable there remained a 
source of common variation among the pro- 
portions of T responses obtained from dis- 
parate groups which was relatively large and 
essentially independent of the SD variable. 
An F test for the significance of differences 
among the mean rs obtained for each SD 
level yielded an F of 7.04 (p< .001). The 
mean rs of .73 and .72 were significantly
larger than those of .66 or less but the maximum
absolute difference of .10 was not especially
large in this context and the communality
attributable to item popularity did
not differ for the various levels of SD in any
systematic manner.


TABLE 1


AVERAGE INTERCORRELATIONS AMONG GROUPS AND GROUPS WITH SD FOR FIVE LEVELS OF ITEM SD VALUES


              Number          M r                 M r²                    Range
SD range      of items   Gps.a   SD:Gps.b    Gps.   SD:Gps.     Gps.        SD:Gps.
1.16-2.07        87       .66     -.32       .44     .11       .86-.43    -.48 to -.17
2.08-2.71        72       .73     -.40       .52     .16       .86-.54    -.57 to -.32
2.72-3.35        61       .65      .14       .42     .04       .88-.33     .39 to .001
3.36-4.15       184       .63     -.33       .39     pe,       .87-.25    -.44 to -.08
4.16-5.00        92       .72     -.40       .50     .16       .90-.40    -.48 to -.30
1.16-5.00       496       .87     -.84       .74     .69       .97-.74    -.91 to -.68


a Each value in a column under “Gps.” is the mean of 45 intergroup correlations; all correlations in each intergroup matrix
were significant at the .05 level or less.

b Each value in a column under “SD:Gps.” is the mean of 10 group by SD correlations for items falling in designated SD levels.


TABLE 2

MEANS AND RANGES OF ZERO ORDER AND PARTIAL CORRELATION COEFFICIENTS

                                      Partial rs (Gp:Gp)             Partial rs (Gp:SD)
                 Zero order rs           SD constant                    IP constant
Groups      Number                                            Number
correlated  of rs    M    Ranges      M    Ranges    Groups   of rs     M     Ranges
N:N          15     .92   .84-.97    .61   .29-.82   N:N a     30     -.41   -.07 to -.61 b
Ma:Ma         6     .86   .80-.92    .72   .61-.80   Ma:Ma     12     -.30   -.08 to -.53
N:Ma         24     .83   .74-.89    .61   .35-.86   N:Ma      24      .05   01 tomes
                                                     Ma:N      24     -.74   -.34 to -.86
Total        45     .87   .74-.97    .63   .29-.86             90     -.40   +.55 to -.86

a Profiles of underlined groups held constant.
b N = 496; r or partial r = .15 significant at .001 level.

The profiles of all 10 groups taken two 
at a time were correlated across all 496 items; 
the average of these correlations and their 
ranges are shown in Table 2 along with two 
sets of averaged partial rs and their ranges
with both SD and IP controlled. Six of the 
groups can be designated “normal” (N)— 
Groups 3, 4, 5, 8, 9, 10 and four “malad- 
justed” (Ma)—Groups 1, 2, 6, 7. Intercorre- 
lations among profiles of N samples and Ma 
samples and both combined (Ma:N) are pre- 
sented independently in Table 2 for zero order 
and partial rs with SD controlled. 

To obtain partial rs between group profiles 
and SD values with IP controlled, full matrices
for all combinations of groups were
computed, that is, Group 1 with SD with 
common variance of Group 2 held constant, 
Group 2 with SD with common variance of 
Group 1 held constant and so forth for all 
90 possible combinations. Where common 
variance in combinations of Ma and N groups 
was involved, partial rs with N profiles con- 
trolled were separated from those where Ma 
profiles were controlled. 

In Table 2 it may be seen that there was 
a high degree of communality among the 
profiles of all 10 groups. When variance attributable
to SD effects was statistically controlled,
the average communality among profiles
remained relatively high. The statistical
control of common variance in profiles of 
paired groups tended to reduce the relation- 
ship between any group’s profile and SD 
appreciably. The least reduction occurred 
when Ma and N communalities were con- 
trolled and N profiles correlated with SD. 
This was primarily due to the fact that Ma: 
SD correlations only averaged .73 while N:SD 
correlations averaged .89. On the average, 
when SD was controlled, the profiles of N 
groups showed as much congruence with 
those of Ma groups as was found among N 
samples only. It is also noteworthy that com- 
mon variance among the profiles of N groups 
showed the largest drop after SD effects were 
controlled which is consistent with the fact 
that N:SD correlations were on the average 
the largest. 

In the third analysis, squared beta weights 
for the IP and SD variables were used as 
indices of their relative contributions to pre- 
dicting variance in group profiles. All possible 
combinations of group profiles and item SD 
values as predictors of different group pro- 
files were employed to compute 90 multiple- 
regression equations. From these equations 
90 beta weights for the IP variable and 90 
beta weights for SD were determined and 
squared for use as criterion measures in a 
simple randomized two factor analysis of 
variance, Lindquist (1953). 
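
For the two-predictor case used here (one group's profile and SD predicting another group's profile), the standardized beta weights follow directly from the three intercorrelations. The sketch below uses the standard two-predictor formulas; the function name and the correlation values are hypothetical and are not taken from the study.

```python
from typing import Tuple


def beta_weights(r_c1: float, r_c2: float, r_12: float) -> Tuple[float, float]:
    """Standardized regression weights for two predictors of a criterion.

    r_c1, r_c2: criterion with predictor 1 and with predictor 2
    r_12:       correlation between the two predictors
    """
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_c1 - r_c2 * r_12) / denom
    beta_2 = (r_c2 - r_c1 * r_12) / denom
    return beta_1, beta_2


# Hypothetical values: predictor 1 = another group's profile (IP),
# predictor 2 = item SD values.
beta_ip, beta_sd = beta_weights(r_c1=0.90, r_c2=-0.80, r_12=-0.85)
print(round(beta_ip ** 2, 3), round(beta_sd ** 2, 3))  # 0.629 0.016
```

Squaring the two betas gives the pair of criterion measures that entered the analysis of variance for each of the 90 equations.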

Since Groups 1 through 5 contained male 
Ss only and Groups 6 through 10 females, 
an analysis of variance was computed to determine
the significance of possible sex differences.
In this analysis no significant differences
were found in the criterion measures
for groups classed by sex. Neither the main 
effect of sex classifications nor the interac- 
tion of sex groupings with type of variable 
weighted (IP or SD) was statistically signifi- 
cant; F's were .85 and .001, respectively, with 
3/172 df in each case. 

Table 3 contains the summary table of the 
two-factor analysis of variance for squared 
beta weights of IP versus SD by classifica- 
tions of groups as N or Ma. For this analysis, 
four classifications of scores were employed 
according to types of groups whose profiles 
were serving as predictor and criterion varia- 
bles: N profiles as predictors of N profiles, 
Ma as predictors of Ma, N as predictors of 
Ma and Ma as predictors of N (all predictor 
variables were in combination with SD values 
as the second predictor). 

As may be seen in Table 3, all main effects 
and the interaction were significant at less 
than the .001 level of significance. The aver- 
age squared beta weights for the item prefer- 
ences variable were significantly greater than 
those for the SD variable in all combinations 
of groups except in the one instance where 
item preferences of Ma groups (in combina- 
tions with SD) were predictors of N profiles; 
in this case, SD effects were more heavily 
weighted than those attributable to IP. A 
similar finding occurred with the partial cor- 
relation analyses. Control of variance in Ma 
profiles did not reduce correlations of N group 
profiles with SD values to the degree found 
with all other comparable partial correlations 
with IP held constant. This discrepancy was 
accounted for above in relation to the findings
with partial correlations. 

The fourth analysis consisted of comparing 
the relative magnitudes of residuals derived 
from the following statistical considerations: 
Given a multiple correlation ($R_{0.12}$) which
is based upon two predictors (1 and 2) which
are themselves intercorrelated, the relative
contribution of each to the $R^2_{0.12}$ can be
evaluated by comparing ($R^2_{0.12} - r^2_{01}$) with
($R^2_{0.12} - r^2_{02}$).
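
The two residuals compared in this analysis can be computed from the same three correlations that define the two-predictor multiple R. The sketch below uses the standard closed form for the squared multiple correlation with two predictors; the function name and the numerical values are hypothetical illustrations, not results from the study.

```python
from typing import Tuple


def residuals(r_01: float, r_02: float, r_12: float) -> Tuple[float, float, float]:
    """Return (R2, R2 - r_01**2, R2 - r_02**2) for criterion 0 and predictors 1, 2."""
    r2_multiple = (r_01 ** 2 + r_02 ** 2 - 2 * r_01 * r_02 * r_12) / (1 - r_12 ** 2)
    return r2_multiple, r2_multiple - r_01 ** 2, r2_multiple - r_02 ** 2


# Hypothetical values: predictor 1 = item preferences (IP), predictor 2 = SD.
r2, after_ip_removed, after_sd_removed = residuals(r_01=0.90, r_02=-0.80, r_12=-0.85)
print(round(r2, 3))                # 0.814
print(round(after_ip_removed, 3))  # 0.004
print(round(after_sd_removed, 3))  # 0.174
```

In this illustration little variance remains once the IP predictor's share is subtracted, while a larger residual remains after subtracting the SD share, which is the pattern the comparison is designed to detect.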

Multiple Rs and zero-order coefficients were 
computed from all possible combinations of 
group profiles as predictors and criterion variables
in conjunction with SD values as a
predictor according to the same procedure 
followed to obtain beta weights in the pre- 
ceding analysis. Possible differences among 
residuals attributable to sex differences in 
the groups were evaluated by analysis of vari- 
ance and as was found with the betas, no 
significant main or interactive effects were 
obtained. 

A second simple, randomized, two factor 
analysis of variance was computed with re- 
siduals as criterion measures and item pref- 
erences versus social desirability of items and 
normalcy versus maladaption of groups as 
the two factors. 

The summary of this analysis of variance, 
mean residuals and ¢ tests are shown in Table 
4. Since the beta weights of the analysis 
shown in Table 3 and the residuals employed 
in this analysis were not entirely independent, 
it was to be expected that the results would 
be similar. As may be noted in Table 4, mean 
residuals associated with removal of variance 
attributable to the SD variable were larger 
than mean residuals obtained by subtracting 
variance of common item preferences with 


TABLE 3 


SUMMARY OF ANALYSIS OF VARIANCE AND ¢ TESTS: 
SoguarED Beta WEIGHTS FOR IP anD SD AND 
MALADJUSTED (Ma) VERSUS 
Normat (N) Groups 





























Source df SS MS F B 
Betarp’-Betagp? eS eS ee OLS aes 00 L 
Groups Se O18s2) 0291 9.15 .001 
Cells 10.456 
Beta? by. groups 

interaction 3 6.112 2.037 64.06 .001 
Within cells 172 5.478 0.032 

Total 179 15.934 

Betarp? Betagp” 
Num- Num- 

Groups M ber M ber t p 
N:N 489 30 065 30 5.828 001 
Ma:Ma BOG ame 1605 12 4.73 001 
N:Ma 812 24 036 24 15.07 001 
Ma:N 150 24 369 24 4,25 001 


nn EEE 


s Error term was MS within cells from above: df = 172. 
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TABLE 4 


SUMMARY OF ANALYSIS OF VARIANCE AND ¢ TESTS: PREDICTIVE EFFICIENCY 
oF SD AnD IP aNp MatapyusTED (Ma) versus Norma (N) Groups 











Sources af SS MS F ? 
A (SD versus IP) 1 1101 1101 44.57 001 
B (Ma versus N Gps.) 3 1799 .0600 24.27 001 
AXB 3 5953 1984 80.34 001 
Within cells 172 4250 0025 
Total 179 1.3104 


a a a a ee ee 
Variable subtracted 





IP 
Groups Number M Residuals Number t ? 
eee EEE eee 
N:N 30 .0764 0343 30 3.298 OL 
Ma:Ma 12 2279 .0329 12 9.61 001 
N:Ma 24 1707 0172 24 10.73 001 
Ma:N 24 .0700 .1882 24 8.27 .001 


 — OCOhnwnwnhna —wX¥—nr IN IE 


8 Error term was MS within cells from above; df = 172. 


but one exception: where profiles of Ma 
groups served as predictors of N group pro- 
files, removal of SD variance left a smaller 
mean residual than did subtraction of variance 
attributable to item preferences. Similar re- 
sults obtained with partial rs and beta weights 
with the same combination of groups was dis- 
cussed above. 

These findings show that a significant 
source of variance remained in the multiple 
regression after effects attributable to a gen- 
eral SD variable were removed. In light of 
the findings with all four of the analyses pre- 
sented it is reasonable to infer that this vari- 
ance is primarily attributable to intergroup 
consistencies in preferential responses to 
items. 


DISCUSSION


While the SD and IP variables are signifi- 
cantly related, it is also clear that consistent 
communalities in item popularities were found 
in responses of heterogeneous groups and 
that these were independent of a general SD 
variable. The findings also indicate that IP as 
a general factor can account for more variance 
than item SD as defined by Edwards with 
self-report scores. Apparently Ss do discrimi- 
nate content and respond to it differentially 
with a significant degree of consistency as well
as exhibit stylistic tendencies. It is also noteworthy
that different item popularities occurred
with very desirable, desirable, neutral,
undesirable, and very undesirable items and 
that both “maladjusted” and “normal” groups 
manifested these preferences with comparable 
consistency. Nevertheless both similarity of 
groups and situational variables can be im- 
portant determinants of degree of preferential 
consistency as was shown by Waters and 
Wherry (1962). 

Several implications follow from the above 
which are of considerable importance in test 
construction (and test interpretation). With 
binary forced choice item formats such as 
are employed in the Edwards Personal Prefer- 
ence Schedule (1954) the control of SD 
values for item pairs would not be sufficient 
since items with identical SD weights can 
have appreciably different popularities and 
such popularities appear to have generality 
over different subpopulations. 

A further undesirable effect of failure to 
consider IP level has been graphically illus- 
trated by Wiggins (1962), who showed that 
social desirability scales (measures of this R 
style in individuals) with high “communality 
values” possess much poorer screening effi- 
ciency than do scales with lower communality 
values. In the case of a scale with high average
communality “. . . even slight shifts in
the direction of social desirability would ap- 
proach the limit of maximum communality 
[p. 229].” Hanley (1961) also stressed the 
importance of item popularity and concluded 
that “Lack of attention to item content and 
response frequencies are more clearly asso- 
ciated with ineffectiveness (of scales) than 
is the empirical-rational dimension [p. 15].” 

Norman (1963) found a high correlation 
of .97 between indices of item popularity with 
his Descriptive Adjective Inventory (DAI)
taken by two independent but well matched 
samples of college students; under fake con- 
ditions the correlation was .98. Based on 
these findings that “. . . the proportion of 
persons responding to each item in a given 
direction was extremely stable over different 
samples of persons under each of the two 
conditions taken separately [p. 233],” the 
author developed scoring keys for the detec- 
tion of faking. In addition to other considera- 
tions, Norman selected items with large 
positive and negative shifts in item populari- 
ties under fake and self-report conditions. 
With appropriate keys for such items, Detec- 
tion Scales were developed which in their 
summation form had false positive and
false negative rates of only about 6% with
cross validational samples responding under 
“straight-take” and “fake-take” conditions.
In essence, this was accomplished by select- 
ing items on the basis of their empirical dif- 
ferences in popularities under two test-taking 
sets. In this approach, item SD values were 
not considered, yet the discriminatory power 
of the resulting scale proved to be very high 
when like settings and respondents were in- 
volved. 
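
The selection logic described above (keep the items whose popularity shifts most between the two test-taking sets, then key them in the direction of the shift) can be sketched as follows. This is only an illustration under stated assumptions, not Norman's actual procedure: the 0/1 response layout, the function names, and the 0.30 cutoff are all hypothetical.

```python
import numpy as np

def popularity_shifts(honest, fake):
    """Per-item change in endorsement rate (fake minus honest).

    Both inputs are persons x items arrays of 0/1 responses. This layout
    is an assumption; the source does not describe the data format.
    """
    honest = np.asarray(honest, dtype=float)
    fake = np.asarray(fake, dtype=float)
    return fake.mean(axis=0) - honest.mean(axis=0)

def select_shift_items(shift, min_shift=0.30):
    """Indices of items whose popularity rises or falls by at least min_shift.

    The 0.30 cutoff is purely illustrative.
    """
    up = np.flatnonzero(shift >= min_shift)
    down = np.flatnonzero(shift <= -min_shift)
    return up, down

def detection_score(responses, up, down):
    """Count the responses a person gives in the direction faking favors."""
    responses = np.asarray(responses, dtype=float)
    return responses[:, up].sum(axis=1) + (1.0 - responses[:, down]).sum(axis=1)

# Tiny synthetic example (4 respondents, 3 items), not data from any study.
honest = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [0, 0, 1]]
fake = [[0, 1, 1], [0, 1, 1], [0, 1, 0], [1, 1, 1]]

shift = popularity_shifts(honest, fake)
up, down = select_shift_items(shift)
honest_scores = detection_score(honest, up, down)
fake_scores = detection_score(fake, up, down)
```

Note that item SD values never enter the computation; only the empirical difference in popularity between the two sets is used, which is the point made in the text.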

Crowne and Marlowe (1960) expressed concern
over Edwards’ focus on a presumably
unitary dimension like social desirability in 
developing his SD scale lest “. . . items may 
also be characterized by their content which, 
in a general sense, has pathological implications
[p. 349].” Their preference was to employ
what amounted to both the social desirability
and item popularity variables in
developing a “new social desirability scale.” 
Their strategy was to select items from a 
population of “. . . behaviors which are cul- 
turally sanctioned and approved but which 
are improbable of occurrence [p. 350].” This
approach exemplifies a possible application 
of social desirability, item popularity and con- 
tent variables in test development. 

This investigation provides evidence that 
item popularity is an important source of 
independent variation in self-report inven- 
tory scores. The findings of other studies 
cited suggest that if item popularity were 
systematically considered in scale construc- 
tion, in addition to other variables, screening 
efficiency could be appreciably enhanced, 
powerfully discriminative scales could be de- 
veloped via study of shifts in item populari- 
ties under different test-taking sets, and a 
response style such as social desirability could 
be measured with minimal contamination from 
preferential responding of Ss to specific types 
of item content. 


DETERMINANTS OF WORK ATTITUDES


The motivator-hygiene theory of work attitudes assumes 2 independent sets
of variables (motivator and hygiene) important to employee job satisfaction 
and dissatisfaction. The applicability of this assumption to the job attitudes 
of 117 blue-collar workers was determined through factor analyses of a 40- 
item work attitude survey. The job attitudes of blue-collar workers could be 
separated into 2 relatively independent sets of variables, comparable to motivator
or hygiene variables. However, both sets of variables were found to be
positively related to job satisfaction, contrary to predictions from the theory. 


A number of investigators (Gurin, 1963; 
Gurin, Veroff, & Feld, 1960; Hoffman & 
Mann, 1956; Kornhauser, 1962; and Pelz, 
1957) have been concerned with the exami- 
nation of underlying sources of job satisfac- 
tion and dissatisfaction that appear to influ- 
ence the worker in his work environment. 
Recent conceptions of job satisfaction have 
assumed two general classes of variables im- 
portant to employee work attitudes: those 
variables in the work process which facilitate 
the personal growth and development of the 
individual, and environmental variables which 
include physical and monetary rewards. 

This dichotomous approach to the analysis 
of work attitudes has not been based upon 
a clear delineation between the two classes 
of job variables. Generally, differences in 
motivation and morale between groups of 
employees were observed, and then efforts 
were made to identify their determinants in 
terms of both extrinsic pressures which were 
introduced from the external environment and 
intrinsic pressures which appeared to act 
from within the individual. 

Although this intrinsic-extrinsic dichotomy 
has not been empirically established, Herz- 
berg, Mausner, and Snyderman (1959) have 
used this dichotomy to relate certain job variables
to work attitudes.

1 This paper is based in part upon a thesis completed
by the first author under the direction of
the second author, and submitted to the University
of Florida in partial fulfillment of the requirements
of the Master’s degree. The authors acknowledge the
assistance of Harry E. Anderson, Jr. in the statistical
analyses. This research was supported in part
by research grant RD-1127 from the Vocational Rehabilitation
Administration, Department of Health,
Education, and Welfare, Washington, D. C.

Herzberg et al. have
conceptualized job satisfaction as consisting 
of two separate independent dimensions—the 
first dimension was related to job satisfaction 
and the second to job dissatisfaction. An im- 
portant feature of their approach was the im- 
plication that these dimensions were not 
opposite ends of the same continuum, but 
instead represented two distinct continua. 
This theory was generalized from data ob- 
tained by examining the content of employee 
reports of critical incidents associated with 
extreme job satisfaction or dissatisfaction. 
Two general sets of variables emerged. 
When dissatisfied, employees reported situa- 
tions which were characterized by poor 
company policies, unfair salary schedules, in- 
adequate supervision, poor interpersonal re- 
lations, and faulty working conditions. These 
responses referred primarily to the environ- 
ment or context in which the job was per- 
formed and were called “hygiene variables.” 
The analysis of work attitudes in which 
job satisfaction was reported revealed a different
set of variables. These included: task
responsibility, achievement, advancement, recognition,
and the nature of the work itself.
Because of their association with positive
work attitudes, they were called “motivator
variables.”

A major implication of Herzberg’s moti- 
vator-hygiene theory of work attitudes is that 
there are two separate sets of attitudinal variables.
One set of variables (motivator) leads
to high job satisfaction but does not contribute
in any appreciable degree to dissatisfaction,
while another set of variables
(hygiene) leads to job dissatisfaction but
contributes little to satisfaction.

A number of studies have been completed
on a variety of populations, suggesting that
the motivator-hygiene theory of work attitudes
has a considerable degree of general
applicability. These two components of job
satisfaction were found in research with
middle-management personnel (Schwarz,
1959), supervisory and nonsupervisory industrial
workers (Gibson, 1961), physical rehabilitation
patients (Fantz, 1962), and schizophrenic
patients (Hamlin & Nemo, 1962). In
addition, Friedlander (1964) compared employee
responses to various job characteristics
as sources of satisfaction with these same
characteristics as sources of dissatisfaction. He
found that for any one job characteristic
(taken singly), its importance as a satisfier
was unrelated to its importance as a dissatisfier.
Thus the satisfier and the dissatisfier (for
any one characteristic, i.e., working conditions)
were unrelated.

However, other researchers² (Ewen, 1964;
Wernimont & Dunnette, 1964) have found
significant positive correlations between motivator
and hygiene variables, whereas independence
of these variables was predicted by
the motivator-hygiene theory. In addition,
some of the variables which Herzberg believed
to act as only satisfiers or as only dissatisfiers
were related to both job satisfaction and dissatisfaction
in Ewen’s study. Friedlander
(1963) also found that both motivator and
hygiene variables were associated with job
satisfaction. He factor analyzed the correlations
among each of 17 items reflecting the
importance of various job characteristics to
employee satisfaction. Three meaningful factors
emerged. Factor I—Social and Technical
Environment, and Factor II—Intrinsic Self-Actualizing
Work Aspects, corresponded in
part with Herzberg’s concept of hygiene and
motivator variables; however, Factor III—Recognition
through Advancement, was composed
of both motivator and hygiene type
variables.

2 J. Block and H. Yuker, Unpublished report,
January 1963.

In view of the disparity and inconclusiveness
of the preceding studies, it is apparent
that the nature of satisfiers and dissatisfiers
(if in fact such factors do exist) regarding 
work attitudes is far from clear. The purpose 
of this study was to examine the assumptions 
underlying the motivator-hygiene theory of 
work attitudes. Specifically, this study was 
concerned with the analysis of the elements 
within the work environment which contribute 
to job satisfied and dissatisfied work attitudes. 

An important variation in the present re- 
search involves the study of blue-collar work- 
ers. Previous research directly related to the 
motivator-hygiene theory has been focused 
primarily upon the work attitudes of profes- 
sional and white-collar workers rather than 
on blue-collar semiskilled and unskilled work- 
ers. The attitudes of this enormous group of 
workers must be investigated before any gen- 
eral theory of work attitudes is evolved. 


METHOD 


A 40-item work attitude survey was designed to 
obtain a measure of the variables related to em- 
ployee work attitudes. The Work Attitude Survey 
(WAS) was developed as part of the present study 
and consisted of 20 motivator and 20 hygiene items
expressed in a Likert-type 5-point rating scale. The
scale was developed from a larger pool of items
with the aid of judges and repeated pretests to pick
“good” items. There were four items representing
each one of the 10 work attitude variables taken 
from Herzberg’s motivator-hygiene theory of job 
satisfaction. 

The survey was distributed by the investigators to 
270 white, male blue-collar workers in June 1964. 
One hundred seventeen questionnaires (43%) were 
completed and returned anonymously. These 117 
workers comprised the sample for the present study. 
All of the 270 workers were employed in ground 
crews (maintenance and watchmen) in a large 
Southern State University. The age of the workers 
who completed the survey ranged from 21 years to 
68 years, with a mean age of 41.4 years. The yearly 
base salary ranged from $2,700 to $5,900, with a 
mean salary of $3,700. The length of employment 
with the present employer ranged from 1 month 
to 27 years, with a mean length of employment of 
5.5 years. 

Pearson product-moment correlation coefficients 
were computed among the 40 motivator-hygiene 
items. The resultant 40 × 40 matrix was examined
to determine the number of correlations, significant
at the .05 level of confidence, between motivator-motivator,
hygiene-hygiene, and motivator and hygiene
items.

The intercorrelation matrix was factor analyzed 
by the principal-factor method and machine-rotated 
by the Varimax routine. Unities were used in the 
diagonals of the intercorrelation matrix rather than 
the squared multiple-correlation coefficients (R²),
since the focus of the study was upon the relative
magnitude of the loadings and the relative importance
of the factors. While the use of unities may
have resulted in slightly higher factor loadings, the
relationships with which the study was concerned
would be unchanged by the use of R².
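
A principal-factor solution with unities in the diagonal, followed by Varimax rotation, can be sketched as below. This is a minimal modern illustration, assuming NumPy and a commonly used SVD-based Varimax iteration; it is not the machine routine used in the study, and the demonstration data are synthetic random numbers.

```python
import numpy as np

def principal_factors(R, k):
    """Unrotated loadings from the eigen-decomposition of a correlation
    matrix R whose diagonal holds unities (1.0)."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:k]          # k largest roots
    roots = np.maximum(eigvals[order], 0.0)        # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(roots)      # p x k loadings

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation; returns rotated loadings and rotation."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)        # column sums of squares
        target = rotated ** 3 - rotated * (col_ss / p)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):  # no further improvement
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Synthetic demonstration: 117 "respondents" by 6 "items" of random noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(117, 6))
R = np.corrcoef(X, rowvar=False)

L = principal_factors(R, 3)
L_rot, T = varimax(L)
```

Because the rotation is orthogonal, each item's communality (the row sum of squared loadings) is the same before and after rotation; only the distribution of variance across factors changes.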

Factor scores then were obtained from this analysis
for the 117 Ss by the formula, in matrix form,
F = S’Z, where, for p variables, N persons, and k
factors, F is a k × N matrix of factor scores (one
column per S), S’ is the transposed k × p matrix of
factor loadings, and Z is the p × N matrix of standardized
scores. This formula is a modified form of
Harman’s (1960, p. 340) formula 16.8. The factor 
scores were intercorrelated and a second-order factor 
analysis was made of the resultant factor score 
matrix. 
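
The factor-score formula F = S’Z and the intercorrelation step that feeds the second-order analysis can be sketched as follows. The function name, the use of the sample standard deviation (ddof=1) for standardizing, and the random demonstration data are assumptions made for illustration only.

```python
import numpy as np

def factor_scores(loadings, raw_scores):
    """Compute F = S'Z.

    loadings   : p x k matrix S of factor loadings
    raw_scores : N x p matrix of item responses (one row per S)
    returns    : k x N matrix F, one column of factor scores per person
    """
    S = np.asarray(loadings, dtype=float)
    X = np.asarray(raw_scores, dtype=float)
    # Standardize each item, then transpose to the p x N layout of Z.
    Z = ((X - X.mean(axis=0)) / X.std(axis=0, ddof=1)).T
    return S.T @ Z

# Synthetic demonstration: N = 117 persons, p = 5 items, k = 2 factors.
rng = np.random.default_rng(1)
X = rng.normal(size=(117, 5))
S = rng.uniform(-1.0, 1.0, size=(5, 2))

F = factor_scores(S, X)
# Intercorrelations among the factor scores (k x k); a matrix of this kind
# is what would be submitted to a second-order factor analysis.
second_order_input = np.corrcoef(F)
```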

A set of four questions which tapped the respond- 
ent’s satisfaction with his job were combined into 
one overall satisfaction score. These overall satisfac- 
tion scores were correlated with factor scores based 
upon the second-order factor analysis. These factor 
scores were derived in the same way as those from 
the first-order factor analysis. The two factors were 
expected to correlate positively and negatively, re- 
spectively, with the overall satisfaction score. 


RESULTS 


TABLE 1

TWO COMPREHENSIVE WORK ATTITUDE FACTORS

Second-order   First-order                                 Factor
factor         factor        Factor name                   loading a

I’             XI b          (not interpreted)               88
               VIII          Unrecognized work efforts       79
               V             Individual accomplishment       79
               X b           (not interpreted)               76
               I             Salary                          71
               XII b         (not interpreted)               67
               IV            Advancement                     62
               VI            Work role dissatisfaction       44

II’            IX            Work frustration               -89
               VII           Physical work environment      -84
               III           Interpersonal relations        -79
               II            Technical supervision          -75
               XII b         (not interpreted)              -60
               I             Salary                         -55
               V             Individual accomplishment      -49

a Decimals omitted from factor loadings.
b Factor considered uninterpretable.


A number of statistically significant correlations
(p < .05) between motivator-motivator,
hygiene-hygiene, and motivator and hygiene
items were found. One hundred fourteen
(60%) of the 190 correlations between motivator
and motivator items were positive
and statistically significant. One hundred
eighty-one (95%) of the 190 correlations between
hygiene and hygiene items were positive
and statistically significant. In contrast,
only 113 (28%) of the 400 correlations between
motivator and hygiene items were positive
and statistically significant. These findings
provide some support for Herzberg’s assumption
that motivator and hygiene items
represent separate, independent dimensions of
work attitude variables.
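
The pair counts and percentages reported for the item intercorrelations follow directly from the 20 motivator and 20 hygiene items of the survey. A short arithmetic check, using only the counts given in the text (the variable names are illustrative):

```python
from math import comb

n_motivator = 20
n_hygiene = 20

# Distinct item pairs within each set and across the two sets.
within_motivator = comb(n_motivator, 2)   # motivator-motivator pairs
within_hygiene = comb(n_hygiene, 2)       # hygiene-hygiene pairs
between = n_motivator * n_hygiene         # motivator-hygiene pairs

# Number of positive, significant correlations reported in the text.
reported = {
    "motivator-motivator": (114, within_motivator),
    "hygiene-hygiene": (181, within_hygiene),
    "motivator-hygiene": (113, between),
}
percentages = {name: round(100 * hits / total)
               for name, (hits, total) in reported.items()}
```

The three pair counts sum to the 780 off-diagonal correlations in the lower half of the 40 × 40 matrix.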

Twelve factors were extracted from the
first-order factor analysis. All items with loadings
above .40 on a factor were considered
as making up that factor. Four “pure” factors,
three hygiene—Factors I, II, and III,
and one motivator—Factor IV, were identified
as follows: Factor I—Salary; Factor II—Technical
Supervision; Factor III—Interpersonal
Relations; and Factor IV—Advancement.
By a “pure” factor is meant that the
items making up the factor all represent a
single variable or job element. For example,
Factor I was composed of the 4 salary items
in the scale. No items representing other variables
loaded on this factor.


Two other factors, composed of different
motivator items but uncontaminated by hygiene
items, were identified as follows: Factor
V—Individual Accomplishment—included
items which reflected work attitudes based
on the ability to handle work assignments
without supervision, and Factor VI—Work
Role Dissatisfaction. Items in Factor VI, in
general, expressed negative reactions to particular
employee work roles. These findings
are consistent with the hypothesis that certain
work attitudes are based on relatively
independent aspects of the total work environment
(such as salary increases, promotions,
etc.).

The remaining factors were found to be
composed of both motivator and hygiene
items. From this group, three meaningful
factors were identified. Factor VII—Physical
Work Environment—included items concerned
primarily with the total work environment
and appeared to be of an impersonal
nature, directed toward the environment in
which the workers performed their tasks.
Factor VIII—Unrecognized Work Efforts—was
composed of items which reflected lack
of recognition for employee work efforts. Factor
IX—Work Frustration—included items
which reflected a negative response to varying
aspects of the total work environment. Factors
VII, VIII, and IX expressed employee
reactions to both motivator and hygiene aspects
of the work setting. No meaningful
interpretation could be applied to Factors X,
XI, and XII.

Two comprehensive work attitude factors
were obtained from the second-order factor
analysis. These second-order factors are presented
in Table 1. All first-order factors with
loadings above .40 were considered as making
up the second-order factors. Factor I’ was
an Intrinsic Work Environment factor determined
for the most part (with the exception
of Factor I—Salary) by factors which reflected
Herzberg’s motivator variables, that
is, employee reactions to intrinsic or motivator
elements in the work environment. Factor
II’ was identified as an Extrinsic Work
Environment factor determined primarily by
factors which included employee reactions to
extrinsic or hygiene elements in the work environment.
Factors V and XII were composed
exclusively of motivator items and were included
both in Factors I’ and II’; however,
both first order factors had higher loadings
on Factor I’ than Factor II’. The composition
of Factors I’ and II’ for the most part
is consistent with Herzberg’s concept of motivator
and hygiene variables; however, specific
factor loadings contained in both second-order
factors reflected a relationship between
motivator and hygiene variables not consistent
with the motivator-hygiene theory. Two
other second-order factors were obtained. No
meaningful interpretation could be applied
to Factor III’, and Factor IV’ was identified
as a residual factor. The total variance of
the system was equal to 9.9456. Factors I’
and II’ reflected a variance equal to 8.1884.
Therefore, the two factors accounted for approximately
82% of the predictable variance
in the system.
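
The proportion of variance attributed to the two second-order factors can be checked from the two variance figures given in the text; the variable names below are illustrative.

```python
total_variance = 9.9456        # total variance of the system, as reported
two_factor_variance = 8.1884   # variance reflected by Factors I' and II'

share = two_factor_variance / total_variance
percent = round(100 * share)   # rounds to the approximately 82% cited
```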

A positive correlation of .51 was found
between factor scores reflecting Factor I’
(motivator) and overall job satisfaction
scores, as was predicted from the motivator-hygiene
theory. However, the negative correlation
predicted between Factor II’ (hygiene)
factor scores and job satisfaction was
not found. This coefficient was found to be 
plus .55; factor scores representing both fac- 
tors were positively associated with job 
satisfaction. 


DISCUSSION


The results of the present study only par- 
tially support Herzberg’s two factor theory 
of job satisfaction. Empirical evidence was 
offered that certain job elements appeared 
together more frequently than others. Fac- 
tors I’ and II’ coincide, for the most part, 
with Herzberg’s concept of motivator and 
hygiene variables. In addition, Factors I, II,
III, and Factors IV, V, and VI were identi- 
fied as related to specific hygiene and moti- 
vator aspects of the work environment. These 
findings are consistent with the results ob- 
tained by Schwarz (1959), Gibson (1961), 
and Friedlander (1964). However, Factors 
VII, VIII, and IX, and Factors I’ and II’ 
were composed of both motivator and hy- 
giene items. These results most clearly cor- 
respond with those found by Ewen (1964), 
Friedlander (1963), and Wernimont and Dun- 
nette (1964). These investigators also found 
that motivator and hygiene variables were re- 
lated in a similar way to both job satisfaction 
and dissatisfaction. This is consistent with 
findings from the present study that overall 
job satisfaction among blue-collar workers 
was positively related to both motivator and 
hygiene variables. The results of the present 
study appear to indicate that the underlying 
structure of job attitudes of satisfaction and 
dissatisfaction is more complex than Herz- 
berg’s two factor theory of work adjustment 
suggests. The present findings suggest that the 
motivator-hygiene theory provides only a par- 
tial description of those elements in the total 
work environment which affect the job atti- 
tudes of unskilled and semiskilled blue-collar 
workers. 

There are several possible explanations for 
the present findings. It has been pointed out 
by other investigators (Ewen, 1964; Fried- 
lander, 1963) that the motivator-hygiene the- 
ory may involve too rigid a classification of 
those elements in the work process which con- 
tribute to employee satisfaction. The positive 
correlations found between motivator and hygiene
items were not totally unexpected.
Block³ pointed out that motivator and hygiene
variables can be positively related,
negatively related, or unrelated depending on 
many factors. He felt that if motivator varia- 
bles were related primarily to the performance 
of a task, while hygiene variables were re- 
lated primarily to the setting in which the 
task was being performed, it would be ex- 
pected that a person could respond unfavora- 
bly to both the job activity and the setting, 
and vice versa. 

In contrast to Herzberg’s formulations, it 
is reasonable to assume that the main determi- 
nants of job satisfaction are not distributed 
along separate dimensions, but interact in a 
variety of ways. It is difficult to conceive of 
situations in which positive work attitudes 
are not generally accompanied by increased 
responsibility and challenging work assign- 
ments (motivator variables) as well as more 
tangible evidences of recognition, such as in- 
creased salary or better working conditions 
(hygiene variables). These observations were 
supported by the findings of Ewen (1964), 
Friedlander (1963), and Wernimont and 
Dunnette (1964), who found that motivator 
and hygiene variables were not mutually ex- 
clusive determinants of job satisfied and dis- 
satisfied employee work attitudes. 

The motivator-hygiene theory has been 
used to describe the job attitudes of workers 
from a number of different occupational lev- 
els. The occupational level variable appeared 
critical to the present findings. Herzberg’s 
original formulations were based on an anal- 
ysis of the work attitudes of accountants and 
engineers. These higher level occupational 
groups tended to place prime importance on 
motivator aspects of the work setting which 
afforded opportunities for personal growth. 
These employees were most concerned with 
promotions, challenging work assignments, 
and the type of work they were doing. 

3 J. Block, Personal communication, 1963.

However, the present study focused on the
attitudes of semiskilled and unskilled blue-collar
workers. In general, these employees
had low salaries, had not attained a very high
level of education, and had experienced relatively
slow advancement in their organizations.
They frequently had a strong need for
good supervision and group relations, and 
placed importance on being able to rely on 
others. Probably these workers were very 
concerned about and dependent upon hygiene 
elements in their work environment (salary, 
good working conditions, etc.). In general, 
blue-collar workers probably are more pre- 
occupied with fulfilling basic needs than are 
workers in higher occupational levels. Only 
after these basic needs are satisfied do they 
become interested in the personal growth 
aspects in the work environment. 

On the other hand, among higher level 
occupational groups, these basic needs usually 
are already satisfied, and more energy of 
these employees can be devoted to those as- 
pects of the work setting (motivator varia- 
bles) which contribute to the personal de- 
velopment of the individual. Work experience 
has led these individuals to expect high sala- 
ries and comfortable working conditions, so 
that these job elements are no longer the 
primary determinants of satisfied work atti- 
tudes. The possibility that blue-collar workers 
tend to respond to both hygiene and moti- 
vator variables, while higher level occupational 
groups tend to respond primarily to motivator 
variables, may account for the less than com- 
plete independence found between motivator 
and hygiene sources of job satisfaction in the 
present study. 

It is possible, also, that the relationships 
between items loading on a particular factor 
could have been due to a response set effect. 
Perhaps the correlations found between moti- 
vator and hygiene items resulted from the 
tendency of workers to respond in the same 
manner to like-worded statements. At least 
two other investigators have commented on 
the effects of response sets related to the 
motivator-hygiene theory. Block (see Footnote 3)
has noted a tendency in evening class
college students to react to motivator and 
hygiene items with characteristic response 
sets. Gould (1963) also observed that nursing 
students chose motivator items more fre- 
quently than hygiene items, probably because 
the latter were usually stated in a negative 
fashion. 

The present exploratory study is based
upon the work attitudes of unskilled and
semiskilled blue-collar workers in a university 
setting. This study indicated a tentative basis 
for generalizing Herzberg’s results beyond the 
original white-collar occupational level in
which they were obtained. It can be concluded 
that there are two general sets of work atti- 
tude determinants, identified as intrinsic and 
extrinsic, or motivator and hygiene, which 
contribute to the job attitudes of blue-collar 
workers. However, these two sets of work 
attitude variables are not totally independent 
of each other, at least when obtained from
blue-collar workers. The hypothesis that mo- 
tivator variables are most important to job 
satisfaction while hygiene variables are most 
important to job dissatisfaction was not sup- 
ported in the present study. 


FAKING OF A SCORED LIFE HISTORY BLANK AS A
FUNCTION OF CRITERION OBJECTIVITY ¹


STEPHEN P. KLEIN ²


Educational Testing Service 


AND WILLIAM A. OWENS, JR. 


Purdue University 


This study investigated the role of the criterion in determining the trans- 
parency of a scoring key. When college students (N =55) were instructed to 
fake a life history questionnaire, they were able to do so, but a scoring key 
empirically derived from a subjective criterion of creativity (ratings) was 
more transparent than one based upon an objective criterion (patent dis- 
closures). It also appeared that prior exposure to the questionnaire facilitated 
faking on the former key more than on the latter. Also investigated was the
possibility that the difference in transparency was attributable to the sub- 
jective key having been biased by a fallacious stereotype. To evaluate this 
possibility, recruiting interviewers (N=79) filled out the questionnaire as 
they thought a creative scientist would. As expected, they selected a greater 
proportion of the item choices thought to be associated with creativity than of
those actually demonstrated to be correlated with creative output.


Two reasons for the current popularity of 
biographical questionnaires in industry have 
been (a) that personality types of items may 
be interspersed with factual ones so as to 
make the former more palatable to the applicant
(Owens, 1963) and (b) that correlations
between information provided by the
applicant and that obtained from previous 
employers are very high; virtually all r’s 
range from .90 to .99 (Keating, Paterson, & 
Stone, 1950; Mosel & Cozan, 1952). This is 
very impressive accuracy of reporting, par- 
ticularly when it is compared with that gen- 
erally elicited by personality inventories in 
industry. 

It was certainly implied by Mosel and 
Cozan (1952), however, that applicant hon- 
esty is a more or less direct function of the 
extent to which responses are verifiable. Per- 
sonality items, irrespective of their format, 
are notoriously nonverifiable. Thus, if there is 
to be continued use of such items in applica- 
tion forms, more attention should be paid to 
minimizing their transparency. 

1 This article is based on a master’s thesis by Klein
under the direction of Owens. The authors wish
to express their appreciation to the Purdue Research
Foundation for their support of this study and to
B. J. Winer for his help in analyzing the data.

2 Formerly at Purdue University, Lafayette, Indiana.

Previous attempts to reduce faking have
been largely devoted to developing special
control keys or to equating options for social
desirability. Although several investigators 
have noted that objective criteria are gener- 
ally superior to ratings, because of the rogues’ 
gallery of limitations and biases attached to 
the latter (Guilford, 1954), little attention 
has apparently been paid to whether this ad- 
vantage would also result in a less fakable 
scoring key. One indication that the type of
criterion might affect transparency has come
from the work of Smith, Albright, Glennon,
and Owens (1961). They noted that a prevalent
and false stereotype of the creative scientist
may have reduced the correlation between
a scoring key derived from supervisory ratings
and one based upon an objective measure of
creative output. Smith et al.’s observations led
the present authors to hypothesize that under
such circumstances the superiority of objective
criteria would again show itself by yielding
a less fakable key; that is, since a stereotype
is founded on subjective judgments, it
should be more closely related to the key
based on ratings.

The present study was undertaken to evalu- 
ate this hypothesis by testing whether (a) it 
would be easier to fake the item options re- 
lated to a criterion of ratings than it would 
be to ferret out the choices which in fact are 
correlated with performance and (b) whether
a stereotype is more closely related to a sub- 
jectively based criterion key or to an objec- 
tive criterion key. 


TABLE 1

TRANSPARENCY INDICES AND EXPECTANCY TABLE PREDICTIONS UNDER
HONEST AND FAKE CONDITIONS

                                          Scoring key
                           Creativity ratings      Patent disclosures
Condition                  Honest      Fake        Honest      Fake

Mean                        -0.38      4.83         2.60       5.05
SD                           3.16      3.50         1.70       1.50
Transparency index           2.94      1.20         3.80       2.07
t                               6.960**               [illegible]
Chances in 100 of being
  successful                  31        65           18         26

** p < .01.

METHOD


Questionnaire. It was decided to utilize a questionnaire
developed to predict research creativity (Smith
et al., 1961) since one was available which afforded
empirically derived scoring keys for both subjective
and objective criteria (ratings versus patent disclosures,
respectively). On cross-validation, both
keys had previously exhibited the same concurrent
validity, .52, but had intercorrelated only .28.

Subjects. Groups 1 (N = 40) and 2 (N = 15)
were composed of seniors at a large midwestern
university. Group 3 (N = 79) consisted of recruiting
interviewers of doctoral candidates in the same
school.

Procedures. Group 1 responded to the questionnaire
first under instructions designed to elicit
honest responses and again a week later under
instructions to answer as if they were “applying for
a job for which the employer wanted a truly
creative scientist.” Group 2 received only the “fake”
instructions. Group 3, the interviewers, were directed
to fill out the questionnaire as they thought
a “typical, creative research scientist would.” This
was done to evaluate the hypothesis that the
stereotype of the creative scientist was more closely
related to the subjective criterion key than to the
objective criterion key.

Statistical analysis. Since raw scores on the
scales were not comparable, a transparency index
similar to a standard score was developed. This
index (the difference between the total possible score
and the obtained score, expressed in standard
deviation units) is interpreted as follows: “The
smaller the index, the more sensitive the group is
to the item choices which will give them the ‘better’
score, i.e., make them appear more creative.” Shifts
in score were evaluated via both t tests on these
indices and expectancy table probabilities of an
individual being considered successful. These probabilities
were used to assess the impact of faking
on typical hiring decisions and thereby provide
one measure of the practical difference in transparency
between the two keys.


RESULTS


Both keys were somewhat fakable, but the 
key based upon the subjective criterion (su- 
pervisory ratings of creativity) appeared to 
be more transparent (Table 1). This conclu-
sion is supported by the magnitude of the t’s
and by the expectancy table outcomes. 
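
The transparency index described under Statistical analysis (total possible score minus obtained score, in standard deviation units) can be sketched as follows. This is only an illustration: the function name is hypothetical, and the maximum possible score of 9 is an assumption, not a figure stated in the text. It was chosen because it roughly reproduces the interviewer indices reported in Table 3.

```python
def transparency_index(max_score, mean_score, sd):
    """Distance from the total possible score to the group mean, in SD units.

    A smaller value means the group came closer to the 'best' possible score.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (max_score - mean_score) / sd


# ASSUMPTION: a total possible score of 9 on each key (not given in the article).
ASSUMED_MAX = 9.0

# Interviewer means and SDs as reported in Table 3.
ratings_index = transparency_index(ASSUMED_MAX, 5.63, 3.20)
patents_index = transparency_index(ASSUMED_MAX, 4.49, 1.80)

print(round(ratings_index, 2), round(patents_index, 2))
```

Under that assumed maximum the rating key gives the smaller index, which is the direction the article reports for the interviewers.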

As may be seen from Table 2, Group 1’s 
experience of taking the questionnaire under 
honest conditions facilitated their ability to 
fake both keys, but this previous exposure had 
a more pronounced effect upon the key based 
upon ratings. 

Table 3 shows that interviewers gave sig- 
nificantly more responses overlapping the op- 
tions of the subjective criterion key than of 
the objective criterion key. Thus, they were
selecting greater proportions of more fakable
item choices and were exhibiting a preference
for those associated with ratings of creativity
as opposed to those actually correlating with
creative output.


TABLE 2

MEANS FOR GROUPS 1 AND 2 FOR THE FAKE CONDITION

                              Scoring key
                      Creativity       Patent
Group                  ratings       disclosures
1 (N = 40)
  Mean                   4.83           5.05
  SD                     3.50           1.50
2 (N = 15)
  Mean                   2.47           4.23
  SD                     3.70           1.40
t                        2.14**         1.86*

* [significance level illegible in source]
** p < .01.


TABLE 3

INTERVIEWERS’ SCORES ON THE TWO KEYS

                              Scoring key
                      Creativity       Patent
                       ratings       disclosures
Mean                     5.63           4.49
SD                       3.20           1.80
Transparency index       1.05           2.50
t                          [illegible]**

** p < .01.


DISCUSSION 


The preceding results support the hypothe- 
sis that an objective criterion yields a less 
fakable scoring key than one based upon the 
subjective criterion. In this instance, the type 
of criterion employed conditioned not only 
a statistically significant difference in trans- 
parency, but a practically significant differ- 
ence as well. That is, when the obtained 
means were treated as scores and read out as 
probabilities of success, it appeared that only 
with the rating key would a hypothetical hir- 
ing decision change under the two conditions. 

Evidence for the superiority of the objec- 
tive criterion key also came from the differ- 
ential influence of prior exposure to the ques- 
tionnaire upon its transparency. Although 
Group 2 appeared to have done a particularly 
good job of faking the rating key and had 
left little room for additional insight into it, 
Group 1’s experience of taking the question- 
naire under honest conditions facilitated their 
ability to fake that key even more than it did 
the patent disclosure key! Further, this dif-
ference in transparency was not due to the
rating key having had more verifiable and/or
socially desirable item choices, since both keys 
had equivalent proportions of such options 
(Klein, 1963). 

It is believed that the greater transparency 
of the rating key was due to the influence of 
a fallacious stereotype of the creative scien- 
tist in its derivation. The data obtained from 
the interviewers clearly support the hypothe-
sis that their perception of the “creative scien-
tist” would coincide more closely with the ap-
pearance of creativity, as measured by ratings,
than with the hallmarks of actual creative 
output. 

From all of the foregoing, the authors con- 
clude: (a) That nonverifiable items probably 
are not answered with the veracity classically 
attributed to responses on the application 
blank. (b) That selecting and/or weighting
items in such inventories on the basis of their 
relationship to a subjective criterion may 
cause them to be relatively transparent. (c) 
That interviewers should distrust their un- 
aided abilities to distinguish occupational 
stereotypes from characteristics actually rele-
vant to job performance.


RELATIONSHIPS BETWEEN GENERAL AND SPECIFIC
ATTITUDES TOWARD WORK AND OBJECTIVE JOB 
PERFORMANCE FOR OUTDOOR ADVERTISING 
SALESMEN 


WAYNE K. KIRCHNER 


Minnesota Mining and Manufacturing Company, St. Paul, Minnesota 


Product-moment correlations were computed between and among 10 scales 
measuring general and specific work attitudes and 2 objective, numerical 
measures of sales performance for 72 outdoor advertising salesmen who com- 
pleted a 100-item attitude questionnaire. General work attitudes were positively 
related (r = .42, .46) to objective sales performance. In addition, attitudes
toward supervision were strongly related to general work attitudes and other 
work aspects, suggesting that the supervisor really did represent the company 
to these salesmen who were on highly isolated jobs. Of some interest was the
generally low relationship of attitudes toward compensation and benefits
to other attitudes and to actual job performance.


Relationships between job satisfaction and
job performance have been widely researched
by industrial psychologists over the last 40
years. Brayfield and Crockett (1955) have
reviewed much of this literature as have
Herzberg, Mausner, Peterson, and Capwell
(1957). In general, common conclusions
reached by these reviewers have been that the
methodology in studying possible relation-
ships between job attitudes and job perform-
ance leaves much to be desired
and that the whole problem is a very complex
one.
Brayfield and Crockett, too, tended to con-
clude that job satisfaction or favorable job
attitude was not necessarily related to good
performance on the job. Herzberg’s review
tended to support the idea that positive atti-
tudes toward the job were related to better
on-the-job performance. It is obvious from
these that conflicting results have occurred
in studies.

In any case, most of these studies have been
done on nonsales personnel, such as produc-
tion workers, office workers, machine tenders,
and the like. The sales area then has been
somewhat neglected in terms of relationships
between job attitude and output. Habbe
(1947) mailed out job satisfaction question-
naires which were completed by over 9,000
insurance agents. These agents indicated
anonymously how great their sales volume
had been for the year. He found an insignifi-
cant relationship between job satisfaction and
so-called production on the job in this case. 
Baxter, Taafe, and Hughes (1953) reported 
a study also with life insurance agents which 
showed a positive relationship between job 
satisfaction on the part of the agents and 
supervisor’s ratings of their job performance. 
While these are not the only studies, of course, 
that have been done in the sales area relating 
job attitudes to job performance, others are 
hard to find. As a result, this brief paper will 
point out the results of a study of an attitu- 
dinal nature conducted with outdoor adver- 
tising salesmen working in all parts of the 
United States. Specifically, this study investi- 
gated relationships between general attitudes 
toward work and objective measures of job 
performance. In addition, relationships be- 
tween general attitudes toward work and spe- 
cific attitudes toward certain areas related to 
the sales job were also considered. 


METHOD


A group of 91 salesmen working for a national 
outdoor advertising subsidiary of a large midwestern 
manufacturing concern were asked to complete an 
attitude questionnaire that was mailed to their 
homes. This attitude questionnaire contained 100 
items related to 10 separate areas of work: 


1. General attitude toward work
2. Supervision
3. Attitude toward company
4. Compensation
5. Chance for advancement
6. Training
7. Fellow employees
8. Benefits
9. Communications
10. Working conditions


TABLE 1

PRODUCT-MOMENT CORRELATIONS BETWEEN GENERAL AND SPECIFIC JOB ATTITUDES AND JOB
PERFORMANCE FOR ADVERTISING SALESMEN (N = 72)

[Correlation matrix among the 10 attitude scales (general attitude, supervision,
attitude toward company, compensation, chance for advancement, training, fellow
employees, benefits, communications, working conditions), sales points, and total
points. The individual coefficients are scrambled in the source scan and cannot be
assigned to cells.]

Note.—An r of .24 is p < 5%, an r of .31 is p < 1%.


Every item was scorable on a Likert-type con-
tinuum so that it was possible to obtain a numerical
score for each item with a total score available 
for each particular area. Of the 91 salesmen who 
received questionnaires, 72 (79.1%) returned them. 
A comparison of these respondents and non- 
respondents (Kirchner & Mousley, 1963) showed 
that respondents tended to be much higher producers 
on the job. The implication of this result for the 
present study is that the bottom group of job 
performers may not be represented adequately. 
From the literature reviewed by Kirchner and 
Mousley, it seems fairly certain that nonrespondents 
and nonvolunteers are also more likely to be 
negative in behavioral aspects. It is suspected, there- 
fore, that the lower range of job attitudes (un- 
favorable) was somewhat restricted making it 
harder, of course, to show a relationship between 
attitude and performance for the remaining group. 

In any case, items related to specific areas of work 
such as supervision and compensation were de- 
veloped by the author while items related to gen- 
eral attitude toward work were those developed 
by Brayfield and Rothe (1951). Their 18-item scale 
was incorporated into the total attitude question- 
naire. 

It was possible also to obtain two objective, 
numerical measures of job performance, net-sales 
points and net-total points. Sales points were 
based on the actual number of signs and related 
type materials that were sold. Total points included 
sales points, plus certain other aspects of the job 
for which credit was obtained such as obtaining 
property leases for signs and collecting delinquent 
accounts. Compensation was based to an extreme 
degree on these point totals accumulated each month. 
Each man had a base or quota number of points, 
which, when exceeded, resulted in extra compensa- 
tion. Divisional managers, while not completely 
unbiased, felt these points data were highly ac- 
curate indicators of performance on the job. 

Finally, with attitude scores available for the 10 
scales plus these objective performance data, simple 
correlational analysis was made of the possible
interrelationships. 


RESULTS 


The product-moment correlation coefficients
for this study are indicated in Table 1. In 
general, the table indicates the following: 

1. The relationship of general job attitude 
to job performance—The greatest relationship 
between any attitude area and actual job per-
formance is that between general attitudes to-
ward work and sales points (r = .42) and
total points produced (r = .46). Persons with
more favorable general work attitudes were
better salesmen, or, conversely, persons who
obtained more objective points on the job had
more favorable work attitudes. In this study, 
at least, favorable job performance and favor- 
able job attitudes tended to go together quite 
positively. 
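
As a rough check on correlations of this size with N = 72, the standard conversion of a product-moment r to a t statistic with N - 2 degrees of freedom can be sketched as below. The function name is illustrative, and the article itself reports only the r thresholds given in the note to Table 1 (.24 at the 5% level, .31 at the 1% level), not t values.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a product-moment correlation against zero.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("n must be at least 3")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N = 72  # salesmen who returned the questionnaire

# Correlations of general work attitude with the two performance measures,
# followed by the two thresholds quoted in the table note.
for label, r in [("sales points", 0.42), ("total points", 0.46),
                 ("5% threshold", 0.24), ("1% threshold", 0.31)]:
    print(f"{label}: r = {r:.2f}, t = {t_from_r(r, N):.2f}, df = {N - 2}")
```

Both reported correlations give t values well above those implied by the thresholds in the note, which is consistent with the article treating them as significant.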

Two other specific attitude areas were 
particularly related to job performance. Fa- 
vorable attitudes toward supervision and to- 
ward working conditions were associated with 
better job performance. 

2. Interrelationship of attitude areas—In 
general, there was a positive relationship 
among all attitude areas. Favorable attitudes 
toward one aspect of work were usually asso- 
ciated with favorable attitudes toward other 
aspects. In addition, favorable general atti- 
tudes toward work were most associated with 
favorable attitudes toward the company, 
chance for advancement, supervision and 
working conditions while attitudes toward 
such things as benefits and fellow employees 
were less important in causing favorable or 
unfavorable general work attitudes. 

Quite important, too, was the fact that atti- 
tudes toward supervision were directly related 
to production on the job. As other studies 
have indicated, too, money, whether in terms 
of benefits or actual compensation was not 
particularly related to general job attitudes 
or to attitudes toward other aspects of work. 


DISCUSSION 


This study suggests that for this group of 
salesmen, job attitudes and job performance 
were related. Why did this occur? There are 
many possible reasons, but first of all, it ap- 
pears that the criterion of performance is a 
fairly well defined one and, in addition, the
measurement of general job satisfaction is 
based on a valid and well-defined scale. Sec- 
ond, a personal feeling is that these results 
should occur in the sales area because the 
salesman, almost more than any other kind
of employee, gets direct feedback on his own
performance which can reinforce his self- 
confidence and create favorable attitudes. 

Attitudes toward supervision are probably 
marked because these salesmen operate as 
highly independent agents, covering wide 
ranges of territory and their only contact usu- 
ally with the company is through their man- 
ager. This manager, then, is highly important 
in communications for he represents the com- 
pany to the salesman. 

Finally, the study seems to imply that such 
areas as compensation and benefits which tra- 
ditionally produce many gripes do not seem 
to be too important in shaping overall work 
attitudes or in terms of actual job perform- 
ance. This ties in with some of Herzberg, 
Mausner, and Snyderman’s (1958) writings 
for these things would be listed as dissatis- 
fiers but not as strong motivators to better 
performance. 

Overall, then, this is not a definitive study 
but it does show that there can be a definite 
relationship between general attitudes toward 
work and actual performance on a sales job.


TRAINING TIME AND PROGRAMED INSTRUCTION


GEORGE DOUGLAS MAYO AND ALEXANDER A. LONGO


Naval Air Technical Training Command 1 


The hypothesis was tested that training time can be reduced by means of 
programed instruction, without loss in training quality. 226 US Navy and 
Marine Corps trainees in electronics fundamentals served as S’s. A matched 
group design was used in which a 31% time saving on the part of the pro- 
gramed instruction group was an integral part of the experiment. On the 
2 measures of learning, which followed the instruction, the programed in-
struction group scored significantly higher (p < .01) on one, while no sig-
nificant difference was found on the other. The hypothesis was considered to
be sustained.


One of the generalizations that is beginning 
to emerge from research pertaining to pro- 
gramed instruction is that learning can be 
accomplished by means of programed in- 
struction in a substantially shorter period of 
time than by conventional instruction (Ferster 
& Sapon, 1958; Goldberg, Dawson, & Barrett, 
1964; Hough, 1962; Hughes & McNamara, 
1961). This is of major interest to industry 
and to the military because of the relationship 
between the time a trainee spends in training 
and the cost of that training. In most in- 
stances, however, the “time” aspect has been 
an incidental finding in studies that were pri- 
marily concerned with other questions. More- 
over, the time factor usually is stated in terms 
of the mean time required to complete the 
program, which is a statistic of questionable 
value to the training administrator, who often 
will want to know how much time will be re- 
quired for a unit of programed material when 
it is to be followed by a shop or laboratory 
session or by conventional instruction. 

In the present study attention was focused
upon a saving in time as the matter of pri-
mary interest. In essence, the incidental find-
ing of the studies mentioned above was enter-
tained as the hypothesis in the present study.
Specifically, it was hypothesized that equal or
greater learning could be achieved by pro-
gramed instruction in a specified, shorter
period of time than by conventional instruc-
tion. Time reduction was included as a part
of the design of the experiment.

1 The views expressed are those of the authors
and are not necessarily held by the Department of
the Navy or any of its agencies.


METHOD


The study utilized a matched group design, the 
matching variable being a pretest which previous 
research had indicated correlated well with per- 
formance in the electronics fundamentals course 
in which the study was conducted. In this par- 
ticular instance it turned out that the pretest cor- 
related between .45 and .56 with the measures 
taken at the end of the experiment. The content of 
the test pertained to Ohm’s Law and powers of ten. 
The sample consisted of 226 Navy and Marine 
Corps recruits assigned to training in aviation 
electronics fundamentals. The programed material 
was on the subject of elements of electrical physics 
and constituted all of the material which was con- 
sidered appropriate for programing in the first 
week or 40 hours of instruction in the school. Of 
this 40 hours, 13 hours of material normally pre- 
sented by conventional instruction was programed. 

The division of the students into an experimental 
and control group was accomplished as follows. The 
students were arranged in order on the basis of 
their scores on the pretest. Odd numbered students 
were then assigned to one group and even numbered 
students to the other. The means and standard 
deviations of the two groups were then compared 
and minor shifting of students accomplished to 
make the means of the two groups identical and 
the standard deviations as nearly so as possible.
Assignment of the two groups thus composed to
the experimental and control conditions was made by 
the toss of a coin. During the week of instruction 
a small number, approximately five in each class, 
missed part of the instruction because of illness, 
legal problems, or for other administrative reasons. 
In order to ensure that the groups receiving pro- 
gramed instruction and conventional instruction 
were still matched on the basis of the pretest after 
losing these personnel from the two groups, the 
means and standard deviations of the two groups
were computed, and minor adjustments were made 
by eliminating from the sample two or three students 
whose elimination would make the means of the 
two groups identical on the basis of the pretest. 
Having accomplished this, the programed in-
struction group and the conventional group each
contained 113 students with the mean of each group 
14.64 on the pretest. This value represented the mean 
number of items answered correctly of the total of 
27 items on the pretest. The standard deviations 
were not identical but quite comparable at 5.81 for 
the conventional group and 5.73 for the programed 
instruction group. 
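
The ranking and alternation step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure in full: the function name is hypothetical, the scores are invented, the coin toss is simulated with a random number generator, and the manual "minor shifting" of students used to equalize the means is not reproduced.

```python
import random
from statistics import mean


def split_matched(pretest_scores, seed=None):
    """Rank students on the pretest, alternate them into two groups,
    then let a simulated coin toss decide which group gets which condition."""
    ranked = sorted(pretest_scores, reverse=True)
    odd_ranked = ranked[0::2]   # 1st, 3rd, 5th, ... in rank order
    even_ranked = ranked[1::2]  # 2nd, 4th, 6th, ... in rank order
    heads = random.Random(seed).random() < 0.5
    if heads:
        return odd_ranked, even_ranked   # (programed, conventional)
    return even_ranked, odd_ranked


# Hypothetical pretest scores (items correct out of 27); not data from the study.
scores = [25, 22, 21, 19, 18, 16, 15, 14, 12, 11, 9, 7]
programed, conventional = split_matched(scores, seed=1)

print("programed mean:", round(mean(programed), 2))
print("conventional mean:", round(mean(conventional), 2))
```

Alternating down a ranked list keeps the two group means close but not identical, which is presumably why the authors followed it with small manual adjustments.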


Instructional Program 


Programing of the instructional material was 
accomplished by a team consisting of one civilian 
educational specialist and four Navy chief petty 
officers. All members of the team were competent 
both in the technical area and in the theory and 
techniques of instructional programing. The pro- 
gramed instruction package consisted of five book- 
lets, with titles as follows: “Matter”; “Statics”; 
“Dynamic Electricity”; “Ohm’s Law”; and “Con- 
ductors, Resistors, and Insulators.” Each booklet 
was accompanied by a list of specific behavioral 
objectives, and constructed response-type test items 
designed to measure the objectives. Following pro- 
gramed instruction terminology, these were called 
“criterion” tests. Prior to the experiment each of 
the booklets was administered and revised until 
90% of the students completing the program 
achieved 90% of the specific behavioral objectives 
the booklet was designed to accomplish. 

The time limit on each booklet was set at a 
point which experience with the program indicated 
would permit 95% of the students to complete the 
booklet in the time allotted. As previously stated, 
this turned out to be a total of 9 hours for the five 
booklets. Since conventional instruction of the 
same material required 13 hours, this represented a 
time reduction of 31%. Any students who could not 
complete a booklet in the time allotted on a par- 
ticular day were required to complete it on their 
time before the beginning of the next day. In most 
instances the students who were not through at the 
time the time limit expired were almost through 
and had only a small amount of material to com- 
plete on their own time. Students completing the 
program before the end of the time allotted were 
permitted to work on homework assignments which 
otherwise they would have been required to com-
plete on their own time, after the close of the school
day. The programed material was administered by 
members of the programed instruction team with 
the assistance of the regular instructors in the 
classes. In programing the material both linear and 
branching procedures were used, depending upon 
which seemed to be best suited to the efficient teach- 
ing of the material. 
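
The time figures quoted in this section can be checked with a few lines of arithmetic. The variable names are illustrative; the 13-hour and 9-hour values are the ones given in the text.

```python
conventional_hours = 13  # conventional instruction of the same material
programed_hours = 9      # total time limit for the five booklets

saving = (conventional_hours - programed_hours) / conventional_hours
share = programed_hours / conventional_hours

print(f"{saving:.1%} saved, {share:.1%} of conventional time")
```

The saving comes to about 31%, and the programed time is a little under 70% of the conventional time.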


Measures 


One of the problems encountered when courses 
are revised and the revision compared with the 
former course, is the identification or construction 
of a measure which is an adequate measure of 
performance in both courses. If the conventional 
measure is used, it may penalize the revised, and 
perhaps improved, version of the course. Similarly, 
if a test designed to measure learning from the re- 
vised course is used, it more or less assumes that 
the revisions made were correct, which often is the 
point the experiment seeks to test. When major 
changes are made in a course it becomes difficult, 
or even impossible, to compare the two courses 
experimentally because of the measurement problem. 

In the present study, however, the objectives and
the content of the conventional instruction and 
programed instruction were very similar, and 
hence the measurement problem was not as serious 
as it might otherwise have been. It was nevertheless 
considered desirable to use both a conventional 
test and the test that was constructed in the early 
stages of the programing procedure to measure each 
of the specific behavioral objectives. Both tests were 
administered to both the programed instruction 
group and conventional group on the last school 
day of the week. This was the day after the com- 
pletion of the programed material. 

The conventional test consisted of 50 multiple- 
choice items, previously written and item analyzed 
by the electronics fundamentals school. These 50 
items were essentially all of the items used by the 
school in various tests to measure learning or 
achievement of the material which subsequently was 
programed. The tests constructed by the pro-
gramed instruction team to measure the specific
behavioral objectives of the five programed book- 
lets were combined into one test. Following pro- 
gramed instruction terminology, this test is 
identified in the present study as the “criterion” 
test. It consisted of 86 items, mostly of the con- 
structed response, as opposed to multiple choice, 
type. Neither the students in the experimental group 
nor the control group had seen either of the tests 
before. 


RESULTS 


As noted in a preceding section the primary 
comparisons were between 113 students who 
received conventional instruction in the first 
week of the electronics school and 113 stu- 
dents who received programed instruction, 
the two groups being matched on the basis of
a pretest which in the conventional group
correlated .56 with the conventional test and
.46 with the “criterion” test. The intercorrela-
tions among the pretest, conventional test, and
“criterion” test are shown in Table 1.


TABLE 1

INTERCORRELATIONS BETWEEN PRETEST, CONVENTIONAL TEST, AND PROGRAMED
INSTRUCTION “CRITERION” TEST WITHIN THE CONVENTIONAL AND PROGRAMED
INSTRUCTION GROUPS

                        Conventional group       Programed instruction group
                      Conventional  Criterion     Conventional  Criterion
                          test        test            test        test
Pretest                   .56         .46             .50         .45
Conventional test                     .65                         .62

Note.—N = 113 in each group.


Table 2 shows the comparison of the group
that received conventional instruction with
the group that received programed instruc-
tion on the basis of the conventional test and
on the basis of the “criterion” test. Scores on
all measures were simply the number of items
answered correctly. As shown in Table 2 the
mean number of items answered correctly, out
of 50, by the conventional group on the con-
ventional test was 41.06 as compared with
40.08 answered correctly by the programed
instruction group. This difference of slightly
less than 1 point does not reach the usual level
required for statistical significance (p < .05).
Neither is the difference between the vari-
ances of the two groups on the conventional
test statistically significant.


TABLE 2

MEANS AND STANDARD DEVIATIONS WITHIN THE CONVENTIONAL AND PROGRAMED
INSTRUCTION GROUPS

                      Conventional group    Programed instruction group
                         M        SD             M        SD
Pretest                14.64     5.81          14.64     5.73
Conventional test      41.06     5.52          40.08     4.84
“Criterion” test       72.44     6.69*         78.23     [illegible]

Note.—N = 113 in each group.
* p < .01.


On the “criterion” test the conventional
group mean was 72.44 items correct out of 86,
compared with 78.23 for the programed in-
struction group. Comparison of the variances
of the conventional group and the pro-
gramed instruction group showed the differ-
ence to be significant at the .01 level. As indi-
cated by Snedecor (1946) the test for deter-
mining the significance of the difference be-
tween means in effect assumes that the differ-
ence between the variances is nonsignificant.
If the difference between variances is signifi-
cant, as is true in the present instance, Snede-
cor recommends that instead of the usual de-
grees of freedom, 2(n - 1), wherein n is the
number of pairs in the sample, the de-
grees of freedom be computed as n - 1. In
the present instance with 113 pairs, this re-
sulted in degrees of freedom equal to 112.
Using this value and the t test involving the
correlation term which is appropriate in the
case of matched groups, the difference be-
tween the conventional group and the pro-
gramed instruction group on the “criterion”
test was significant well beyond the .01 level.
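
A t test "involving the correlation term" for matched groups is commonly written as the difference between means divided by sqrt((s1^2 + s2^2 - 2*r*s1*s2) / n), with n - 1 degrees of freedom. The sketch below assumes that form; the article does not print its formula. The function name is hypothetical, and the correlation between matched pairs (r = .30 here) is an invented value, since the article does not report it.

```python
import math


def matched_groups_t(m1, s1, m2, s2, r, n):
    """t for the difference between two matched-group means.

    m1, m2: group means; s1, s2: group standard deviations;
    r: correlation between the paired scores; n: number of pairs.
    Returns (t, degrees of freedom), with df = n - 1.
    """
    variance_of_difference = s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2
    standard_error = math.sqrt(variance_of_difference / n)
    return (m1 - m2) / standard_error, n - 1


# Conventional-test means and SDs from Table 2.
# ASSUMPTION: r = 0.30 between matched pairs is hypothetical, not from the article.
t, df = matched_groups_t(41.06, 5.52, 40.08, 4.84, r=0.30, n=113)
print(f"t = {t:.2f}, df = {df}")
```

A larger correlation between matched pairs shrinks the standard error, which is the reason for including the correlation term when groups are matched.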


DISCUSSION 


As noted at the outset the primary hypothe- 
sis tested in the present study was that stu- 
dents could learn from properly constructed 
programed material about as well as in the 
conventional classroom situation and in a sub- 
stantially shorter length of time, in this in- 
stance in slightly less than 70% of the time 
required for conventional instruction. In view 
of the results just stated, namely no signifi- 
cant difference between the mean scores made 
on the conventional test and a difference sig- 
nificant at the .01 level favoring the pro- 
gramed instruction group on the “crite- 
rion” test, it appears that the hypothesis is 
sustained. 

With respect to the generality of the find- 
ings it should be kept in mind that the study 
utilized one program, produced by one pro- 
gramed instruction team, in one particular 
type of subject matter. It is reasonable to 
assume that some programing teams pro- 
duce better or more efficient instructional ma- 
terials than do others. The material used in 
the study was probably one of the better pro-
grams available. The same results might not
be achieved with less adequate programed 
material. Similarly, certain types of informa- 
tion are doubtless better suited to program- 
ing than are others. The material used in 
the present study was well suited to pro- 
graming procedures. It would not be appro- 
priate to generalize the findings to materials 
poorly suited to programing techniques. It 
is also possible that differences between a mili- 
tary training situation and other training and 
educational situations might affect the gen- 
erality of the findings to some extent. How- 
ever, in situations which are basically the 
same as those that existed in the present study 
the results are interpreted to indicate that 
programed instruction has considerable po-
tential for the reduction of training cost
through the reduction of training time.


SVIB AS A PREDICTOR OF JOB SATISFACTION


VERA M. SCHLETZER 1


Women’s Continuing Education Program, University of Minnesota 


A study was made of 185 graduates of certain professional curricula at the 
University of Minnesota to test the hypothesis that job satisfaction in a 
certain occupation is related to congruent or appropriate interests in that 
occupation. Occupations represented were medicine, law, dentistry, mechanical
engineering, accounting, and journalism. The Ss were contacted by mail 
and asked to fill out 3 job satisfaction blanks and the SVIB. 12th grade SVIB 
scores were also available for each S. Only 1 of 56 relationships between 
interests and job satisfaction scores was significantly different from 0. The 
lack of relationships was true for both earlier and current testing of interests 


and for all 3 job satisfaction blanks. 


Counseling for occupational choice involves,
among other things, some notion of the amount
of satisfaction likely to be attained in the
career being considered. As Darley and
Hagenah (1955) point out, “A man’s working
life spans forty to fifty years. In the main, he
keeps his nose to the same kind of grindstone
for that period of time. Thus it is important
to consider what makes grindstones attractive
- what satisfactions can be found in jobs.” In
his early book, E. K. Strong, Jr. (1943) says
that, “Interests are indicators of what activities
bring satisfaction.” In another part of the
same book, he states, “The criterion of a
vocational-interest test should be whether or
not the person will be satisfied in the career
to which it directs him, other factors than
interest being disregarded.”

In his follow-up study, Strong (1955) struggles
with the problem of finding an adequate
measure of job satisfaction. Although noting
its inadequacy, he used the criterion of “occupation
engaged in” for his own follow-up
studies. However, the expectation of continuance
in an occupation is of little significance
to the client making a vocational
choice that involves long and expensive preparation
for the career. He is usually certain
that, if he invests time, money, and energy in
highly specialized collegiate and professional
training, it will be difficult for him to leave
the vocation for which he prepared.

1 Assistant Professor and Coordinator of the
Women’s Continuing Education Program, University
of Minnesota. This study was carried out in partial
fulfillment of the requirements for the Doctor of
Philosophy degree at the University of Minnesota,
Lloyd H. Lofquist, major advisor.

The hundreds of studies existing in the 
area of job satisfaction attest to the import- 
ance of work. Roe (1956) states, “In our 
society there is no single situation which is 
potentially so capable of giving some satis- 
faction at all levels of basic needs as is the 
occupation.” A survey of research projects 
concerning job satisfaction (Brayfield, Wells, 
& Strate, 1957; England & Stein, 1961; Ghei, 
1960; Hoppock, 1935; Super, 1939; Vollmar 
& Kinney, 1955) indicates that age, sex, 
education, occupation, and occupational level 
of workers contribute in varied and complex 
ways to the attitudes they have about their 
jobs. In studying the relationship between 
interests and job satisfaction, the results will 
be more meaningful if these correlates are 
not permitted to confound the findings. 

Since career choices are very often made in 
the twelfth grade, the predictive effectiveness 
of an SVIB, taken at this time, is especially 
important. However, in order to study the 
relationship between interests and job satis- 
faction, it seems necessary to control the 
effects of other variables such as age, sex, 
and occupational level by holding them con- 
stant. A follow-up study of male groups with 
specific collegiate and professional degrees 
obtained within a 3-year period, each sub- 
ject (S) of which had taken the SVIB as a 
high school senior, seemed to meet these 
criteria. The central question investigated was 
whether men who have measured interests 
congruent with the interests of their professions
express more job satisfaction in these
professions than do men whose measured 
interests are not congruent with their pro- 
fessions. 

The null hypothesis to be tested is: Men 
in certain selected occupations who had 
received As and B+s on the appropriate 
occupational scale of the SVIB do not differ 
in their responses to job satisfaction in- 
ventories from men in these occupations who 
had received Bs or lower on the appropriate 
scales of the SVIB. 


METHOD 


Subjects. The Ss are 185 men who had graduated 
from various professional schools or curricula of the 
University of Minnesota in 1957, 1958, or 1959, and 
who had taken the SVIB when they were in Grade 
12. The curricula represented were medicine, law, 
dentistry, mechanical engineering, accounting, and 
journalism. Their mean age at the time of the 
study in 1962 ranged from 26.26 for the engineers 
to 29.15 for the lawyers. 

Procedure. The Ss were contacted by mail and 
asked to participate in a study of job satisfaction 
of persons in certain professional kinds of jobs. 
Three job satisfaction inventories, a personal data 
sheet, and the SVIB were included with the letter. 
The men were promised an interpretation of the 
SVIB as well as a summary of the results of the 
project in return for their cooperation. The Ss of 
this study represent a 78% response of the total 
group available for study. The nonrespondents and 
respondents did not differ from each other on high 
school ranks, percentile scores on the American 
Council on Education Psychological Test scores, or 
percentile scores on the Cooperative English Test. 
Respondents and nonrespondents in four occupa- 
tional groups did not differ from each other in age. 
Physician and lawyer nonrespondents differed sig- 
nificantly but slightly from respondent groups on 
the age variable. However, since no previous research
indicates that a 1-year age difference has any
appreciable effect on reported job satisfaction, it
would seem these differences would not vitiate the
data.

Measures. The instruments used to measure job
satisfaction were the Hoppock Job Satisfaction Blank
(Hoppock, 1935) and the Brayfield-Rothe Job Satisfaction
Blank (Brayfield & Rothe, 1951), along with
a newly developed Job Dimensions Inventory. The
Total Satisfaction score is a sum of the three scores
after they have been converted to standard scores
with mean = 50 and standard deviation = 10 by the
formula

$$Z_i = 50 + 10\,\frac{X_i - \bar{X}}{s}$$

where $X_i$ is a raw score and $\bar{X}$ and $s$ are the mean and
standard deviation of the scores on that blank (Dixon & Massey, 1957).
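
The conversion to standard scores and the summing into a Total Satisfaction score can be sketched as follows. This is only an illustration: the raw scores are hypothetical, the function names are invented here, and the use of the sample standard deviation (n - 1) is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def standard_scores(raw):
    """Convert raw scores to standard scores with mean 50 and SD 10."""
    m = mean(raw)
    s = stdev(raw)  # sample SD (n - 1); an assumption, not stated in the text
    return [50 + 10 * (x - m) / s for x in raw]


def total_satisfaction(hoppock, brayfield, dimensions):
    """Sum the three standardized blanks for each respondent."""
    columns = [standard_scores(c) for c in (hoppock, brayfield, dimensions)]
    return [sum(triple) for triple in zip(*columns)]


# Hypothetical raw scores for five respondents (not data from the study).
hoppock = [20, 22, 24, 19, 25]
brayfield = [66, 72, 75, 60, 80]
dimensions = [150, 160, 175, 140, 180]

totals = total_satisfaction(hoppock, brayfield, dimensions)
```

Because each blank is rescaled to the same mean and spread before summing, no single blank dominates the total merely because its raw scores run larger.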

Job satisfaction scores for the different blanks are 
dichotomized with those scores at the median and 
above within each occupational group being classified 
as high, those below the median within an occupa- 
tional group being classed as low. 

For purposes of this study, congruency of in- 
terests is defined as an A or B+ on the appropriate 
occupational scale of the SVIB, and noncongruency 
is defined as B and lower on the appropriate scale. 
For accountants, high scores on either CPA or 
Accountant scales classify the S as congruent. For 
journalists, high scores on either Advertising Man or
Author-Journalist classify the subject as congruent.
The Ss who were in inappropriate occupations (i.e.,
aviation) or in graduate school, as well as those who
did not complete all of the instruments, were not
included in the analysis. Two accounting graduates
were found to have completed law school and are 
classified as lawyers in the correlational analyses. 

The product moment correlations shown in the
results were obtained by the formula

$$\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$

The significance of the $\phi$ coefficient was tested by
referring $N\phi^2$ to a chi-square table with one degree
of freedom (Walker & Lev, 1953).
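
A minimal sketch of this analysis, from the median split through the significance test, is given below. The cell counts are hypothetical, the function names are invented for illustration, and the assignment of a, b, c, d to the cells of the fourfold table (a and b in the top row, c and d in the bottom row) is an assumption. The critical value 3.841 is the tabled chi-square value for one degree of freedom at the .05 level.

```python
from math import sqrt
from statistics import median

CHI2_CRIT_05_DF1 = 3.841  # chi-square critical value, 1 df, .05 level


def split_at_median(scores):
    """Class scores at or above the median as high (True), others as low."""
    cut = median(scores)
    return [s >= cut for s in scores]


def phi_coefficient(a, b, c, d):
    """Phi for a fourfold table with cells a, b (top row) and c, d (bottom row)."""
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def is_significant(phi, n, crit=CHI2_CRIT_05_DF1):
    """Refer N * phi**2 to the chi-square distribution with 1 df."""
    return n * phi ** 2 > crit


# Hypothetical table: congruent/noncongruent by high/low satisfaction.
phi = phi_coefficient(10, 5, 5, 10)   # (100 - 25) / 225
significant = is_significant(phi, 30)  # 30 * phi**2 is about 3.33, below 3.841
```

With groups of 20 to 40 men, a coefficient must be fairly large before N times its square exceeds 3.841, which is worth keeping in mind when reading the occupation-by-occupation results.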










RESULTS 


Table 1 shows average scores obtained by
the different occupational groups on the three
job satisfaction measures. The Hoppock
blanks correlated .83 with the Brayfield scores
and .75 with the Dimensions scores. The
Brayfield and Dimensions scores correlated .67.


TABLE 1

OCCUPATIONAL COMPARISONS OF JOB SATISFACTION MEASURES

Occupational                  Hoppock           Brayfield          Dimensions
group           N         M        SD         M        SD         M        SD
Accountants    24      21.7      2.42     71.96      9.21    157.04     30.75
Dentists       35     23.60  [illeg.]     74.09      7.10    170.20     20.02
Engineers      37     20.16      2.50     66.46      9.17    147.81     31.64
Journalists    20     19.80      3.40     68.00     11.36    150.50     27.58
Lawyers        36     22.67      2.54     74.06     11.30    176.81     24.22
Physicians     28     23.11      2.85     75.32      8.78    170.25     29.26
Total         180     21.96      3.03  [illeg.]     10.04    162.08     29.47


TABLE 2

CORRELATIONS BETWEEN HIGH SCHOOL INTERESTS AND SCORES ON JOB SATISFACTION BLANKS

Occupational                             Phi coefficients
group                N     Hoppock   Brayfield  Dimensions   Ttl. Sat.
Accountants   [illeg.]      −.0278      −.0550      −.0550      −.0550
Dentists      [illeg.]      −.0234      −.1993      −.3137      −.3137
Engineers     [illeg.]      −.0067       .0405       .1382       .1382
Journalists         19       .1494       .1556       .3667       .0556
Lawyers             38       .0532       .0000      −.1685      −.1685
Physicians          28      −.2887       .1429       .0000      −.1429
Total              178      −.1373       .0086      −.0347      −.0902

Table 2 shows the correlations between 
congruency of interests on the SVIB taken in 
12th grade and present job satisfaction as 
measured by the different blanks. None of 
the coefficients shown is significantly different 
from zero. 

Table 3 shows correlations between congruency
of interests on current SVIBs and
job satisfaction as measured by the different 
blanks. Only one of the phi coefficients is 
significantly different from zero at the .05 level. 


DISCUSSION 


On the basis of the correlations obtained 
in Tables 2 and 3, we cannot reject the null 
hypothesis of no differences in responses to 
job satisfaction inventories for congruent and 
noncongruent occupational groups. The same 
situation obtains for SVIBs taken 6 to 13
years earlier as twelfth graders and for SVIBs
taken concurrently with the job satisfaction 
blanks. 

The correlations obtained do not indicate 
any trends that might be significant with 
larger samples. For instance, congruent in- 
terests are not more highly related to a global 
measure of job satisfaction (i.e., the Hoppock 
and the Brayfield) than to one directed to 
more specific aspects of the job (i.e., Dimen- 
sions). Looking at the various occupations 
does not reveal any difference in the relation- 
ship of congruent interest to job satisfaction. 
For instance, physicians, dentists, and lawyers 
are more likely to be self-employed and thus 
to have a little more freedom in structuring 
their jobs to their own wishes. However, while 
men in these professions report more satisfac- 
tion, job satisfaction is not more closely re- 
lated to the congruency of their interests. 
Current SVIBs are not more highly related 
to measures of job satisfaction than are the 
earlier interest tests. 


TABLE 3

CORRELATIONS BETWEEN CURRENT INTERESTS AND SCORES ON JOB SATISFACTION BLANKS

Occupational                        Phi coefficients
group            N     Hoppock   Brayfield  Dimensions   Ttl. Sat.
Accountants     21       .3443    [illeg.]      −.1168       .1557
Dentists        34       .0601       .1886      −.0629      −.0629
Engineers       37       .0067       .0495    [illeg.]       .1382
Journalists     19       .1409       .2858       .0272       .0272
Lawyers         35      −.1495       .0822      −.1462      −.0486
Physicians      28      −.0589       .1968       .1968       .1968
Total          174       .0404       .1411       .0589       .0534

* p < .05.


Seeking a plausible explanation for the lack
of significant correlations between job satis- 
faction and measured interests leads one to 
examine the instruments used to measure the 
variables. Insofar as job satisfaction is con- 
cerned, two previously validated instruments 
were used along with a newly developed one. 
While one can argue that these job satisfac- 
tion blanks were not sensitive enough to 
register subtle differences between those with 
congruent interests and those with non- 
congruent interests within an occupation, 
they did register significant interoccupational 
differences in job satisfaction. However, the 
men in the study reported generally high job 
satisfaction and this may account for the lack 
of correlation between interests and satisfac- 
tion. 

Since this study has been done only on men 
who have survived curricula and employment 
try-outs, it is perhaps not surprising that 
SVIBs taken in high school do not distinguish 
between the more and less satisfied members 
of these professions. While current SVIBs are 
not positively related to job satisfaction in- 
ventory scores, it is meaningful that the 
great majority of the men reporting had
congruent interest patterns. The present study 
supports Strong’s contention (1951) that con- 
tinuous employment in an occupation causes 
a slight increase in score and that most of this 
results during the first 5 years after leaving 
college. 

CONSTRUCTION AND VALIDATION OF A THURSTONE
SCALE OF LIBERALISM—CONSERVATISM


JOHN H. WRIGHT AND JACK M. HICKS
Wake Forest College 


A 23-item scale of liberalism—conservatism was constructed by the Thurstone 
method of equal-appearing intervals and found to correlate highly (point- 
biserial r= .64) with a naturalistic behavioral criterion consisting of self- 
selected, actively campaigning political groups (Young Democrats and Young 
Republicans). The scale yielded an internal-consistency coefficient of .79,
indicating substantial common variance among items, and a coefficient of
reproducibility (Rep) of .87, indicating quasiscalability. A considerably greater
proportion of nonperfect scale types, as evidenced by a significantly greater 
number of errors of reproducibility (p <.001), was found among the Young 
Republicans than among the Young Democrats. 


The purpose of the investigation to be re- 
ported was to develop a scale of liberalism— 
conservatism and to demonstrate its rela- 
tionship with an unusually unitary behavioral 
criterion. The scale was constructed by the 
equal-appearing intervals method introduced 
by Thurstone and Chave (1929). The major 
technique of validating the scale was provided 
by the availability of political clubs organized 
solely for, and actively engaged in, the sup- 
port of political candidates in the 1964 presi- 
dential campaign. The particular groups used 
were the Young Democrats and Young Repub- 
licans at a southern liberal arts college. It is 
suggested that these groups provided a purer 
validation criterion than would normally be 
expected of a naturalistic situation because of 
the unique character of the 1964 presidential 
campaign. This was a campaign in which 
platforms were drawn along liberalism—con- 
servatism lines with unprecedented cogency. 
It may even be asserted that this was the 
major issue of the campaign. If so, it seems 
reasonable to presume that liberalism—con- 
servatism orientation might have been a pri- 
mary determinant of one’s choice between the 
two major political campaign groups. No 
previously published liberalism—conservatism 
scale (e.g., Adorno, Frenkel-Brunswik, Levin- 
son, & Sanford, 1950; Centers, 1949; Mc- 
Closky, 1958; Stotsky & Lachman, 1956) 
can be found which used such sharply de- 
fined criterion groups. A typical procedure 
for validating scales of liberalism—conservatism
has been to demonstrate correlations with
whether a constituent had voted Democratic
or Republican in a presidential election. How- 
ever, it would seem that the relatively easy 
act of expressing a preference for a particular 
candidate should serve as a lesser indicator of 
one’s political orientation than having worked 
for several weeks in his behalf. There seems a 
strong case, therefore, for expecting that the 
1964 presidential campaign was unique in 
providing naturalistic validation criteria with 
unprecedentedly high factorial overlap with 
a scale of liberalism—conservatism. 


METHOD
Subjects 


The attitude statements were constructed and 
scaled by 45 male and female members of an experi- 
mental psychology class at Wake Forest College. 
The reference-validation groups consisted of male 
and female members of the Young Democrat and 
Young Republican organizations at Wake Forest 
College. Properly completed questionnaires were re- 
turned by 35 Republicans and 80 Democrats, none 
of whom were aware of the nature of the experiment. 


Procedure 


Attitude scale construction. The 45 Ss were in- 
structed to write statements which reflected political 
attitudes ranging from ultraliberal to ultraconserva- 
tive viewpoints. The Ss were also instructed to con- 
struct statements representing the entire range of 
political attitude under consideration and not to 
restrict their statements to the area on the attitude 
continuum representing their individual political
opinions. From the more than 500 statements col- 
lected, 358 were selected to be scaled. The criterion 
for inclusion among the statements to be scaled was 
that a statement should not contain reference to 
liberal-conservative issues unique to the 1964 presidential
campaign. In addition, all statements referring
directly to the presidential candidates were
discarded.


TABLE 1

SCALE VALUES AND STANDARD DEVIATIONS OF THE TWENTY-THREE STATEMENTS OF THE
LIBERALISM—CONSERVATISM SCALE

Attitude statements                                 Scale values          SD
 1. All old people should be taken care of by the
    government.                                             2.30        0.88
 2. The government should finance college education.       2.64        1.50
 3. Government sponsored medical care for the aged is
    definitely desirable.                                   2.91        0.92
 4. Efficient large-scale production necessitates
    government intervention.                            [illeg.]    [illeg.]
 5. It is the concern of the federal government to
    initiate, direct, and finance relief programs for
    poverty stricken areas.                                 3.26        0.94
 6. The government should provide and create jobs to
    relieve the unemployment situation.                     3.55        1.07
 7. I favor increased federal aid to higher education.      3.84        1.02
 8. T.V.A. is a very effective and beneficial program.      4.14        1.33
 9. The NDEA is a good policy for educational
    improvement.                                            4.44        0.95
10. Labor unions play an essential role in American
    democracy.                                              4.84    [illeg.]
11. I believe in a tax increase when justified.             5.36        1.19
12. I support legislative bills for a tax cut.              6.07        2.00
13. The U.S. is running a close second to Russia in
    technological achievements.                             6.30        1.60
14. I am against parts of the Medicare Bill.                6.65        1.01
15. The national budget should be balanced.                 6.97        1.15
16. The federal government should attempt to cut its
    annual spending.                                        7.45        1.02
17. I believe in less federal tax and more state tax.       7.98        0.99
18. We should cut foreign aid in order to reduce our
    national debt.                                          8.05        0.90
19. I favor a de-centralization of the federal
    government.                                             8.75        0.98
20. The U.S. should withdraw from the U.N. because we
    bear the financial burden.                              8.95        1.08
21. Foreign aid spending should be abolished.               9.70  1.[illeg.]
22. Social security should be abolished.                   10.07        0.82
23. Isolation (complete) is the answer to our foreign
    policy.                                                10.50        0.98

In three separate scaling sessions the 358 state- 
ments were presented aurally to the 45 Ss who were 
instructed to place each statement on an 11-point 
scale ranging from ultraliberal (scale value of one) 
to ultraconservative (scale value of 11) in such a
way as to yield equal intervals between successive
scale values. In addition to these two anchor points
Ss were instructed to place at 6.00, the midpoint of
the hypothetical continuum, statements reflecting a
political attitude which was neither liberal nor conservative.
Scale values 3.00 and 9.00 were described
to Ss as moderately liberal and conservative, respectively.
Throughout the scaling sessions Ss were
repeatedly reminded to ignore completely their 
own political attitudes when judging a statement. 
The mean of the 45 judgments for each statement 
was taken as the scale value for that statement. 
Hicks and Campbell (in press) have reported cor- 
relations of the order of .98 between the means of 
such judgments and the scale values derived from 
the unit-normal deviate transformations usually 
employed with categorical judgments. The 23 state- 
ments selected for the liberalism—conservatism at- 
titude scale are presented in Table 1 along with 
their respective scale values and standard deviations. 
These statements were selected so as to represent 
the entire attitude continuum. In cases where two or 
more statements were found to possess equal scale 
values, the statement with the smallest standard 
deviation was selected to represent that point on the 
attitude continuum. 
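
The scaling and selection steps just described can be sketched as follows. The judgments, statement labels, and function names are hypothetical. Two points are assumptions rather than details given in the text: the standard deviation is computed with n - 1, and "equal scale values" is taken to mean equal after rounding to two decimals.

```python
from statistics import mean, stdev


def scale_statement(judgments):
    """Return (scale value, SD) for one statement's placements on the 1-11 scale."""
    # Scale value is the mean of the judges' placements; SD uses n - 1 (assumed).
    return round(mean(judgments), 2), round(stdev(judgments), 2)


def pick_representatives(scaled):
    """Among statements sharing a scale value, keep the one with the smallest SD."""
    best = {}
    for text, (value, sd) in scaled.items():
        if value not in best or sd < best[value][1]:
            best[value] = (text, sd)
    return sorted((value, text, sd) for value, (text, sd) in best.items())


# Hypothetical placements by four judges for three statements.
judgments = {
    "A": [2, 3, 2, 3],
    "B": [1, 4, 1, 4],   # same mean as A, but judges disagree more
    "C": [9, 10, 9, 10],
}
scaled = {text: scale_statement(j) for text, j in judgments.items()}
chosen = pick_representatives(scaled)  # A is kept over B at scale value 2.5
```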

Attitude scale validation. In order to determine if 
these statements represented a valid scale of political 
attitude ranging from ultraliberal to ultraconserva- 
tive views, a questionnaire was constructed containing 
the 23 statements listed in random order and headed 
by the following instructions: “We are interested in 
finding out how you feel about certain issues. Place 
a plus (+) beside each of the following statements 
with which you agree and a minus (—) beside the 
statements with which you disagree. Be sure to mark 
all of the statements.” The questionnaire was ad- 
ministered to all available members of the student 
Democratic and Republican organizations at Wake 
Forest College. Because of the nature of the 1964 
presidential campaign, issues, and candidates, the 
assumption that actively campaigning Republicans 
and Democrats represented conservative and liberal 
viewpoints, respectively, was considered unquestion- 
able. 


RESULTS AND DISCUSSION 
Attitude Scale Construction 


Inspection of Table 1 reveals substantial 
agreement among the 45 judges as to the 
location of the 23 statements on a hypothetical 
continuum of political attitude ranging from 
ultraliberal to ultraconservative views. The 
largest standard deviation (2.00) was obtained 
for the statement (scale value 6.07) represent- 
ing the midpoint of the attitude continuum, 
a finding commonly reported in the scaling 
literature. Following the scaling sessions many 
Ss reported it very difficult to judge a state- 
ment as reflecting neither liberal nor conserva- 
tive attitudes, although judging a statement as 
only slightly liberal or conservative was not 
considered difficult. 

The Spearman-Brown reliability coefficient 
obtained from the responses of the 115 Ss in 
the reference-validation groups to the 23-item 
questionnaire was found to be .79, a value
somewhat higher than the internal consistency 
coefficients reported by Adorno et al. (1950) 
for their often cited politico-economic con- 
servatism scales. In an effort to determine the 
degree of unidimensionality of the attitude 
scale, a Guttman reproducibility coefficient 
(Rep) was computed. Since procedures for 
determining Rep for a scale composed of non- 
monotone, point items differ somewhat from 
those employed with monotone items (Torger- 
son, 1958; Wohlwill, 1963), a brief descrip- 
tion of the procedure employed in the present 
study is in order. The initial step was to 
construct a matrix ordering Ss along one 
dimension in terms of the mean of the scale 
items with which each S indicated agreement 
and items along the second dimension in terms 
of their respective scale values. After estab- 
lishing by inspection the scale boundaries of 
the largest cluster of consecutive statements 
endorsed by a particular S, the number of 
errors of reproducibility for that S was com- 
puted by summing the number of endorsed 
statements falling outside these boundaries 
with the number of unendorsed statements 
falling within the boundaries. This procedure 
was repeated for each S and Rep computed in 
the usual fashion. Rep in the present study was 
found to be .87, a value greater than the 
maximum Rep of .74 which would be expected 
on the basis of the item popularities found in 
the present study and the assumption of zero 
scalability among 23 items. 
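
The error count for point items can be sketched as below. The text says the boundaries of the endorsed cluster were established by inspection; the sketch substitutes a mechanical rule, choosing for each S the pair of boundaries that gives the fewest errors. That rule is an assumption made for illustration and may not match the authors' judgments in every case. Rep is then taken as 1 minus the proportion of responses that are errors. The response patterns and function names are hypothetical.

```python
def errors_for_subject(endorsed):
    """Minimum reproducibility errors for one S.

    `endorsed` holds 1 (agree) or 0 (disagree) for each item, with items
    ordered by scale value. For candidate boundaries i..j the error count is
    endorsed items outside i..j plus unendorsed items inside i..j.
    """
    n = len(endorsed)
    total_endorsed = sum(endorsed)
    best = total_endorsed  # no cluster at all: every endorsement is an error
    for i in range(n):
        inside_endorsed = 0
        for j in range(i, n):
            inside_endorsed += endorsed[j]
            width = j - i + 1
            errors = (total_endorsed - inside_endorsed) + (width - inside_endorsed)
            best = min(best, errors)
    return best


def reproducibility(matrix):
    """Rep = 1 - total errors / (number of Ss x number of items)."""
    total_errors = sum(errors_for_subject(row) for row in matrix)
    return 1 - total_errors / (len(matrix) * len(matrix[0]))


# Hypothetical responses of three Ss to six items ordered by scale value.
responses = [
    [0, 1, 1, 1, 0, 0],  # perfect scale type: one unbroken cluster
    [1, 0, 1, 1, 0, 0],  # one stray endorsement, so one error
    [0, 0, 0, 0, 0, 1],  # single endorsement, no error
]
rep = reproducibility(responses)  # 1 - 1/18
```

Unlike the monotone case, a perfect pattern here is a single unbroken run of endorsements somewhere along the continuum, which is why errors are counted relative to a cluster rather than a cutting point.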


Attitude Scale Validation 


As a primary index of the validity of the 
attitude scale, a point-biserial coefficient of 
correlation was computed between the political 
affiliations of the 35 Young Republicans and 
80 Young Democrats and the mean of the 
scale items with which each of these Ss in- 
dicated agreement. This correlation was found 
to be .64, indicating a very high validity 
coefficient between liberal and conservative at- 
titudes as measured by the attitude scale and 
affiliation with groups publicly endorsing a 
clearly liberal or conservative political candi- 
date. The mean score of the Young Demo- 
crats was 4.81. The range of these scores 
was from 3.87 to 6.39; 77 of the 80 Young 
Democrats placed themselves at a point on
the scale reflecting liberal political attitudes.
The mean of the Young Republicans’ scores, 
which ranged from 4.10 to 7.07, was 5.93. 
The amount of overlap between the two 
groups was very small. Only four Young 
Republicans were more liberal than the mean 
(4.81) Young Democrat and only three were 
more liberal than the median (4.77) Young 
Democrat. Similarly, only three Young Demo- 
crats were more conservative than the mean 
(5.93) and median (6.03) Young Repub- 
licans. It is interesting to note that the mean 
scale value of the Young Republicans was 
located near the midpoint of the hypothetical 
attitude continuum rather than at a point well 
within the conservative range, whereas the 
mean scale value of the Young Democrats was 
located at a point well within the liberal 
range of the attitude continuum. In addition, 
an examination of the pattern of items en- 
dorsed by the members of the two groups 
revealed a considerably greater proportion of 
nonperfect scale types, as evidenced by a 
significantly greater number of errors of 
reproducibility (F = 18.54, df = 1/113, p < .001),
among the Young Republicans than
among the Young Democrats. However, de- 
spite the failure of the Young Republicans to 
locate themselves at a point well within the 
conservative portion of the attitude con- 
tinuum, it is clearly evident that the attitude 
scale successfully differentiated Young Repub- 
licans from Young Democrats. 
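
For readers who wish to reproduce this kind of validity coefficient, a minimal sketch follows. Each S is scored as the mean scale value of the statements endorsed, and the point-biserial correlation is computed from the two group means. The respondents and function names are hypothetical, and the coding of group membership (1 = Young Republican, 0 = Young Democrat) is an arbitrary choice that fixes only the sign of r.

```python
from math import sqrt
from statistics import mean, pstdev


def attitude_score(endorsed, scale_values):
    """Mean scale value of the statements an S endorsed (at least one required)."""
    chosen = [v for e, v in zip(endorsed, scale_values) if e]
    return mean(chosen)


def point_biserial(scores, group):
    """Point-biserial r between a 0/1 group code and a continuous score."""
    ones = [s for s, g in zip(scores, group) if g == 1]
    zeros = [s for s, g in zip(scores, group) if g == 0]
    p = len(ones) / len(scores)
    # Uses the SD of all scores computed with n in the denominator.
    return (mean(ones) - mean(zeros)) / pstdev(scores) * sqrt(p * (1 - p))


# Hypothetical example: an S endorsing the first and third of three statements.
score = attitude_score([1, 0, 1], [2.30, 6.07, 10.50])

# Hypothetical scores for two Democrats (0) and two Republicans (1).
r_pb = point_biserial([4, 5, 6, 7], [0, 0, 1, 1])
```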

In conclusion, this paper has reported the 
development of a scale inferred to measure 
the political-orientation construct of liberalism 
—conservatism. Three lines of evidence were 
offered in support of this inference. First, 
only those statements achieving a high de- 
gree of interjudge agreement during scaling 
were selected for the attitude scale, a criterion 
accepted as assuring their relevance to the 
attribute of interest. The second and third 
lines of evidence conformed to construct- 
validation procedures recommended by Cron- 
bach and Meehl (1955) and Cureton (1965). 
An index of internal consistency (Spearman- 
Brown reliability) was computed and found to 
be quite acceptable. To provide construct 
validation the scale was correlated with an 
unusually pure naturalistic behavioral cri- 
terion. The high validity coefficient between 
the scale and the criterion suggested the
successful construction of a political-orienta- 
tion scale highly related to a genuine social 
phenomenon logically defensible as a valid 
criterion of liberalism—conservatism. 

VALIDATION AND REVISION OF A TEST IN USE


DONALD A. PETERSON AND S. RAINS WALLACE


Life Insurance Agency Management Association, Hartford, Connecticut 


The problems of evaluating a test when it is being used are discussed. Data 
are presented for a test when “in use” and in a “give but don’t use” condition. 
Emphasis is placed upon the effects of indirect curtailment when a test is 
being correctly used with other valid selection instruments. This phenomenon 
presents problems both for evaluating a test and for the appropriate weighting 
of tests in a battery. This may be a major problem in applied selection
research programs.


Attention to the obvious defects inherent 
in concurrent validation and the advantages 
of the predictive approach may lead us to 
neglect some of the problems encountered in 
the latter when we attempt to estimate the 
validity of an instrument in use or to improve 
its validity by adding or substituting new 
items or tests. One aspect of the problem was 
recognized in the Air Force program of World 
War II and excellently treated by Thorndike 
(1949). This is the restriction of range result- 
ing from the imposition of a cutoff score. He 
also mentions the problem of some types of 
indirect curtailment. 

Unfortunately, the industrial situation in- 
frequently provides the clear-cut range re- 
striction which may occur in the military 
milieu. In the Air Force program, all men 
who passed the cutoff score were placed into 
training and none who failed (with a very 
few unmentionable exceptions) were admitted. 
In the industrial situation, the restriction at 
the low end of the score distribution is con- 
founded by a considerable number of “ex- 
ceptions,” and a restriction may occur at the 
higher end through a self-selection or “reluct- 
ance” process. In other words, a larger propor- 
tion of men in the high-scoring end of the 
population than in the average range may 
themselves reject the position. 

Other troublesome factors may and often 
do occur in the business world. For example, 
in the selection of life insurance salesmen, the 
sales manager has LIAMA’s selection test (the 
Aptitude Index) score to aid him in his hiring 
decision (Paterson & Thompson, 1953). 
Neglecting the unsettling possibility that he 
may attempt to aid a “good-looking applicant” 
to pass the test, there still remains the unfortunate
fact that he proceeds to train and
supervise the men whom he hires (including 
the “selection exceptions”) with full knowl- 
edge of the test prediction. How this may 
affect the time and energy he puts into an 
individual’s training or the decisions he makes 
about retaining or firing him is not known. 
However, it is obvious that effects which 
might spuriously lower or raise the validity 
estimate could occur. 

In recognition of these problems, LIAMA 
has, for some time, attempted to find life in- 
surance companies who would be willing to 
administer the Aptitude Index to all ap- 
plicants but refrain from using the scores in 
the selection process or revealing them to 
trainers or supervisors. (It should be noted 
that even this procedure does not remove the 
high-end, self-selection restriction referred to 
above.) These attempts have failed because 
the companies are understandably reluctant 
to incur the risk involved in contracting any 
considerable number of men whose chances 
for success are prohibitively low. This has 
been true even though the statement is fre- 
quently made that life insurance sales man- 
agers have “used the Aptitude Index so long 
that they can pretty well tell what score a 
man will get just by talking to him and will 
reject him, anyway.” 

Fortunately, one member company de- 
cided some time ago to run a validity check 
on the Aptitude Index as it was operating in 
its population (LIAMA, 1965). While the 
number of cases was small by LIAMA stand- 
ards, there was no gainsaying the fact that 
the results gave no support to the contention 
that the test had any predictive validity 
whatever. This company, therefore, decided to
stop using the Aptitude Index in actual selection
but agreed to continue to administer it to
all applicants. LIAMA scored the tests, and
the scores were returned to the company but
were not available to any operational personnel.


TABLE 1

APTITUDE INDEX SCORE AND AGENT “SUCCESS” FOR SIX MONTHS

“Classical” validity results

Aptitude
Index           Number  Percentage      Number  Percentage
rating      contracted  contracted  successful  successful
32-39               24        15.1          13        54.2
29-31               38        23.9          13        34.2
27-28               39        24.5           7        17.9
25-26               40        25.2           6        15.0
20-24               18        11.3           1         5.6
Total              159       100.0          40        25.2

Table 1 shows the results. These men were 
all inexperienced in life insurance selling and 
were financed by the company in accordance 
with a standard procedure. All of them had 
an opportunity to survive for a 6-month 
period. “Success” was defined as survival for 
6 months and the earning of at least $700 in 
life insurance sales commissions. Note that 
61% of the men hired during this period ac- 
tually scored below the previous cutoff score— 
29. These managers did not identify and reject 
low-scoring men by “talking to them.” 
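
The figures quoted in this paragraph can be checked directly from the counts in Table 1. The short sketch below is illustrative only: the variable and function names are hypothetical, and the counts are transcribed from the table as printed.

```python
# Counts transcribed from Table 1: (score band, number contracted, number successful).
TABLE_1 = [
    ("32-39", 24, 13),
    ("29-31", 38, 13),
    ("27-28", 39, 7),
    ("25-26", 40, 6),
    ("20-24", 18, 1),
]
PREVIOUS_CUTOFF = 29  # cutoff score mentioned in the text


def success_rates(rows):
    """Percentage successful within each score band, to one decimal."""
    return {band: round(100 * ok / n, 1) for band, n, ok in rows}


def percent_below_cutoff(rows, cutoff):
    """Percentage of contracted men whose band lies wholly below the cutoff."""
    total = sum(n for _, n, _ in rows)
    below = sum(n for band, n, _ in rows if int(band.split("-")[1]) < cutoff)
    return round(100 * below / total)


if __name__ == "__main__":
    print(success_rates(TABLE_1))
    print(percent_below_cutoff(TABLE_1, PREVIOUS_CUTOFF))
```

The three bands below 29 contain 97 of the 159 men, which is the 61% cited in the text.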

Remembering that an earlier study of the 
test in use had convinced this company that 
it had no validity and that larger samples 
studied by LIAMA (1963) have given only 
somewhat less gloomy if statistically sig- 


nificant results as shown in Table 2, we are 
led to wonder how often we may be tempted 
to discard instruments which are actually 
performing a useful predictive function be- 
cause we cannot study them under the proper 
circumstances.

While direct curtailment has been gen- 
erally recognized as a problem, indirect cur- 
tailment has received much less attention. 
Also, there is a tendency to discuss the effect 
of improper use of tests on the apparent 
validity of a test, but little discussion of the 
effect of proper use of a test. The data re- 
ported in Table 1 could be, in part, an example 
of the effect on apparent validity of tests when 
a test is properly used with other selection 
tools in a total selection process. This phe- 
nomenon can be described as one type of 
indirect curtailment. 

Let us consider a greatly simplified example 
of this phenomenon. Assume that two tests
are being given but not used in a company. 
For the applicant population, Test A has a 


TABLE 2

APTITUDE INDEX SCORE AND AGENT “SUCCESS” FOR SIX MONTHS

Industrywide “in use” validity results

Aptitude
index      Number      Percentage   Number       Percentage
rating     contracted  contracted   successful   successful
32-39       1,161         44.6         356          30.7
29-31         919         35.3         248          27.0
27-28         290         11.1          72          24.8
25-26         154          5.9          34          22.1
20-24          79          3.0           9          11.4
Total       2,603         99.9         719          27.6

Note.—Success for the industry was survival for 6 months and production in the top half of survivors in each company.
mean score of 100 and a standard deviation
of 20. Its correlation with the job success
criterion is .50. Test B has the same mean, 
standard deviation, and validity coefficient as 
Test A. So, we have two equally valid tests. 
Assume further that the correlation between 
the two tests is zero. Under these conditions 
the company can simply add the two test 
scores together to get a total score with a 
mean of 200, a standard deviation of 28.3, 
and a validity coefficient of .707. Assume that 
we have a highly reliable criterion and a 
50% job success rate in our unselected popu- 
lation. Under ideal conditions, we could then 
use the regression line formula and the stand- 
ard error of estimate to state what percentage 
of the people at each test score level would 
fall above the 50% point on the job criterion 
scale and be considered ‘‘successful.” These 
estimates are presented in Table 3. 
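
The arithmetic behind this example can be sketched as follows. The sketch assumes a bivariate normal relation between test score and criterion and defines "success" as falling above the criterion median; neither assumption is spelled out in the text, and the function names are hypothetical. Under these assumptions the computed percentages agree with the tabled estimates to within about one percentage point.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def combine_uncorrelated(sd_a, sd_b, r_a, r_b):
    """SD and validity of the sum of two uncorrelated tests."""
    sd_total = sqrt(sd_a ** 2 + sd_b ** 2)
    r_total = (r_a * sd_a + r_b * sd_b) / sd_total
    return sd_total, r_total


def expected_success(z, validity):
    """Proportion above the criterion median at predictor z-score `z`.

    Uses the regression line (validity * z) and the standard error of
    estimate sqrt(1 - validity**2), as described in the text.
    """
    return normal_cdf(validity * z / sqrt(1.0 - validity ** 2))


if __name__ == "__main__":
    sd_total, r_total = combine_uncorrelated(20, 20, 0.50, 0.50)
    print(round(sd_total, 1), round(r_total, 3))
    for z in (3, 2, 1, 0, -1, -2, -3):
        print(z, round(100 * expected_success(z, 0.50), 1),
              round(100 * expected_success(z, r_total), 1))
```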

Let us hastily add that in the real and more 
complicated world these expectations given in 
the table would not be realized because of 
various changing conditions in the situation 
and because not all of the ideal conditions 
would be found to hold true. However, the 
problem with which we are concerned in this 
discussion is essentially the same in the real 
world and in our simplified example. 

The company at this point decides that it 
can afford to reject the lower 50% of the 
applicants and so sets a total score of 200 
as the lowest acceptable score. This system 
is now put into use as the selection process. 
Some years later it is decided to check on 
the validity of Test A while in use. Scores on 
Test A were retained by the organization re- 
sponsible for Test A, but scores on Test B 
and the total test scores are not now avail- 
able, although the total test score was used 
in the hiring decision. Note what can be 
expected to happen to the success rates for 
scores on Test A under these conditions. 
Assuming that the validities and conditions 
have not changed, the only men hired scoring 
40 on Test A would be men who scored 160 
or more on Test B (a total score of 200 or 
more). For men who have a total score of 
200, the expected success rate is 50%. There- 
fore, among the men hired under the condi- 
tions described, even the lowest scoring group 


TABLE 3

PERCENTAGE EXPECTED TO BE SUCCESSFUL FOR EACH
TEST ALONE AND FOR THE TWO TESTS COMBINED
AT VARIOUS SCORE LEVELS

                                  Test A
Score                             plus
level      Test A     Test B      Test B
+3σ         96%        96%        99.9%
+2σ         87%        87%        98%
+1σ         72%        72%        84%
M           50%        50%        50%
-1σ         28%        28%        16%
-2σ         13%        13%         2%
-3σ          4%         4%         0.1%

of men on Test A would be expected to have 
a success rate of 50% or higher. 

On the other hand, a man who scores 160 
on Test A (+3σ on A) is virtually assured of
getting a total score of 200 or higher since
very few men will receive a score below 40
on Test B (-3σ on B). Consequently, the
expected success rate for the high scores on A 
does not change noticeably from that which 
we found in Table 3 in the column labeled 
Test A. Under these conditions, the range of 
expected success rates from 4% to 96% 
shown in Table 3 is reduced to a range of from 
50% to 96% as indicated above. In other 
words, the range we can expect to obtain 
under “in use” conditions is reduced to ap- 
proximately one-half of the original range. 
Although the test is still making the contribu- 
tion which it made originally, the “apparent 
validity” under “use” is much smaller than 
the validity found under the “give but don’t 
use” situation. This is an example of the phe- 
nomenon of indirect curtailment due to proper 
use of the test. 
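
A small simulation can illustrate this effect. It is a sketch under assumptions not stated in the text: normally distributed scores, a criterion built so that each test correlates .50 with it, and hiring strictly on a total score of 200 or more. All names are hypothetical. With these settings the correlation of Test A with the criterion among those hired comes out well below the .50 found in the full applicant group.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n=50_000, seed=0):
    """Return (validity of A among all applicants, validity of A among hired)."""
    rng = random.Random(seed)
    a_all, y_all, a_hired, y_hired = [], [], [], []
    for _ in range(n):
        za, zb = rng.gauss(0, 1), rng.gauss(0, 1)  # uncorrelated tests
        # Criterion with unit variance that correlates .50 with each test.
        y = 0.5 * za + 0.5 * zb + sqrt(0.5) * rng.gauss(0, 1)
        a, b = 100 + 20 * za, 100 + 20 * zb
        a_all.append(a)
        y_all.append(y)
        if a + b >= 200:  # hired on the total score only
            a_hired.append(a)
            y_hired.append(y)
    return pearson(a_all, y_all), pearson(a_hired, y_hired)


if __name__ == "__main__":
    r_all, r_hired = simulate()
    print(round(r_all, 2), round(r_hired, 2))
```

Test A contributes exactly as much to the hiring decision in both cases; only the correlation observable among those hired has shrunk.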

Now let us return to the real world in which 
we operate. Because of many conditions which 
we will note only briefly here, we do not 
expect to experimentally obtain the range of 
success rates given in the table even in a 
“give but don’t use” experiment. Examples
of such conditions are the lack of highly reli- 
able criteria, changing conditions in a com- 
pany, lack of homoscedasticity (e.g., a tri- 
angular covariant surface instead of the 
classic oval), the fact that we do not operate 
around a 50% population success rate, and 
the fact that unless we have unusually large
samples, we seldom are able to look experi- 
mentally at a sufficiently large group of 
people who are more than one and one-half 
standard deviations away from the mean. 

Moving to the in use situation, in addition 
to the conditions mentioned above, we have 
the effect of proper use of the test, the effect 
of improper uses of the test, direct curtail- 
ment, and the reluctant applicant problem. 
Under these conditions, the difference in 
results shown in Table 1 and Table 2 does 
not seem particularly surprising. However, 
while proper use of the test in the situation 
described lowers only apparent validity, some 
types of improper use of the test lower both 
the apparent validity and the actual contribu- 
tion which the test makes in use. When we 
obtain a difference between validity in use 
and validity when not in use, our problem is 
to determine how much of the loss is only 
in the apparent validity and how much is a 
loss in the real contribution of the test. 

How difficult is it to obtain the information 
needed to do this? Frequently, the scores of 
other tests being used can be obtained and 
the interrelationships can be studied. How- 
ever, in some cases the scores are quite dif- 
ficult to obtain, particularly for the total 
applicant population. A difficulty that is more 
frequently encountered is that of determining 
exactly how the tests are being used and, 
therefore, how to make appropriate analyses. 
At the very least, we need to study the test 
score distribution characteristics on both the 
applicant population and the group hired 
from the applicant population if we are to 
get a reasonably clear picture of the contribu- 
tion or lack of contribution of the various 
selection tools in a process. 

Actually our most difficult situation from 
an analysis viewpoint is when a test is being 
used in conjunction with other selection tools 
which are not tests, such as the inspection 
report, physical examination, interviews, pre- 
contract training or job orientation (used as 
a selection device). To the extent that the 
use of such tools is not on a quantified and 
specified basis, any analysis made of the 
validity of a test ignores the interaction with 
them. Under such conditions, we are unaware 
of the amount and kind of indirect curtail- 


ment and may get a very false picture of the 
validity of a selection device. 

The problem of the test in use also applies 
where we are trying out new tests. Assume 
the situation which we had with Tests A and 
B in use and now add experimental Test C 
which is given but not used. After experimen- 
tation, we find that Test C is valid and we 
now wish to combine Tests A and C. Apply- 
ing any of our usual regression techniques 
routinely to the raw data for Tests A and C 
will automatically result in assigning weights 
to Test C which are too large in relation 
to the weights assigned to Test A. Similar 
analogies can be carried over to the problems 
of item analysis. How serious the error will 
be depends on the conditions. In the situation 
which we have described, the error would be 
reasonably large. 

What are some of the steps that can be 
taken with regard to these problems? Since 
experience has indicated that it is not safe 
to assume that a test has validity today be- 
cause it had validity several years ago, 
determination of current validity estimates 
remains a necessity. 

We can be sensitive to the number of cases 
required to do research. If a test is not in 
use, samples of 100, 200, or 300 cases are 
usually adequate for cross-validation, depend- 
ing upon the situation in which we are work- 
ing. On the other hand, with a test in use 
situation, we find that 2,000 to 3,000 cases 
are needed to get a reliable look at the situa- 
tion. This larger number of cases tends to 
average out the nonsystematic in use vari-
ables which give trouble. The increase in
cases, of course, does nothing to take care 
of the problem of a systematic variable such 
as indirect curtailment. 

Where conditions are appropriate for cor- 
relation analysis and sufficient information is 
available, Thorndike (1949) offers formulae 
for dealing with some of the direct and 
indirect curtailment situations. 
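
As an illustration of the kind of formula meant here, the sketch below implements the widely cited correction for direct restriction of range on the predictor, often labelled Thorndike's Case 2. The text only points to Thorndike (1949) in general, so the choice of this particular formula and the function name are assumptions. It does not handle indirect curtailment, which requires information on the variable actually used for selection.

```python
from math import sqrt


def correct_for_direct_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the unrestricted validity from a restricted-sample correlation.

    r_restricted    -- correlation observed in the selected (hired) group
    sd_unrestricted -- predictor SD in the applicant population
    sd_restricted   -- predictor SD in the selected group
    """
    k = sd_unrestricted / sd_restricted
    return r_restricted * k / sqrt(1.0 - r_restricted ** 2 + (r_restricted * k) ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: observed r of .30 with the predictor SD halved by selection.
    print(round(correct_for_direct_restriction(0.30, 20.0, 10.0), 3))
```

When the two standard deviations are equal the function returns the observed correlation unchanged, which is a convenient sanity check.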

A possible solution when a test in use ap- 
pears to have lost much of its validity is to 
apply the classic give but don’t use experi- 
mental design. A company which has been 
using a test is ordinarily reluctant to do this. 
Consequently, this is a solution which has 
limited practical application in most indus-
trial situations once a company has started
using the test. However, research workers
should probably put more emphasis upon this
approach than has been true in the past.
Some companies are experimentally minded
or can be educated in that direction by
patient and alert research workers.

In situations where tools other than tests
play a significant role in the selection process,
the study of the process itself seems to be
the only solution which promises real prog-
ress. This means that it will be necessary to
collect data both on selection tools and the
decision processes which take place. Because
of the many difficulties involved, we have
avoided a concentrated attack on this problem


in the past. Perhaps the time has come when 
we must make a comprehensive attempt to 
study the overall selection process in order 
to understand what we are doing. 
EFFECTS OF COMBINED COUNSELING AND VOCATIONAL
TRAINING ON PERSONAL ADJUSTMENT 1


DANIEL GAVALES 


Multi-Occupational Youth Project, Houston, Texas 


To evaluate the effects of combined counseling and vocational training on 
personal adjustment, the Manson Evaluation test was administered to 85 
students during the 1st 2 wk. of training and again near termination. The 
students were between 17 and 21 yr. of age and were generally characterized 
by previous failure in social, academic, and vocational endeavors. All students 
and classes received regular individual and group counseling by skilled 
counselors. Comparisons of “before” and “after” Manson scores revealed 
consistent and highly significant gains in personal adjustment. The findings 
were interpreted within the framework of current governmental efforts to 
combat social ills such as poverty, delinquency, and unemployment. 


Recently, steps have been taken by the 
Federal Government to remedy such socio- 
economic ills as chronic unemployment and 
poverty. In the summer of 1964 the Multi- 
Occupational Youth Project (MOYP) was 
started in Houston, Texas as a pilot program 
designed to provide vocational training and 
counseling to unemployed or underemployed 
youth 16 through 21 years of age (Dunn, 
1965). The program is funded by the Fed- 
eral Government and represents a new ap- 
plication of the Manpower Development and 
Training Act (MDTA, 1962). In essence, it 
reflects some of the more current and novel 
dimensions of governmental intervention in 
dealing with long standing sociological prob-
lems associated with unemployment and in-
sufficient education. The laudable ambitions
and goals of this large-scale program fall 
well within the framework of the “Great 
Society.” 

One of the unique aspects of this youth 
project is the inclusion of a comprehensive 
counseling program to supplement the train- 
ing process. Working through the Texas Edu- 
cation Agency and the Harris County Depart- 
ment of Education, the counseling program 


1 This research was undertaken as a partial evalu- 
ation of the Houston Multi-Occupational Youth 
Project. The counseling and training phase was 
conducted by the Texas Education Agency and the 
Harris County Department of Education. The author 
wishes to acknowledge the assistance of project 
counselors Paul Hubbard Lewis and Albert Luther 
Jackson in data collection. 
is staffed by the author, representing clinical
psychology, and four skilled counselors. All
students in training receive frequent indi-
vidual and group counseling geared at per-
sonal, educational, and vocational rehabilita-
tion. The crucial test of this program’s success
is the extent to which the trainees reach a 
stage of adequate employability and personal 
adjustment, particularly in areas pertinent 
to obtaining and maintaining satisfactory 
employment. 

This study was specifically concerned with 
the examination of overall changes in per- 
sonal adjustment as a function of combined 
training and counseling. Toward this goal, 
the Manson Evaluation test (Manson, 1949) 
was administered to all students at the be-
ginning of training and again near termina-
tion. Although the Manson was originally
designed to detect alcoholism, it covers a 
variety of psychoneurotic and psychopathic 
characteristics such as anxiety, depression, 
emotional sensitivity, resentfulness, incom- 
pleteness, aloneness, and interpersonal adjust- 
ment problems. Total scores can be considered 
as estimates of overall personal adjustment. 
Comparisons between “before” and “after” 
Manson total scores constitute the measures 
of personality change under investigation. 


METHOD 
Subjects 
The subjects (Ss) in this study were 85 young 


men and women between the ages of 17 and 21.
All of the Ss were selected for training on the basis 
of their being unemployed, lacking in skills and/or
financial resources. A large portion of the sample
consisted of youngsters from low socioeconomic strata
and minority groups. The majority of Ss were
high-school dropouts, some completed high school
and a few attended college. On the basis of screen-
ing procedures, such as counseling interviews and
the General Aptitude Test Battery, all the stu-
dents accepted were found capable of learning and
were placed in vocational classes according to their
aptitudes.
In psychological make-up, many of the students
showed tendencies toward behavior disorders, some
had strong sociopathic characteristics and a few
revealed neurotic traits. Some of the more prevalent
attitudes and personality characteristics were rebel-
lion against authority, bitterness, apathy, inter-
personal antagonism, low motivation, low self-
confidence, and personal neglect as to dress and
grooming. This total cluster of attributes has come
to be called the “chip on the shoulder syndrome”
by the professional staff in this program.


Training


Students comprising the sample for this study
were variously enrolled in the following vocational
classes: clerk typist (two sections), offset printing,
cleaning and pressing (two sections), and meat
cutting. Students attended classes from 6 to 8 hours
a day, 5 days a week for approximately 6 months.
A training allowance of $20 per week was given
each student and additional money was given to
students with head of household status.


Counseling


Under the joint coordination of the clinical psy-
chologist and the director of training, counselors
were assigned to various classes with an approxi-
mate load of 90 students per counselor. The Ss and
classes under investigation here constitute only a
portion of the total project. Each student received
one or more individual counseling sessions per
month. The counseling process ranged from super-
ficial, supportive to depth counseling, depending
upon the demands of each case. Special emphasis
was placed on helping the students develop traits
which are related to successful employment such as
personal adjustment, cooperative behavior, relia-
bility, dependability, honesty, loyalty, acceptance of
authority, punctuality, good attendance, and good
fellowship. In view of the psychological nature of
our Ss, achievement of these goals was regarded as
a major undertaking and challenge. It was felt
that without substantial gains in personality charac-
teristics the benefits of training alone would not
achieve the overall goal of converting idle, unem-
ployable persons to a level of adequate employabil-
ity. This study was undertaken to evaluate global
effects of the training and counseling program upon
personal adjustment factors. The question of employ-
ability is indirectly related to measures of personal
characteristics and will require additional research.


Testing 


The Manson Evaluation was administered to all 
Ss during the first 2 weeks of training and again 
during the last 2 weeks of training. The time span 
between first and second testing was approximately 
6 months. The standard directions, which appear 
on the face sheet of the test booklet, were used for 
the first administration. The following instructions 
were given during the second testing: 


I would like you to take this test again and 
answer the questions in terms of how you feel 
at this time. Some of your answers may be 
similar to the first testing and some may be 
different. Do not try to think back to your 
original answers. Pretend you are taking the test 
for the first time and answer according to how 
you feel now. 


Data Analysis 


Total Manson scores were obtained for all Ss 
along with scores for each of the seven personality 
traits covered by the test. Mean total scores were 
computed for each training class. A t test for sig-
nificance of difference between “before” and “after”
mean total scores for all six classes combined was 
calculated. 


RESULTS 


Table 1 contains mean total scores for 
first (before) and second (after) testing and 
for each training class. The higher the score 
the greater the personality disturbance. The 


difference scores were obtained by subtracting
the second testing mean from the first test-
ing mean. Thus, a plus difference score sig-
nifies a lower second testing score and conse-
quently an implied increase in personal ad-
justment. The obtained t score of 5.47 was
found significant (p < .01), indicating an
overall lowering of scores on the second test-
ing as compared to the first, and accordingly,


TABLE 1

MEAN MANSON SCORES, DIFFERENCE
SCORES AND t TEST

                               First      Second     Difference
                               testing    testing    scores
MOYP classes                   (Before)   (After)    1st-2nd
Meat Cutting                    44.27      30.87      +13.40
Cleaning & Pressing, I          34.64      25.21      + 9.43
Cleaning & Pressing, II         25.83      16.33      + 9.50
Clerk Typist, Houston           30.13      24.66      + 5.47
Clerk Typist, Prairie View      32.13      25.20      + 6.93
Offset Printing                 33.40      30.20      + 3.20

t = 5.47, p < .01
TABLE 2

NUMBER OF STUDENTS SHOWING POSITIVE CHANGE AND
NEGATIVE CHANGE ON SECOND MANSON TESTING
AS COMPARED TO FIRST TESTING

                               Number     Number     Number
                               of posi-   of nega-   having
                               tive       tive       no
MOYP classes                   changes    changes    change
Meat Cutting                     13          2         —
Cleaning & Pressing, I           12          2         —
Cleaning & Pressing, II           4          1         1
Clerk Typist, Houston            11          2         2
Clerk Typist, Prairie View       10          3         2
Offset Printers                  11          9         —
Totals                           61         19         5
Percentages                      72%        22%        6%

a significant increase in measured personality
adjustment. 

Table 2 shows the number of students in 
each class who made positive changes and 
the number who made negative changes on 
the second Manson testing as compared to 
the first testing. In all classes the number of 
positive changes, or plus difference scores, 
was consistently higher than the number of 
negative changes. Out of 85 Ss, 61 showed 
improvement on the second testing (72%), 
19 showed a negative change (22%), and 
5 showed no change (6%). Table 2 does 
not indicate the magnitude of the positive 
or negative changes; however, examination 
of the data reveals that the magnitude of 
the negative change was considerably smaller 
in each class than the magnitude of posi- 
tive change. These overall increases were 
consistently supported by the counselors’ 
impressions and reports of each student. 
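
The totals and percentages quoted above follow from the class counts in Table 2, as the brief sketch below shows. The names are hypothetical, and the count of 12 positive changes for Cleaning & Pressing, I is inferred from the column total of 61 because the printed cell is illegible in this copy.

```python
# (class, positive changes, negative changes, no change), from Table 2.
TABLE_2 = [
    ("Meat Cutting", 13, 2, 0),
    ("Cleaning & Pressing, I", 12, 2, 0),  # 12 inferred from the column total
    ("Cleaning & Pressing, II", 4, 1, 1),
    ("Clerk Typist, Houston", 11, 2, 2),
    ("Clerk Typist, Prairie View", 10, 3, 2),
    ("Offset Printers", 11, 9, 0),
]


def totals(rows):
    """Column totals: (positive, negative, no change)."""
    return tuple(sum(row[i] for row in rows) for i in (1, 2, 3))


def percentages(rows):
    """Column totals as whole-number percentages of all students."""
    counts = totals(rows)
    n = sum(counts)
    return tuple(round(100 * c / n) for c in counts)


if __name__ == "__main__":
    print(totals(TABLE_2), percentages(TABLE_2))
```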


DISCUSSION 


The results of the present study suggest 
significant increases in personal adjustment, 
defined in terms of Manson scores, as a 
function of combined counseling and voca- 
tional training. The obtained Manson scores, 
however, must be regarded with caution and 
considered only as guidelines to interpreta-
tion in view of the limited and often ques-
tionable reliability of such personality inven-
tories. An inference that can be justifiably
drawn from the data is that the students 
were able to present themselves in a more 
favorable light near the termination of train- 
ing as compared to the early stages of 
training. 

The impression of apparent improvement 
in self-portrayal was given considerable sup- 
port by the counselors’ chronological progress 
reports which depicted steady improvement in 
most of the students in areas such as atti- 
tudes toward the self, attitudes toward au- 
thority, self-confidence, learning and work 
habits, interpersonal relationships, and phys- 
ical appearance. In a number of individual 
cases the positive changes in personality were 
very dramatic and would be readily notice- 
able to anyone familiar with the youngster 
over a period of time. It is felt that develop- 
ment in these personal areas will have con- 
siderable bearing on the students’ success in 
obtaining and maintaining satisfactory em- 
ployment, particularly in view of the many 
characterological limitations they demon-
strated upon entry to the program. In this
regard, recent follow-up studies of initial
classes graduated revealed an 82% employ-
ment rate with an average monthly salary
increment of $72 over salaries received in jobs
prior to training.

It is questionable whether the same degree 
of positive personality change would have 
occurred if any of the independent variables 
such as counseling, training, and allowances, 
were isolated. However, in further studies it
would be beneficial to establish control groups
so that the relative effects of each of these
variables may be ascertained. Also, attempts
can be made to examine the relative modifica-
tions of various personality characteristics as
a function of combined counseling and train-
ing. The methodological approach of investi-
gating the effects of combined variables em-
ployed in this research seems to be in line
with current theoretical formulations which 
emphasize the importance of interactive 
and multivariate processes in most behavioral 
phenomena. 

Three major factors seem to underlie the
positive personality changes observed in this
study. In order of estimated importance, they
are: (a) intensive individual and group
counseling, (b) acquisition of significant
skills through vocational training, (c) receiv-
ing financial assistance while in training.
Some of the beneficial effects of counseling
may have resulted from the counselor’s unique
position of serving as a parent surrogate. This
becomes more important when one considers
the obvious lack or insufficiency of family
structure in the background of many of the
students in this study. Furthermore, studies
have shown that increased involvement and
communication between parents and their
offspring results in a higher probability of
success in training or on the job (Feinberg,
1964). The actual training process may have
contributed by increasing self-confidence, ex-
posing the student to solid community models
and placing the youngster in a more stable
and purposeful social group than he is likely
to find on the street corner. Receiving weekly
allowances may have served as a reinforce-
ment of positive behavior, particularly since
students were docked for unexcused absences.

Closer examination of the psychological
make-up of students in this project reveals
that before training they were generally lack-
ing in social skills and had negative attitudes
toward learning, work and authority figures.
In many instances, personal qualities such as
apathy, lethargy, rebellion, and bitterness
toward society were responsible for the young-
sters’ state of insufficient skills and employ-
ment. These conditions, combined in many
cases with financial and sociocultural limita-
tions, served to markedly restrict their devel-
opment and realization of native capacities.
A general quality which seemed to character-
ize most of the youngsters during the early
stages of training was a feeling of “unimpor-
tance.” This, in combination with a social
perception of limited or nonexistent oppor-
tunity for success may in part account for
the deterioration and delinquency of many
low socioeconomic groups which continue
their struggle unassisted.

It is the contention of Cloward and Ohlin 


(1960) that juvenile delinquency is related 
to a discrepancy between the individual’s 
perceived goals and his perception of limited 
opportunity available to him, particularly if 
he comes from the low social strata. In effect, 
the Multi-Occupational Youth Project repre- 
sents an ambitious effort on the part of the 
Government to directly extend opportunity 
to impoverished and disadvantaged youth so
that their aspirations can be realized. It will 
be interesting to see how this equalization of 
opportunity will affect sociological trends, 
particularly in the deprived subcultures. 

The positive findings in this study regard- 
ing personality change add support to the 
intentions of the Youth Project. With the 
increasing trend and demand for the extension 
and diversification of psychological and coun- 
seling services so that more persons can be 
reached, this study has demonstrated a mul- 
tiple impact training and counseling approach 
to the problem of personal and vocational 
rehabilitation. Further investigations will be
needed to more specifically examine the vari- 
ous elements in counseling and training in 
terms of their effect on particular personality 
characteristics and also how they may affect 
one’s level of employability. 
Solutions of 1000 and 2000 ppm (mg. per liter) of Na₂SO₄, NaHCO₃, CaSO₄,
MgSO₄, NaCl, CaCl₂, MgCl₂, and Na₂CO₃ were rated on an acceptability scale
in 3 separate studies. Results showed that the minerals ranked in acceptability 
approximately as listed. Implications of the findings were discussed mainly in 
regard to detection thresholds for the 8 minerals and consumer acceptance of 
naturally mineralized ground waters used for domestic supplies. 


In arid and semiarid parts of the United 
States ground waters often contain, in solu- 
tion, relatively high concentrations of the 
common minerals: calcium, magnesium, and 
sodium, in combination with chloride, sulfate, 
and carbonate. The common dissolved min- 
erals can impart a definite taste to ground 
waters used for community supplies; however, 
none produce odor or turbidity. Very little is 
known regarding the acceptability to con- 
sumers of waters containing varying relative 
proportions and total amounts of dissolved 
common minerals. Some research has been per- 
formed in an attempt to determine lower abso- 
lute thresholds for certain of the cation-anion 
combinations (Cox, Nathans, & Vonau, 1955; 
Lockhart, Tucker, & Merritt, 1955; Whipple, 
1907). These researches give no direct indi- 
cation of the acceptability of the taste pro- 
duced by supraliminal amounts of various 
dissolved minerals to individuals who daily 
consume a water having either a low or high 
concentration of total dissolved minerals. 

Recently a program of research has begun 
in order to determine consumer acceptance of 
high mineral waters in California (Bruvold 
& Gaffey, 1965; Ongerth, Bruvold, & Knut- 
son, 1964). A part of this research program 
has involved the development of a procedure 
whereby individuals rate samples of water 


1 The research reported in this paper was sup-
ported in part by Fellowship Award WPSP-16,452(1) 
from the Department of Health, Education, and 
Welfare and in part by a grant from the California 
State Department of Public Health. 


containing known amounts of a mineral solute 
under controlled laboratory conditions using 
a taste scale. The taste scale consists of a 
number of ordered statements describing the 
taste of water and its acceptability for daily 
consumption. The rating procedure employing 
the taste scale bears an analogy to the psycho- 
physical method of single stimuli used by 
Bayton and Thomas (1954) and by Clements, 
Bayton, and Bell (1954) in the study of the 
acceptability of canned orange juices. 

The present paper reports results from 
three studies employing a taste scale rating 
procedure. One purpose of these studies was 
to refine the procedure whereby the subjects 
(Ss) rate samples of mineralized water. A 
second purpose was to begin to formulate the 
relation between certain concentrations of 
various mineral solutes and rated acceptabil- 
ity for daily consumption. 


STUDY I
Method 


Solutions. Nine mineral solutes were employed to 
prepare 17 solutions for use in Study I. Each solute 
was composed of a cation and an anion from those 
which compose the common minerals. Solutions of 
1,000 and 2,000 ppm (mg. per liter) were prepared 
in distilled water using reagent grade chemicals as 
solutes. CaCl₂, CaSO₄, MgCl₂, MgCO₃, MgSO₄, NaCl,
Na₂CO₃, and Na₂SO₄ were employed separately to
produce 8 solutions at each level of concentration. 
If a solute was hydrated, a correction for the hy- 
dration factor was made in preparing the solution. 
Also, corrections for the small amounts of impurities 
present in the reagent grade chemicals were made. 
To put the MgCO₃ into full solution, 1 milliliter of
concentrated HCl was added to both the 1,000 and 


MINERAL TASTE IN WATER 23 


the 2,000 ppm solution. One solution of 15 ppm 
CaCO3 was prepared. After all solutions had been 
prepared, a check on the correctness of solution 
concentration was made for each of the 17 samples. 
The errors were found to be less than 1% for each 
solution. Berkeley tap water, provided by the East 
Bay Municipal Utility District (EBMUD), was 
used as the eighteenth sample in Study I. This water 
contains approximately 85 ppm of total dissolved 
solids (TDS), and it is free from noticeable odor 
and mineral taste. 
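
The hydration correction described above can be illustrated with a brief sketch. It assumes standard molar masses, which the paper does not list, and takes MgCl2 hexahydrate as the example reagent; the function name and the 1-liter batch size are hypothetical, and the impurity correction is not modeled.

```python
# Illustrative sketch only. Molar masses are standard reference values,
# not figures taken from the paper.
M_MGCL2 = 95.211                       # g/mol, anhydrous MgCl2
M_H2O = 18.015                         # g/mol, water
M_MGCL2_6H2O = M_MGCL2 + 6 * M_H2O     # g/mol, MgCl2 hexahydrate


def hydrate_mass_mg(target_ppm, liters, m_anhydrous, m_hydrate):
    """Mass of hydrated salt (mg) that supplies target_ppm of the
    anhydrous solute, taking 1 ppm as 1 mg per liter as the paper does."""
    anhydrous_mg = target_ppm * liters
    return anhydrous_mg * m_hydrate / m_anhydrous


if __name__ == "__main__":
    for ppm in (1000, 2000):
        mg = hydrate_mass_mg(ppm, 1.0, M_MGCL2, M_MGCL2_6H2O)
        print(f"{ppm} ppm MgCl2 in 1 liter: weigh {mg:.0f} mg of hexahydrate")
```

Weighing the hydrate as if it were anhydrous would leave the solution at less than half the nominal concentration, which is why the correction matters for comparisons across solutes.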

Subjects. There were 17 Ss in Study I. Ten of 
these were males and 7 were females. The ages of 
the Ss ranged from 21 to 53 years. All of the male 
Ss and 1 female S were engineers with the Bureau 
of Sanitary Engineering of the California State 
Department of Public Health in Berkeley, California. 
Six of the female Ss were employed as secretaries or 
stenographers in the same Bureau. All of the Ss had 
received EBMUD water at their place of residence 
for the 6 months just prior to Study I. Several of 
the Ss had participated in an earlier study dealing 
with the rated intensity of mineral taste in water. 
However, no S was involved in the sensory analysis 
of water supplies, and none had received any special 
training in sensory analysis. 

Instructions. Introductory instructions describing 
the general nature of the experiment were read 
to each S at the beginning of the first experimental 
session. The specific instructions given to each S 
at the beginning of each experimental session are 
given below: 


1. Take 1 mouthful of the first sample and 
hold it long enough to get a really good taste of 
the sample. Spit the sample into the sink. Do not 
swallow the sample. Wait for a few seconds to 
evaluate any after-taste that may be present. 

2. From the 12 items that comprise the taste 
scale, select 3 or 4 that seem to fit your initial 
reaction most closely. 

3. Take another mouthful of the sample. Spit it 
into the sink after you have had enough time to 
taste it fully. Evaluate any after-taste. 

4. Study the 3 or 4 previously selected items 
and attempt to choose the statement that represents 
your rating most closely. 

5. Take a last mouthful of the sample, taste 
it carefully, spit it into the sink and evaluate any 
after-taste. 

6. Select the statement that represents your 
rating most accurately. Report this rating to me. 

Rinse your mouth thoroughly with tap water 
until the after-taste is removed. Wait 1 minute 
before taking the next sample. The waiting period 
will be timed by the laboratory clock. 

Repeat Steps 1 through 6 for the second sample, 
rinse and wait 1 minute. Then repeat the procedure 
until all solutions have been rated. 


Procedure. The experiment was conducted in a 
small air-conditioned room whose temperature was 
constantly kept very near 70° F. The samples 


to be rated were presented in 100-milliliter beakers, 
filled to the 75-milliliter level. The order of presenta- 
tion of the 18 samples was determined by chance. 
A laboratory sink was used for sample expectoration. 
Before the S arrived for the rating session, E poured 
the 18 samples into beakers coded with single letters 
and placed them on a laboratory table in a randomly 
determined order. The samples were presented at 
room temperature. EBMUD water was used for 
oral rinsing. 

After the instructions were given, the S was put 
through the rating procedure by E. The E sat 
behind and to the right of S and was silent during 
the time S was tasting and rating the samples. 
Each S was put through 2 complete rating sessions 
in which all 18 samples were rated. The 2 sessions 
were separated by an interval of about 30 days. 
The procedure used in the second session was 
identical to that used in the first session. Upon 
finishing the second session, each S completed an 
evaluation questionnaire dealing with the nature of 
the rating scale and the procedures used. 

Taste scale. The taste scale used in Study I was 
composed of 12 separate statements. Each state- 
ment first described the taste of water in general 
terms, and then indicated that the water was ac- 
ceptable or unacceptable for daily consumption. The 
12 statements were ordered so that the first in- 
dicated that the water tasted especially good and 
was acceptable for daily drinking, while the last in- 
dicated that the water tasted very bad and would 
never be accepted as drinking water. The remaining 
10 statements were worded to produce ordered rating 
points between the end points of the scale. 


Results 


Two major analyses of the results from 
Study I were performed. The first analysis 
was intended to reveal something of the qual- 
ity and reliability of the rating procedure. 
The second analysis was directed at analyzing 
differences between the ratings given the 8 
minerals at both levels of concentration. 

Guilford (1954) discusses 6 types of errors 
that can plague responses obtained from rat- 
ing scales. Of these, the error of leniency and 
the error of central tendency appear to pose 
the most serious problems for the taste scale 
rating procedure. To obtain an indication of 
the presence of possible leniency errors, the 
mean of all 18 ratings given by each S during 
each rating session was computed. To obtain 
an indication of the presence of possible errors 
of central tendency, the range of all ratings 
given by each S during each rating session was 
also computed. The range was defined as the 
highest rating given, minus the lowest rating 
given, plus 1. Stated differently, the range was 
the number of points spanned by the ratings 
of a particular S, with the extreme ratings 
included. The means and ranges for Study I 
are shown in Table 1. 

To obtain an indication of the reliability of 
the taste-scale procedure, a Pearson correlation 
coefficient was computed for the ratings 
given during the first and second sessions 
separately for each S. These coefficients describe 
the correspondence between the first 
and second ratings given the same solutions 
by a single S, and are not the usual type of 
coefficients involving correspondence between 
pairs of scores obtained from the repeated 
testing of a group of individuals. The correlation 
coefficients appear in Table 1. 


TABLE 1 
[Reliability coefficient (r), mean rating, and range of ratings in each 
of the 2 sessions for each of the 17 Ss of Study I. Most entries are 
not legible. Legible rows: S 2, r = .90, means 6.06 and 6.22, ranges 
11 and 11; S 10, r = .76, means 7.06 and 7.06, ranges 9 and 9.] 


To study differences between the mean 
ratings given the various minerals, and to 
study differences between mean ratings for 
the 1,000 and 2,000 ppm samples, a 4-way 
analysis of variance was performed on all 
TABLE 2 
ANALYSIS OF VARIANCE: STUDIES I AND II 

                              Study I                     Study II 
Source               df      MS        F          df      MS        F 
Minerals (A)          7    228.71    39.64**       7    414.81    32.69** 
Concentrations (B)    1    817.81   299.56**       1    349.06    36.02** 
Subjects (C)         16     24.00                 16    195.41 
Replications (D)      1     18.01     5.89*        4     11.73     0.96 
A X B                 7     74.42    31.01**       7     17.12     5.78** 
A X C               112      5.77                112     12.69 
A X D                 7      4.10     1.72        28      3.85     1.23 
B X C                16      2.73                 16      9.69 
B X D                 1      0.54     0.37         4      2.07     0.73 
C X D                16      3.06                 64     12.22 
A X B X C           112      2.40                112      2.96 
A X B X D             7      1.72  [illegible]    28      3.17     0.97 
A X C X D           112      2.38                448      3.14 
B X C X D            16      1.45                 64      2.82 
A X B X C X D       112      2.06                448      3.26 

* p < .05. 
** p < .01. 


Fig. 1. Minerals by concentrations interaction for Study I. 
ratings except those for CaCO3 and 
EBMUD. Minerals, Ss, concentrations, and 
replications constituted the 4 factors of the 
design, and there was 1 rating per cell. The 
F ratios were computed following the stipulations 
of McNemar (1962) for a 4-way 
mixed model with Ss the only random variable. 
A summary of this analysis is presented 
in Table 2. A graphical presentation of the 
minerals by concentrations interaction is 
shown in Figure 1. 


STUDY II 
Design 


Study II was performed in order to check the 
reproducibility of the differences in mean ratings 
given the various minerals at the 2 concentrations 
employed in Study I. A significant difference between 
mean ratings given the 1,000 and 2,000 ppm 
samples had been expected; however, significant 
differences among mean ratings given the 8 mineral 
solutes were not anticipated. Significantly different 
mean ratings given various minerals at the same 
level of concentration have important implications 
for consumer acceptance of naturally mineralized 
waters and for the determination of permissible 
levels of mineralization in waters used for domestic 
consumption. 

Solutions. Eight mineral solutes were used to 
prepare 16 solutions for Study II. MgCO3 and 
CaCO3 were not used because of low solubility. 
NaHCO3 was added to the series. The remaining 7 
mineral solutes were the same as in Study I. For 
each mineral, 2 solutions were prepared; 1 at 1,000 
ppm and 1 at 2,000 ppm. The methods of preparation 
of the solutions were the same as in Study I. In ad- 
dition to the 16 solutions of mineral salts, the Ss 
in Study II rated tap water from Davis, California, 
and distilled water. Davis tap water ranges from 570 
to 940 ppm TDS, with the median nearer 600 than 
900 ppm TDS. The distilled water and Davis water 
samples were not included in the major analysis of 
results. 

Subjects. There were 17 Ss in Study II. Ten Ss 
were males and 7 were females. The ages of the 
Ss ranged from 20 to 52 years. Five of the males 
were graduate students at the University of California 
at Davis, and 5 were employees of the same institu- 
tion. All of the female Ss were employees of the 
University. All Ss had been receiving Davis water 
at their place of residence for 2 years prior to 
Study II; except for 1 S who had been receiving 
Davis water for a period of 4 months just prior to 
participating in the research. In contrast to the Ss 
used in Studies I and III, the Ss of Study II were 
experienced in sensory analysis. Although length 
of experience and degree of training varied con- 
siderably for various members of the sample, all 
were rather sophisticated as judges in the sensory 
evaluation of foods and beverages. 

Instructions. The instructions of Study II were 
very similar to those used in Study I. The rating 
procedures employed were the same except that the 
S recorded his own response on a data sheet and the 
rest interval between individual samples was reduced 
from 1 minute to 30 seconds. Appropriate changes in 
the wording of the instructions were made whenever 
necessary in order to make the instructions appropriate 
to the situation of no E present during the 
rating sessions. A complete set of instructions was 
presented during each rating session. 

Procedure. The Ss sat in individual partitioned 
booths in a specially designed laboratory room 
maintained at 70° + 1° F. No E was present in 
the test room. Coded samples of mineralized water 
were presented at room temperature in random- 
ized order in the same amounts as in Study I. On 
each of 5 consecutive days an entire set of 18 
samples was presented to S through a sliding 
door in the front of the individual booth. A beaker 
containing approximately 250 milliliters of dis- 
tilled water was also presented to S for use in oral 
rinsing. Waxed cardboard containers were used 
as receptacles for expectoration, as Ss were in- 
structed not to swallow the samples. After finishing 
the last rating, S completed a questionnaire similar 
to that used in Study I. 

Taste scale. The rating scale used in Study II was 
very similar to the scale used in Study I. Two 
statements were added at the upper end of the new 
scale. Also, changes in the wording of several of the 
original statements were made in order to over- 
come difficulties mentioned by the Ss of Study I 
in the evaluation questionnaire. 


Results 


The major analysis of the data of Study II 
was in the form of a 4-way analysis of variance 
involving the same factors and F ratios 
as the first analysis. The results of the second 
analysis of variance are presented in Table 2. 
The minerals by concentrations interaction is 
pictured graphically in Figure 2. 


STUDY III 
Design 


Study III was performed in order to obtain 
ratings for each of the 16 samples listed in Figure 2 
from a sizable number of individuals who consume a 
low mineral water. 

Solutions. Sixteen solutions were prepared for 
use in Study III. The solutions were the same as in 
Study II, and the methods of preparation were the 
same as in Study I. 

Subjects. There were 56 Ss in Study III. Thirty-one 
Ss were males and 25 were females. The ages 
of the Ss ranged from 19 to 64 years. All of the Ss 
were employees of the California State Department 
of Public Health in Berkeley, California, and all 
had been receiving EBMUD water at their place of 
residence for at least 1 year prior to the experiment. 
No S had any previous experience in research 
evaluating the flavor of water. 

Instructions. Except for certain minor revisions 
and additions, the instructions of Study III were 
identical to those of Study I. 

Procedure. Study III was conducted in the same 
experimental room, using the same general techniques 
and procedures employed in Study I. Sample solutions 
were presented at room temperature. Coded beakers 
holding 100 milliliters were filled to the 75-milliliter 
level and randomly ordered before S arrived for the 
rating session. After the instructions were given, 
E sat behind and to the right of S and was silent 
while sample evaluation was in progress. E recorded 
on a data sheet the ratings reported by S. Ratings 
were obtained only for the 16 critical sample solutions 
(neither EBMUD water nor distilled water 
was included in the sample series); EBMUD water 
was used for oral rinsing. One minute of rest 
preceded each sample evaluation. Only 1 rating 
session was completed by each S. No evaluation 
questionnaire was obtained at the end of the rating 
session. 

Taste scale. The taste scale used in Study III, 
except for several minor revisions, was the same as 
the taste scale used in Study II. The Study II 
taste scale is presented below: 


1. This water TASTES REAL GOOD. I would be 
VERY HAPPY TO HAVE IT for my everyday drinking 
water. 

2. This water TASTES GOOD. I would be HAPPY TO 
HAVE IT for my everyday drinking water. 

3. This water has NO SPECIAL TASTE at all. I 
would be HAPPY TO HAVE IT for my everyday 
drinking water. 

4. This water seems to have a LITTLE TASTE. 
I would be SATISFIED TO HAVE IT as my everyday 
drinking water. 

5. This water seems to have a MILD OFF TASTE. 
I would be SATISFIED TO HAVE IT as my everyday 
drinking water. 

6. This water has an OFF TASTE. I COULD ACCEPT 
IT as my everyday drinking water. 

7. This water has a MILD BAD TASTE. I COULD 
ACCEPT IT as my everyday drinking water. 

8. This water has a FAIRLY BAD TASTE. I THINK I 
COULD ACCEPT IT as my everyday drinking water. 

9. This water has a BAD TASTE. I DON’T THINK 
I COULD ACCEPT IT as my everyday drinking water. 

10. This water has a BAD TASTE. I COULD NOT 
ACCEPT IT as my everyday drinking water, but I 
could drink it in an emergency. 

11. This water has a REAL BAD TASTE. I would 
drink it ONLY IN A SERIOUS EMERGENCY. 

12. This water has a REAL BAD TASTE. I DON’T 
THINK I WOULD EVER DRINK IT. 

13. This water has a TERRIBLE TASTE. I would 
NEVER DRINK IT. 

14. This water has a TERRIBLE STRONG TASTE. 
I CAN’T STAND IT in my mouth. 


Results 


The data of Study III were analyzed by 
means of a 3-way analysis of variance with 
minerals, Ss, and concentrations constituting 


TABLE 3 
ANALYSIS OF VARIANCE: STUDY III 

Source               df       MS           F 
Minerals (A)          7     357.62       60.82** 
Concentrations (B)    1   [illegible]  [illegible] 
Subjects (C)         55      40.31 
A X B                 7      32.69       11.43** 
A X C               385       5.88 
B X C                55       4.78 
A X B X C           385       2.86 

** p < .01. 


the factors of the design. The F ratios were 
computed following the stipulations of Mc- 
Nemar’s Case XII. The results of the analysis 
of variance are presented in Table 3, and the 
minerals by concentrations interaction is pre- 
sented in Figure 2. 
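
The F ratios printed in Table 3 are consistent with each fixed effect being tested against its interaction with Ss, the only random factor. The sketch below checks that reading for the two rows whose mean squares are legible. It is an arithmetic illustration only, not a reproduction of the original computation, and the variable names are hypothetical.

```python
# Mean squares as printed in Table 3 (Study III).
# A = minerals, B = concentrations, C = subjects (random factor).
ms = {
    "A": 357.62,
    "AxB": 32.69,
    "AxC": 5.88,
    "AxBxC": 2.86,
}


def f_ratio(effect_ms, error_ms):
    """F ratio of a fixed effect against its effect-by-subjects term."""
    return effect_ms / error_ms


f_minerals = f_ratio(ms["A"], ms["AxC"])          # tested against A x C
f_interaction = f_ratio(ms["AxB"], ms["AxBxC"])   # tested against A x B x C

if __name__ == "__main__":
    print(f"Minerals: F = {f_minerals:.2f}")
    print(f"Minerals x Concentrations: F = {f_interaction:.2f}")
```

Both quotients agree with the tabled values of 60.82 and 11.43 to two decimals.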


DISCUSSION 


With regard to the presence of leniency er- 
rors (Guilford, 1954) in the rating procedure, 
it should be noted that the hypothetical mid- 
point on the rating scale was 6.5 for Study I. 
Twenty-eight of the 34 means shown in Table 
1 were less than 1 point removed, and 32 were 
less than 2 points removed from the hypo- 
thetical mid-point. Similar analyses made 
upon the data from Studies II and III indicate 
that most Ss had a mean very near the hypothetical 
mid-point of the 14-point taste scale. 
Such results argue strongly that the Ss of 
these 3 studies were not responding artificially 
to the samples with a set or predisposition to 
rate all samples as being either completely 
acceptable or completely unacceptable for 
daily consumption. 

That the means of the three studies are 
usually close to the hypothetical mid-point 
of the rating scale does not argue against the 
presence of a central tendency error (Guil- 
ford, 1954). However, the ranges of ratings 
shown in Table 1 dispel the notion that the 
Ss of Study I rated all samples with a state- 
ment near the middle of the scale because of 


28 WILLIAM H. BRUVOLD AND ROSE MARIE PANGBORN 


a set or predisposition to avoid the assignment 
of ratings near either end point. Inspection of 
the ranges of ratings in Studies II and III 
yields the same conclusion. Taken together, 
the ranges and means for the three studies 
indicate that leniency and central tendency 
errors did not pose serious problems to the 
rating scale procedure employed. It should 
be stated that while the absence of these er- 
rors is necessary for a valid rating procedure, 
their absence is not sufficient to demonstrate 
the fundamental validity of the rating 
procedure. 
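
The two checks discussed above can be sketched as follows. The range follows the definition given in the Results (highest rating minus lowest rating, plus 1), and the leniency check compares the mean rating of an S with the hypothetical mid-point of the scale. The ratings are invented for illustration and the function names are hypothetical.

```python
from statistics import mean


def rating_range(ratings):
    """Range as defined in the paper: highest minus lowest, plus 1."""
    return max(ratings) - min(ratings) + 1


def leniency_offset(ratings, scale_points):
    """Signed distance of the mean rating from the scale mid-point.

    For a scale numbered 1..scale_points the mid-point is
    (1 + scale_points) / 2, that is, 6.5 for the 12-point scale.
    """
    midpoint = (1 + scale_points) / 2
    return mean(ratings) - midpoint


# Invented ratings for one S in one session (18 samples, 12-point scale).
example = [3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 10, 11, 2, 5, 6, 7, 8, 9]

if __name__ == "__main__":
    print("range:", rating_range(example))
    print("offset from mid-point:", leniency_offset(example, 12))
```

A mean far from the mid-point would suggest leniency (or severity), while a small range would suggest a central tendency error; neither statistic alone establishes validity.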

The correlation coefficients presented in 
Table 1 show, in general, that the Ss of Study 
I gave similar ratings to each of the 18 sam- 
ples during both rating sessions. As a further 
aid to interpreting the coefficients of Table 1, 
it ought to be pointed out that approximately 
1 month intervened between the first and the 
second rating session, that the Ss had no idea 
that they were rating the same samples during 
the second session, that a different random 
order of presentation of samples was used in 
each rating session, that all Ss were unsophis- 
ticated in procedures of sensory analysis, and 
that the ratings could vary only over the rela- 
tively narrow range of 12 points. Considering 
these factors, the reliability of the ratings of 
those Ss for whom the coefficients were above 
.80 was remarkable. Ss receiving a reliability 
coefficient lower than .70 began to show 
enough disagreement in rating the same sam- 
ples to cause concern over whether these Ss 
were concentrating on the task of rating, 
guessing because of poor taste sensitivity, or 
confused or distracted by the particular pro- 
cedures used. Studies II and III were not 
designed to measure the reliability of ratings. 
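
The within-S reliability coefficient described above can be sketched as a Pearson correlation between the ratings one S gave the same samples in the two sessions. The ratings below are invented for illustration; only the formula is standard.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented ratings by a single S for the same 18 samples, listed in the
# same sample order for both sessions (presentation order was random).
session1 = [3, 4, 4, 5, 6, 6, 7, 7, 8, 9, 10, 11, 2, 5, 6, 7, 8, 9]
session2 = [4, 4, 5, 5, 5, 7, 6, 8, 8, 10, 9, 11, 3, 4, 6, 8, 7, 9]

if __name__ == "__main__":
    print(f"r = {pearson_r(session1, session2):.2f}")
```

Because each coefficient is computed within one S across samples, it reflects how consistently that S ordered the solutions, not agreement among Ss.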

The two levels of concentration employed 
in the three studies were chosen to test, in part, 
the accuracy or validity of the rating proce- 
dure. It was assumed that Ss accustomed to 
waters having little or no mineral taste would 
object to mineral taste in water and, further, 
that there should be an increasing functional 


relationship between amount of mineral in 
water and the rating given on the taste scale. 
Accepting these two assumptions, it follows 
that 2,000 ppm samples of mineralized water 
should be rated as more unacceptable than 
samples containing 1,000 ppm of the same 
solute. In fact, the two assumptions appear so 
obvious that the rating procedure could be 
deemed invalid and inaccurate if the 2,000 
ppm samples did not receive significantly 
higher ratings than the 1,000 ppm samples. 

The F ratios for concentrations from all 
three analyses of variance are well above the 
levels conventionally required to claim sig- 
nificance. Such a result may be held to sup- 
port the validity and accuracy of the rating 
scale procedure and also the common sense 
notion that a 2,000 ppm sample will have a 
more unacceptable taste to a person accus- 
tomed to a low mineral water than will a 
1,000 ppm sample containing the same min- 
eral solute. 

The difference in mean rating given each 
mineral solute at the two concentration levels 
is shown in Figures 1 and 2 and in Table 4. 
It should be noted that in all studies the 
higher mean rating was always associated 
with the higher concentration level. There is 
some question raised by the mean rating ob- 
tained by the 2,000 ppm sample of CaSO4 in 
Study I. In spite of the care taken in prepar- 
ing and presenting samples, it appears that 
some kind of error occurred with this sample 
and the results should probably be discounted. 

Mention should be made of the fact that 
the Ss of Studies II and III gave higher ratings 
than did the Ss of Study I. Although it 
would be improper to definitely attribute the 
difference in ratings to any factor or set of 
factors, the 12-point rating scale employed in 
Study I appears to be the cause of the lower 
ratings obtained in that research. 

The analyses of variance performed on the 
results of the three studies indicate that the 
mean ratings associated with the several mineral 
solutes differ significantly. As was the 


TABLE 4 

RANKS OF MEAN RATINGS FOR SOLUTES EMPLOYED IN THE THREE STUDIES 

              Study I                   Study II                  Study III 
Rank 
order   1,000 ppm   2,000 ppm     1,000 ppm   2,000 ppm     1,000 ppm   2,000 ppm 
1       CaSO4       MgSO4         NaHCO3      NaHCO3        Na2SO4      NaHCO3 
        (3.76)      (4.24)        (6.00)      (6.94)        (4.95)      (5.62) 
2       MgSO4       Na2SO4        MgSO4       CaSO4         CaSO4       MgSO4 
        (3.76)      (4.91)        (6.32)      (7.00)        (5.25)      (6.25) 
3       Na2SO4      NaCl          Na2SO4      MgSO4         NaHCO3      Na2SO4 
        (3.94)      (6.32)        (6.55)      (7.06)        (5.48)      (6.50) 
4       NaCl        CaCl2         CaSO4       Na2SO4        MgSO4       CaSO4 
        (4.38)      (7.41)        (6.98)      (7.24)        (5.96)      (7.48) 
5       CaCl2       MgCl2         NaCl        NaCl          NaCl        NaCl 
        (4.85)      (7.68)        (7.15)      (8.22)        (6.48)      (8.50) 
6       MgCl2       Na2CO3        MgCl2       CaCl2         CaCl2       CaCl2 
        (5.50)      (9.85)        (8.28)      (10.16)       (6.55)      (9.96) 
7       Na2CO3                    CaCl2       MgCl2         MgCl2       MgCl2 
        (8.09)                    (8.36)      (10.29)       (7.82)      (10.16) 
8                                 Na2CO3      Na2CO3        Na2CO3      Na2CO3 
                                  (10.31)     (11.14)       (9.68)      (11.48) 





case with concentration level, the F ratios for 
minerals are well above the levels usually re- 
quired for the rejection of the hypothesis that 
chance produced the differences observed. The 
size of the F ratios and the substantiation of 
the findings of Study I by Studies II and III 
serve to dispel doubts concerning the reality 
of differences in mean ratings associated with 
the various minerals. Further, it ought to be 
pointed out that for all studies the difference 
between the lowest and highest mean rating 
for each concentration level is of practical 
significance. The lowest mean ratings given 
indicate that some samples could be accepted 
for everyday drinking. The highest mean rat- 
ings given indicate that several samples could 
not be accepted for daily drinking. There- 
fore, it becomes clear that some of the com- 
mon minerals impart a much more objection- 
able taste, at the concentration levels em- 
ployed in these studies, than do others. It 
would then follow that the acceptability of the 
taste of naturally mineralized waters would be 
a function of the kind of minerals dissolved 


in the water as well as the total amount of 
mineral present. It is conceivable that certain 
waters at 1,000 ppm might have only a mildly 
objectionable taste and thus could be accepted 
for daily consumption, while other waters 
containing the same amount of dissolved min- 
eral content would have a strongly repugnant 
taste and thus would be totally unacceptable 
for daily consumption. 

Table 4 contains the mean ratings given the 
various mineral solutes at each level of con- 
centration ranked in order from low to high. 
MgCO3 was not included for Study I because 
HCl was added to fully dissolve the solute. 
Also, the mean rating for CaSO4 at 2,000 ppm 
was not included for Study I because of a 
probable error in solution preparation. Each 
mean for Study I is based upon 34 ratings. 
The means for Study II are each based upon 
85 ratings. The means for Study III are based 
upon 56 separate ratings. 

A careful inspection of the data in Table 4 
reveals considerable consistency in mean ratings 
for Studies II and III, and in the ranks 
of mean ratings across all three studies. 
NaHCO3, Na2SO4, CaSO4, and MgSO4 always 
receive the lower ratings. NaCl seems to form 
a dividing line between the mild and strong 
tasting solutes. CaCl2 and MgCl2 are always 
rated less acceptable than NaCl, while Na2CO3 
produces the most unacceptable taste of any 
solute studied. The consistency in mean rat- 
ings and ranks of mean ratings adds substance 
to the finding that the differences between 
mean ratings for minerals are statistically sig- 
nificant by indicating which solutes produce 
the more acceptable and the less acceptable 
flavors. 

If the assumption is made that rated un- 
acceptability increases in the same fashion 
for each of the common minerals as a function 
of ppm of solute above the detection thresh- 
old, it then follows that the minerals receiving 
the highest ratings on the taste scale should 
have the lowest detection thresholds, and vice 
versa, when thresholds are reported as ppm 
of solute necessary to produce a reliable de- 
tection in a distilled water base. To check the 
accuracy of the notion that there may well 


be an inverse relation between rating on the 
taste scale and detection thresholds, three 
researches (Cox et al., 1955; Lockhart et al., 
1955; Whipple, 1907) were reviewed in which 
detection thresholds had been determined for 
common mineral solutes. In each study single 
common minerals, a cation in combination 
with an anion, were used as solutes in distilled 
water. Thus, the part of the procedure regard- 
ing solutes employed in the threshold studies 
matched the approach used in the three stud- 
ies reported in this paper. Two of the studies 
(Lockhart et al., 1955; Whipple, 1907) used 
a direct measure of the weight of the solute 
in ppm. Cox et al. (1955) used molar weights 
of the various solutes. The molar weights were 
converted into equivalent ppm for presenta- 
tion in this paper. 
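
The conversion from molar concentration to ppm mentioned above can be sketched as follows, taking 1 ppm as 1 mg per liter as is done elsewhere in this paper. The molar mass of NaCl is a standard reference value and the 0.001 M concentration is an invented example; neither is taken from Cox et al. (1955).

```python
def molar_to_ppm(molarity, molar_mass):
    """Convert mol/liter to ppm (mg/liter).

    mol/liter times g/mol gives g/liter; multiplying by 1000 gives
    mg/liter, which is treated as ppm for dilute aqueous solutions.
    """
    return molarity * molar_mass * 1000.0


M_NACL = 58.44  # g/mol, standard value (assumption, not from the paper)

if __name__ == "__main__":
    # Invented example: a 0.001 M NaCl solution.
    print(f"{molar_to_ppm(0.001, M_NACL):.2f} ppm")
```

Because the conversion scales with molar mass, two solutes with equal molar thresholds can differ considerably when the thresholds are expressed in ppm.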

Substantial differences exist between the 
methods used in the three studies to obtain 
the detection threshold. Whipple (1907) ap- 
parently used the classical method of limits 
with a single ascending trial for each S with 
each mineral solute. Whipple reports no single 
mean or median threshold figure. To obtain 


TABLE 5 

DETECTION THRESHOLDS IN PARTS PER MILLION FOR COMMON MINERAL SOLUTES 
IN DISTILLED WATER 

           Whipple (1907)           Cox et al. (1955)        Lockhart et al. (1955) 
Rank    Number     Mineral        Number      Mineral        Number     Mineral 
order   Subjects   (threshold)    Subjects    (threshold)    Subjects   (threshold) 
1       19         MgCl2          8           Na2SO4         20         NaHCO3 
                   (575.0)                    (92.0)                    (1060.0) 
2       16         CaSO4          31          NaCl           20         MgSO4 
                   (550.0)                    (40.9)                    (500.0) 
3       14         MgSO4          31          CaCl2          20         CaCl2 
                   (531.3)                    (14.4)                    (347.0) 
4       14         Na2SO4         [illegible] MgCl2          20         NaCl 
                   (368.8)                    (12.7)                    (345.0) 
5       17         NaCl                                      20         Na2CO3 
                   (317.9)                                              (78.0) 
6       17         CaCl2 
                   (239.3) 
7       13         Na2CO3 
                   (27.5) 


a single detection threshold, the median of 
the several individual thresholds was determined 
separately for each solute using the 
computational procedure outlined by Blommers 
and Lindquist (1960). Cox et al. (1955) 
developed a variant on the method of limits 
where Ss who had demonstrated special 
discriminatory ability worked in a descending 
series of trials until they could no longer 
reliably discriminate between a series of samples 
containing solute and distilled water blanks. 
The detection threshold for the group of Ss 
employed in the research was taken as the 
geometric mean of the thresholds of the 
individual Ss. Lockhart et al. (1955) used a 
variant of the constant method in which each trial 
consisted of a triangle test. The threshold for 
the group of Ss was estimated using the normal 
graphic process (Guilford, 1954). 

Table 5 contains detection thresholds for 
various of the common minerals with the highest 
thresholds ranked as one and the lowest 
ranked last. It can be seen from perusing 
Tables 4 and 5 that, with one major exception 
involving MgCl2 in the Whipple study, 
a general inverse relationship exists between 
the mean acceptability ratings and detection 
thresholds for the six minerals. Since MgCl2 
may be hydrated as MgCl2 X 6H2O, it is 
possible that the discrepant results for this 
mineral in the Whipple (1907) study could 
be due to a failure to take into account the 
hydration factor when preparing solutions. 
Unfortunately, there appears to be no possibility 
of checking for such an error. 

The general inverse relation obtaining in 
Table 5 between rated acceptability and detection 
thresholds for the eight common minerals 
suggests that it would be most profitable 
to investigate these relationships directly on 
an S X S basis even though the measurement 
of detection thresholds involves overcoming 
many procedural difficulties in order to measure 
a most elusive phenomenon (Corso, 1963; 
McBurney & Pfaffmann, 1963). The large 
differences that exist between the thresholds 
obtained by Cox et al. (1955) and the other 


two studies summarized in Table 5 exemplify 
the problems besetting the determination of 
a single detection threshold figure. It is obvi- 
ous that any measured detection threshold is 
dependent upon the characteristics of the Ss 
studied and the procedures employed to meas- 
ure the threshold. In future studies investigat- 
ing relations between thresholds and rated 
intensity or acceptability for various mineral 
solutes, an approach that does not lose data 
from individual Ss in grouping the results for 
analysis may serve to overcome some of the 
difficulties besetting such research. 
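
The inverse relation discussed above can be given a rough numerical form. The sketch below computes a Spearman rank correlation between the Study III mean ratings at 2,000 ppm (Table 4) and the thresholds of Lockhart et al. (1955) (Table 5) for the five solutes common to both. Spearman's coefficient is used here only as an illustrative summary and is not a statistic reported in this paper; the two sets of values come from different Ss and procedures, so the result should be read with caution.

```python
def ranks(values):
    """Ranks from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for untied data."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


solutes = ["NaHCO3", "MgSO4", "NaCl", "CaCl2", "Na2CO3"]
# Study III mean ratings at 2,000 ppm, as printed in Table 4.
mean_ratings = [5.62, 6.25, 8.50, 9.96, 11.48]
# Lockhart et al. (1955) thresholds in ppm, as printed in Table 5.
thresholds = [1060.0, 500.0, 345.0, 347.0, 78.0]

if __name__ == "__main__":
    print(f"rho = {spearman_rho(mean_ratings, thresholds):.2f}")
```

A coefficient near -1 is what the assumption of a common growth function above threshold would predict; the near tie between the NaCl and CaCl2 thresholds is the only departure from a perfect inverse ordering in these five solutes.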

Shifting the discussion from possible rela- 
tions between detection thresholds and rated 
acceptability to the measurement of consumer 
acceptance of mineralized waters for daily 
drinking, some final comments are in order 
regarding indications of future research with 
the taste scale. The research carried out so 
far has used Ss who have been consuming 
waters having little or no mineral taste. It 
would be most instructive to compare the 
ratings given the various samples employed in 
these studies with ratings given the same 
samples by individuals who were accustomed 
to drinking waters containing TDS in the 
range above 1,200 ppm. It seems reasonable 
to conjecture that Ss accustomed to high min- 
eral waters would give lower ratings to sam- 
ples containing mineral solutes which are pres- 
ent in substantial amounts in their community 
water supply. 

Another point that ought to be mentioned 
is that single minerals have been used as 
solutes in all three studies. Naturally mineral- 
ized waters usually contain varying amounts 
of all the cations and anions which comprise 
the common minerals. Thus, research with 
naturally mineralized waters, or with solutes 
composed of blends and combinations of vari- 
ous common minerals, is indicated to begin 
to determine how rated acceptability varies 
with the solute blend or combination employed. 

Finally, the ratings obtained from the taste 
scale procedure constitute but one type of 
data that may be used to measure the acceptability 
of mineralized waters. Results obtained 
from an interview and from attitude scales 
comprise other measures of acceptability (On- 
gerth et al., 1964). The interrelationships ob- 
taining between the results of these three 
measuring instruments will give some perspec- 
tive upon the adequacy and accuracy of each 
instrument and upon consumer acceptance of 
mineral taste in water. 


VIGILANCE: 
EFFECTS OF FREQUENCY OF KNOWLEDGE OF RESULTS 


EDGAR M. JOHNSON AND M. CARR PAYNE, JR. 


Georgia Institute of Technology 


For 1 hr. Ss observed an oscilloscope on which 8 signals appeared per 15 min. 
and reported each signal they detected. Knowledge of results (KR) was given 
after 0%, 25%, 50%, 75%, or 100% of the signals. Significant differences 
occurred between the number of targets detected by the 0% and 25% groups and 
by the 25% and 50% groups, but none among the 50%, 75%, and 100% groups. The 
vigilance decrement was not significantly affected by frequency of KR. 


Knowledge of results (KR) produces an increase in detection 
frequency (Adams & Humes, 1963; Baker, 1958, 1960; Garvey, 
Taylor, & Newlin, 1959; Hardesty, Trumbo, & Bevan, 1963; 
Mackworth, 1961, p. 172; McCormack, 1959; Pollack & Knaff, 1958; 
Sipowicz, Ware, & Baker, 1962; Weidenfeller, Baker, & Ware, 
1962). The present study extends earlier studies by investigating 
effects of frequency of KR on overall performance and the 
decrement in performance as a function of time. 

An oscilloscope (screen 9 × 11 centimeters) was adjusted so that 
a horizontal trace about 1 centimeter high with no "halo" went 
across the middle of the screen 10 times per minute. Eight times 
in a 15-minute period an irregular vertical deflection of about 
one centimeter occurred in this trace. These deflections could 
occur anywhere along the horizontal axis of the cathode-ray tube. 
The subject (S) depressed a standard telegraph key whenever he 
detected a deflection. The Ss worked individually throughout the 
experiment. 

Fifty male students served as Ss for a 1-hour test period. The Ss 
were randomly assigned to experimental groups (10 per group). The 
Ss were told that as often as possible they would be given KR 
through earphones which they wore throughout the experiment. One 
group (25% KR) was given KR (i.e., they were told that a signal 
had appeared) after two of the eight deflections in each 
15-minute period; another (50% KR) was given KR after four of the 
deflections; another (75% KR) after six; another (100% KR) after 
each of the eight deflections; and another (0% KR) was not given 
KR. The appropriate percentages of the eight deflections were 
randomly chosen within each of the 15-minute time periods for 
each experimental group. These KR were in addition to KR obtained 
by S when he saw a signal and responded to it. Before commencing 
the experiment the Ss observed four deflections as practice. 
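
The random choice of which deflections received KR can be illustrated with a short sketch. This is not the authors' procedure: the function name `kr_schedule`, the fixed seed, and the use of Python's `random.sample` are assumptions made for illustration. Only the counts (eight deflections per 15-minute period, four periods, and the five KR percentages) come from the text.

```python
import random

SIGNALS_PER_PERIOD = 8   # deflections in each 15-minute period (from the text)
PERIODS = 4              # four 15-minute periods in the 1-hour watch

def kr_schedule(kr_fraction, rng):
    """For each period, return the sorted signal indices (0-7) that receive KR."""
    n_kr = round(kr_fraction * SIGNALS_PER_PERIOD)
    return [sorted(rng.sample(range(SIGNALS_PER_PERIOD), n_kr))
            for _ in range(PERIODS)]

rng = random.Random(0)   # hypothetical fixed seed, only for reproducibility
schedules = {pct: kr_schedule(pct / 100, rng) for pct in (0, 25, 50, 75, 100)}
```

Under these assumptions the 25% group receives KR after two deflections in every period, the 50% group after four, and so on, with a fresh random selection in each period.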

The number of signals missed by each 
group served as raw data. These are sum- 
marized in Figure 1 which shows the mean 
percentage of signals detected by each group. 
A chi-square test showed the data to be 
normally distributed and Cochran’s test 
(1941) showed homogeneity of variance and 
homoscedasticity. 

Analysis of variance (Edwards, 1960, p. 
227-232) showed performance as a function 
of time to be significant. Tukey comparisons 
(Winer, 1962, p. 87) indicated that per- 
formance in each 15-minute period differed 
significantly from that in each other 15- 
minute period. This is the familiar vigilance 
decrement. Interaction between time period 
and KR group was not significant. Thus, 
frequency of KR had no significant effect 
upon the tendency for performance to become 
poorer with time. 

Analysis of variance showed overall per- 
formance to differ significantly as a function 
of frequency of KR (p< .001). By Tukey 
comparisons each group differed significantly 
(p < .01) from each other with the excep- 
tion of the 50% KR group which did not 
differ significantly from the 75% or 100% 
groups. Dunnett’s method (Winer, 1962, p. 89-92), a more robust 
test, showed neither the 75% KR group nor the 50% KR group to 
differ significantly from the 100% KR group. 

FIG. 1. Mean percentage of signals detected as a function of 15-minute 
time periods of watch and percentage knowledge of results (KR). 

The present study showed detection fre- 
quency over the hour’s period to be a mono- 
tonic, nonlinear function of KR. It is in 
accord with the literature in that perform- 
ance was superior with KR as compared with 
no KR. The vigilance decrement, on the 
other hand, was not significantly affected by 
frequency of KR. If we wish to decrease the 
decrement some means other than KR will 
have to be employed. 


EFFECTS OF DISCRETE TRANSFORMATIONS OF CONTROLLER 
OUTPUTS ON HUMAN TRACKING PERFORMANCE ¹ 


DARWIN P. HUNT 


University of Dayton 


4 groups of 8 Ss each performed a compensatory tracking task using an 
acceleration control system. Each group employed a different controller 
output transformation: 3-, 5-, or 7-category, or continuous. Each S used 4 
gain (G) levels. Both tracking accuracy and economy were measured. The 
number of output categories (C) significantly affected the economy 
(p < .05) but not the accuracy of performance. The G effects were 
significant for both accuracy (p < .001) and economy (p < .001). Accuracy 
improved and economy decreased monotonically over the lower 3 gains so that 
there was a trade-off between the 2 performance measures; at the highest 
gain both accuracy and economy were degraded. Although inspection of the 
accuracy data suggests that as the number of output categories increases 
the optimal gain becomes higher, the G × C interaction was not significant. 


Several studies have been conducted to determine the effects of 
transformations of informational inputs on man’s ability to 
perform various psychomotor tasks. The effects of continuous 
linear transformations of information on the ability to make 
blind settings on a micrometer have been reported by Bilodeau 
(1953). The micrometer-setting task was employed by Noble and 
Broussard (1955) to study continuous curvilinear transformations. 
The effects of discrete linear transformations of feedback 
information on the performance of a simple tracking task 
(Schoeffler, 1955) and of a complex compensatory tracking task 
(Hunt, 1961) have been studied; Hunt (1964) also investigated 
discrete nonlinear informational transformations. 

In contrast to the variety of experiments designed to examine the 
effects of both continuous and discrete transformations of 
informational inputs, it seems that experimental interest in 
transformations of the man’s outputs has been confined largely to 
the study of continuous functions. That is, the effects of 
nonlinear and linear control systems under various sensitivities 
have been investigated, but little experimental attention has 
been devoted to the discrete transformation or quantization of 
man’s outputs. For example, several studies (Jenkins, 1953; 
Jenkins & Connor, 1949; Jenkins & Karr, 1954; Jenkins & Olson, 
1952) have been concerned with the influence of control-display 
movement ratios on the ability to make discrete settings. Such 
ratios, as well as simple control gains, however, may be 
conceptualized as linear continuous transformations of the man’s 
controller outputs. 


¹ The data were collected under USAF Contract No. AF-33(616)-7863 
while the writer was employed at the Behavioral Sciences 
Laboratory, Wright-Patterson Air Force Base. The assistance of 
James Nehez, University of Dayton, who contributed in many ways 
including the construction and maintenance of the apparatus, of 
J. P. Hornseth and H. Leon Harter, both of Wright-Patterson Air 
Force Base, who contributed to the data analysis, and of M. J. 
Warrick, whose comments concerning the preliminary manuscript 
were most helpful, is gratefully acknowledged. 

The experiment reported below was con- 
ducted to determine the effects of transform- 
ing controller movements into 3, 5, or 7 ef- 
fective output categories on the accuracy and 
economy with which a compensatory tracking 
task may be performed; an index of tracking 
economy was obtained by integrating the ab- 
solute value of the transformed controller 
output over time. 


METHOD 


Subjects. Thirty-two right-handed male students 
at the University of Dayton were paid to serve as 
subjects (Ss). 

Apparatus. The apparatus (Figure 1) used in 
this study is the same as that described elsewhere 
(Hunt, 1964) with two exceptions: (a) the 
“transformer” modified S’s controller outputs instead 
of his informational inputs and (b) the 
target course generator in the present study in- 
troduced step changes rather than continuous 
sinusoidal perturbations. The tracking error was 
displayed as a dot moving horizontally ±2 inches 
on the face of an oscilloscope. The zero-error posi- 
tion was indicated by a fixed vertical illuminated 
line centered on the display. The dot moved 
smoothly across the scope face except when step 
changes were introduced by the target-course gen- 
erator. The displacement of the dot from the zero 
position was linearly related to the magnitude of 
error. 

The S’s task was to return the dot to the zero- 
error position each time a step change occurred and 
to hold it there. This was accomplished by the 
appropriate movement of a control lever which 
was located on the floor directly in front of him. 
The 23.5-inch control lever could be moved 20 
degrees to the right or left; when no lateral force 
was applied to the control, slight spring centering 
held the lever in the center upright position at which 
the output was zero. Movements of the control 
to the right or left would move the error dot in an 
accelerative fashion to the right and left, respectively. 

A voltage analogue, linearly related to the angular 
displacement of the control, was transformed ac- 
cording to one of the four functions presented in 
Figure 2. The controller dynamics were obtained 
by the double integration of these transformed con- 
trol voltage analogues. For the 3-, 5-, and 7-category 
conditions, S was operating essentially an acceleration-controller 
system in which he could select only certain predetermined levels 
of acceleration; for example, in the case of the 5-category 
transformation S could select, by the proper displacement of 
the control lever, one of two levels of acceleration in 
either direction. The control lever moved smoothly 
over its entire range so that no tactual cues con- 
cerning the precise location of the transition from 
one accelerative-control category to an adjacent one 
were available. In all cases, a control displacement within 
±1.35 degrees resulted in zero controller output. 

Each of the transformations shown in Figure 2 
was combined with four different control gain 
levels. The 4 gain levels, specified in terms of the 
acceleration obtained for maximum displacement of 
the control lever, were: .095, .244, .946, and 2.44 
inches per second squared. Table 1 shows the accel- 
erations that resulted from various control displace- 
ments under each of the transformation-gain combi- 
nations. 
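
The category transformations can be made concrete with a small lookup sketch built from the values in Table 1. This is an illustration only: the function name `acceleration`, the use of `None` to denote the continuous condition, the treatment of band edges as inclusive upper limits, and the assumption that the continuous condition is linear in displacement outside the dead zone are all choices made here, not details given in the text.

```python
DEAD_ZONE_DEG = 1.35   # displacements within this range give zero output
MAX_DEG = 20.0         # maximum lever displacement in either direction

# Maximum acceleration (inches per second squared) at each of the 4 gain levels.
MAX_ACCEL = (0.095, 0.244, 0.946, 2.441)

# Upper band edge in degrees, followed by the acceleration at each gain level.
# All numbers are copied from Table 1.
BANDS = {
    3: [(20.0, (0.095, 0.244, 0.946, 2.441))],
    5: [(10.67, (0.039, 0.095, 0.386, 0.946)),
        (20.0, (0.095, 0.244, 0.946, 2.441))],
    7: [(7.57, (0.025, 0.063, 0.246, 0.630)),
        (13.79, (0.060, 0.154, 0.596, 1.535)),
        (20.0, (0.095, 0.244, 0.946, 2.441))],
}

def acceleration(displacement_deg, categories, gain_index):
    """Signed dot acceleration for a lever displacement (negative = left)."""
    magnitude = min(abs(displacement_deg), MAX_DEG)
    if magnitude <= DEAD_ZONE_DEG:
        return 0.0
    sign = 1.0 if displacement_deg > 0 else -1.0
    if categories is None:
        # Continuous condition: assumed proportional to displacement.
        return sign * MAX_ACCEL[gain_index] * magnitude / MAX_DEG
    for upper_edge, levels in BANDS[categories]:
        if magnitude <= upper_edge:
            return sign * levels[gain_index]
    raise ValueError("displacement outside the tabled range")
```

For example, `acceleration(5, 5, 3)` falls in the inner band of the 5-category condition at the highest gain and returns the tabled value of .946.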

The target course consisted of sudden changes in 
the position of the dot every 10 seconds. The direc- 
tion (right or left) and magnitude (0.33, 0.67, or 
0.99 inch) of the changes were programed on 
punched tape and presented by means of a Western 
Union Telegraph Company Tape Transmitter. The 
control, the display, and the S were enclosed in a 
small sound-shielded booth which contained a dim 
white light and a ceiling fan. 

FIG. 1. Block diagram of the apparatus. 

FIG. 2. The four output transformations investigated. (Panels: three 
category, five category, seven category, and continuous. Horizontal 
axis: actual control displacement in degrees, left and right.) 

Performance measures were obtained by integrating the absolute 
value of (a) the transformed control voltage analogues (TC 
scores) and (b) the tracking error voltage analogues (e scores), 
where the tracking error voltage analogue is the momentary 
difference between the forcing function voltage and the double 
integration of the transformed control voltage. TC and e scores 
were obtained for each trial. 
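
The two performance measures can be sketched numerically. The original apparatus integrated voltage analogues continuously; the version below is only a discrete approximation under stated assumptions: simple Euler integration, a fixed sample interval `dt`, and the hypothetical function name `score_trial`.

```python
def score_trial(forcing, control_output, dt):
    """Return (TC score, e score) for one trial.

    forcing        -- target-course value at each sample
    control_output -- transformed controller output (acceleration) at each sample
    dt             -- sample interval in seconds (assumed constant)
    """
    velocity = 0.0
    position = 0.0
    tc_score = 0.0
    e_score = 0.0
    for target, accel in zip(forcing, control_output):
        # Double integration of the transformed control output.
        velocity += accel * dt
        position += velocity * dt
        # Momentary tracking error: forcing function minus controlled position.
        error = target - position
        # Both scores integrate an absolute value over time.
        tc_score += abs(accel) * dt
        e_score += abs(error) * dt
    return tc_score, e_score
```

With no control activity at all, the TC score is zero and the e score grows with the uncorrected step size, which is the trade-off between economy and accuracy that the two scores are meant to capture.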

Procedure. The 32 Ss were divided into 4 groups of 8 Ss. The 
groups corresponded to the 4 controller transformations discussed 
in the Apparatus section. Each S was given 12 trials on each of 4 
successive days under his assigned transformation using a 
different gain level on each day. For each transformation, the 
gain levels were presented in 4 different orders (2 Ss for each 
order) according to a Latin square. Only 4 Ss could be tested 
during 1 day. On each day, each transformation and each gain 
level occurred once. Over the entire experiment, each 
transformation occurred twice at each of the daily testing times: 
10:00, 11:00, 1:30, and 3:00 o’clock. The initial order of 
appearance of Ss determined their assignment to particular 
experimental conditions. 
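
The counterbalancing of gain orders can be illustrated as follows. The text does not say which Latin square was used, so the cyclic square below is an assumption, as are the names `cyclic_latin_square` and `assignment`; the sketch covers one transformation group of 8 Ss.

```python
GAINS = (0.095, 0.244, 0.946, 2.44)   # the four gain levels from the text

def cyclic_latin_square(items):
    """Each item appears once in every row (order) and column (test day)."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

orders = cyclic_latin_square(GAINS)

# Two Ss per order within one transformation group of 8 Ss.
assignment = {subject: orders[subject // 2] for subject in range(8)}
```

Each row gives the gain level a subject would receive on test days 1 through 4, so every gain level appears equally often on every day.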
On the day immediately preceding S’s first testing day, he was 
instructed concerning the task, 
the general operation of the control, and the 
procedural details. Also, he was told that when- 
ever the dot jumped off center, he was “to return 
it to the center line as quickly and accurately as 
you can and hold it there.” The S was then given 
3 trials under each of the 4 gain levels using his 
assigned transformation. The order in which the 
gain levels were administered on the practice day 
corresponded to the order in which the gains were 
presented to each S during the 4 test days. 

Each trial consisted of 12 sudden changes in the 
dot position. Thus each trial was 2 minutes in 
duration. The 6 possible step changes (3 magnitudes 
in either of 2 directions) were programed such 
that each change occurred equally often at each 
ordinal position over each set of 6 trials. Five 
tape programs, each consisting of 12 trials, were 
constructed. The order in which these programs 
were used was the same for each S, which resulted 
overall in each tape program being used equally 
often for each transformation-gain combination. On test days, 
intertrial intervals of 30 seconds were introduced between all 
trials except Trials 6 and 7, between which a 4-minute rest was 
provided. 


TABLE 1 

ACCELERATIONS OBTAINED FOR VARIOUS DISPLACEMENTS (IN DEGREES) OF THE 
CONTROL LEVER UNDER THE FOUR OUTPUT TRANSFORMATIONS AND THE FOUR GAIN 
LEVELS INVESTIGATED. VALUE IN EACH CELL IS THE ACCELERATION, IN INCHES 
PER SECOND², OF THE DOT ON SUBJECT'S DISPLAY 

Number of     Displacement               Gain level 
categories    of control       .095     .244     .95      2.44 

Three         1.35-20.0        .095     .244     .946     2.441 
Five          1.35-10.67       .039     .095     .386      .946 
              10.67-20.0       .095     .244     .946     2.441 
Seven         1.35-7.57        .025     .063     .246      .630 
              7.57-13.79       .060     .154     .596     1.535 
              13.79-20.0       .095     .244     .946     2.441 
Continuous    5                .024     .061     .236      .610 
              10               .047     .122     .473     1.220 
              15               .071     .183     .709     1.831 
              20               .095     .244     .946     2.441 

Note. For the continuous conditions the displacement values (5, 10, etc.) 
are merely representative of a virtually infinite number of possible 
values. 


RESULTS 


The means of the e and TC scores over the 
terminal 6 trials under each transformation- 
gain combination are presented in Figure 3. 
With regard to the accuracy of performance, 
inspection of Figure 3 suggests the following: 
(a) the number of output categories has rela- 
tively little systematic influence on accuracy; 
(b) increase in control sensitivity results in improved 
performance up to a point, beyond which a further increment in 
sensitivity is associated with a degradation in accuracy; 
(c) the optimal gain level appears to be higher as the number of 
output categories increases, for example, the degradation 
associated with the 2.44 inches per second squared as compared 
with the .95 inch per second squared gain level decreases as the 
number of categories increases; and (d) the 7-category condition 
under the .95 gain resulted in the least error. 

Concerning the economy of performance as indicated by the TC 
scores, Figure 3 suggests that (a) there is an inverse 
relationship between gain level and economy, (b) there is a 
direct relationship between the number of output categories and 
the economy, for example, at all gain levels the lowest mean TC 
score is obtained under the continuous condition, and (c) the 
fewer the number of output categories the greater is the 
influence of gain level, that is, the rate at which tracking 
economy decreases as the control becomes more sensitive is 
inversely related to the number of output categories. 

The apparent relationships indicated above were evaluated by 
subjecting the e and TC scores on the final 6 trials on each test 
day to separate analyses of variance. The variables included in 
both analyses were: Gain level (G), number of output categories 
(C), Trials (T), order of gain levels (O), and Ss. The results of 
these analyses are shown in Table 2. 

FIG. 3. Accuracy (e scores) and economy (TC scores) of tracking as a 
function of control gain, that is, the maximum acceleration available. 
(Separate graphs are made for each transformation. Each point is the 
mean of 8 Ss.) 


TABLE 2 

AN ANALYSIS OF VARIANCE OF THE e AND TC SCORES 

                                    e scores                TC scores 
Source                   df       MS          F           MS          F 

Between subjects         31 
  No. of categories (C)   3     78,149      3.03        480,400     4.11* 
  Order (O)               3     83,744      3.20         31,638      - 
  C × O                   9   [illegible]    -           34,346      - 
  Error (b)              16     25,776                  116,995 
Within subjects         736 
  Gain (G)                3    851,980     20.58**    1,930,764    20.89** 
  Trials (T)              5      3,910      3.13*         1,875     2.91* 
  G × C                   9     37,229       -          151,838     1.64 
  T × C                  15      1,111       -            1,011     1.57 
  G × O                   9     60,745      1.47        140,987     1.53 
  G × T                  15      1,267      1.08            821     1.63 
  O × T                  15        490       -              699     1.08 
  C × G × O              27     39,419       -           87,021      - 
  C × G × T              45      1,009       -              571   [illegible] 
  C × O × T              45   [illegible] [illegible]       756   [illegible] 
  G × O × T              45      1,380      1.18            625     1.24 
  C × G × O × T         135      1,582      1.35*           573     1.14 
  Error (w) 
    w1                   48     41,401                   92,445 
    w2                   80      1,249                      645 
    w3                  240      1,173                      504 
Total                   767 

* p < .05. 
** p < .001. 


DISCUSSION 


Control sensitivity had a significant effect 
on both the accuracy (p < .001) and econ- 
omy (p< .001) of performance. The fact 
that the TC scores were monotonically and 
the e scores were non-monotonically related 
to control sensitivity results in a trade-off be- 
tween accuracy and economy only over the 
lower three gain levels (Figure 3). Increases 
in the gain levels over the lower sensitivities 
were associated with improved accuracy but 
lowered economy of performance; at the high- 
est gain level employed, accuracy was im- 
paired and economy was further reduced. 

On the basis of the instructions S could rea- 
sonably infer that his task could be performed 
“best” by minimizing the duration and magni- 
tude of the departures of the dot from the 
center zero-error position. Such a task orienta- 
tion could be expected to result in minimum 
e scores at the possible expense of increased 
TC scores. Thus, the functions plotted in Fig- 
ure 3 should be considered specific to the task 
situation which places a relatively greater 
emphasis on performance accuracy than on 
economy. 

The number of output categories had no statistically significant 
influence on accuracy, but did affect the economy (p < .05); 
generally, the economy of performance was directly related to the 
number of output categories. The effect of the control gain was not 
influenced significantly by the output trans- 
formation employed for either accuracy or 
economy. Thus, the apparent decrease in the 
slope of the curves relating the TC scores to 
gain level, as the number of output categories 
is increased (Figure 3), must be attributed to 
chance. However, it is obvious that at a zero 
gain level all curves would converge to a zero 
TC score or “maximum economy.” 

Finally, statistical support for the observation that the 
7-category condition results in more accurate performance than 
any other transformation failed to materialize, that is, neither 
the C effect nor the G × C interaction for the e scores was 
significant. However, it is observed in Figure 3 that as the 
number of output categories increases, accuracy is generally 
degraded by decreasing amounts at the gain level above the 
optimum and by increasing amounts at the gain level lower than 
the optimum. This suggests that further consideration of the 
possibility of a shift of the optimal gain level toward greater 
sensitivity as the number of categories is increased may be 
fruitful. No acceptable rationale has been developed by the 
writer to account for the significant C × G × O × T interaction. 


Responses to both forced choice and free-choice items were obtained from 50 
accountants and 82 engineers in self-descriptions of past satisfying and dis- 
satisfying job situations. Both groups of Ss endorsed more “intrinsic” than 
“extrinsic” items when describing both situations. Achievement, Work Itself, 
and Responsibility were mentioned most often in describing past satisfying 
situations, and lack of Advancement and Recognition were most often men- 
tioned in dissatisfying situations. It was concluded that both intrinsic and 
extrinsic factors can be sources of both satisfaction and dissatisfaction, but 
intrinsic factors are stronger in both cases. Satisfaction variables are not 
unidirectional in their effects, and expectations have a strong influence on the 
extent of satisfaction with job factors. 


In 1959, Herzberg, Mausner, and Snyderman (1959) published 
results of a study in which the authors concluded that job 
satisfaction and job dissatisfaction are caused by qualitatively 
different job factors. According to these authors, people 
attribute their satisfaction to certain aspects of the job, and 
they attribute dissatisfied feelings to aspects different from 
those connected with job satisfaction. Since that 1959 report, a 
number of studies have been carried out in attempts to test the 
Herzberg et al. theory (see Ewen, 1963; Peres, 1963; Rosen, 1963; 
Schwartz, Jenusaitis, & Stark, 1963). 

Generally, the results of these followup 
studies have not given unequivocal support 
to the Herzberg et al. theory. For instance, 
in the study by Ewen, dissatisfiers actually 
acted as satisfiers, while satisfiers sometimes 
acted in the predicted manner, and sometimes 
caused both satisfaction and dissatisfaction. 
The “satisfiers” of Herzberg et al. seem to 
have been more often confirmed in these fol- 
lowup studies than the “dissatisfiers.” 

A number of writers have taken issue with the methods used by 
Herzberg et al. as well as with their conclusions. Brayfield 
(1960) discounted their results on the basis of the method used 
(content analysis of interview data). Kahn (1961) felt that 
defensive behaviors and displacement could account for their 
findings, and Vroom and Maier (1961) questioned the legitimacy of 
the Herzberg et al. conclusion that the conditions acting as 
satisfiers are qualitatively different from those acting as 
dissatisfiers. They stated: 


There is a risk in inferring the actual causes of 
satisfaction and dissatisfaction from descriptions of 
events by individuals. It seems possible that the 
obtained differences between events may reflect de- 
fensive processes at work within the individual. Indi- 
viduals may be more likely to perceive the causes 
of satisfaction within the self and hence describe 
experiences invoking their own achievement, recogni- 
tion, or advancement in their job. On the other 
hand, they may tend to attribute dissatisfaction not 
to personal inadequacies or deficiencies but to factors 
in the work environment, i.e., obstacles presented by 
company policies and supervision [p. 433]. 


PURPOSE 


In view of the many possible deficiencies of 
the Herzberg et al. study, the validity of 
their conclusions is certainly questionable. 
Neither have the followup studies shed much 
light on the precise manner in which the dif- 
ferent factors of job attitudes operate. The 
far-ranging conclusions and theoretical for- 
mulations which have been derived from the 
Herzberg et al. results make it highly desira- 
ble that a firm foundation should have been 
previously established through well-controlled 
empirical research. 

The purpose of this study was to test the 
Herzberg et al. findings that the five major 
factors related to the doing of the job itself 
(Recognition, Achievement, Work Itself, Ad- 
vancement, and Responsibility) are the pri- 
mary determiners of job satisfaction, and that 
the five major factors related to the job en- 
vironment (Salary, Company Policies and 
Practices, Technical Aspects of Supervision, 
Interpersonal Relations in Supervision, and 
Working Conditions) cause job dissatisfac- 
tion. The job-related factors have been 
designated “intrinsic” factors, while the 
environment-related factors have been named 
“extrinsic” factors. A strictly objective 
method was used in carrying out this study, 
and special efforts were made to prevent bias- 
ing influences from affecting the results 
obtained. 

The four hypotheses tested in this research 
were as follows: 

Hypothesis I. The proportion of intrinsic item endorsements is 
greater than the proportion of extrinsic item endorsements, when 
persons describe a time in which they felt exceptionally 
satisfied on their jobs. 

$H_{0_1}: P_{IS} \le P_{ES}$ 
$H_{1_1}: P_{IS} > P_{ES}$ ³ 

Hypothesis II. The proportion of intrinsic item endorsements is 
less than the proportion of extrinsic item endorsements, when 
persons describe a time in which they felt exceptionally 
dissatisfied on their jobs. 

$H_{0_2}: P_{ID} \ge P_{ED}$ 
$H_{1_2}: P_{ID} < P_{ED}$ 

Hypothesis III. The proportion of intrinsic item endorsements by 
persons, when describing a time in which they felt exceptionally 
satisfied on their jobs, is greater than the proportion of 
intrinsic item endorsements by persons, when describing a time in 
which they felt exceptionally dissatisfied on their jobs. 

$H_{0_3}: P_{IS} \le P_{ID}$ 
$H_{1_3}: P_{IS} > P_{ID}$ 

Hypothesis IV. The proportion of extrinsic item endorsements by 
persons, when describing a time in which they felt exceptionally 
dissatisfied on their jobs, is greater than the proportion of 
extrinsic item endorsements by persons, when describing a time in 
which they felt exceptionally satisfied on their jobs. 

$H_{0_4}: P_{ED} \le P_{ES}$ 
$H_{1_4}: P_{ED} > P_{ES}$ 

³ I = Intrinsic items; E = Extrinsic items; S = Satisfying 
situations; and D = Dissatisfying situations. 


METHOD 


A slightly modified form of the forced choice tech- 
nique was used in this study, with job-attitude items 
matched in pairs on the basis of both a Preference 
Index (PI) and a Discrimination Index (in this case, 
a Satisfaction Index or SI). If items in each pair 
were matched on both PI and SI, while one item 
was “motivator-related” and the other was “hygiene- 
related,” then only the “intrinsic” or “extrinsic” 
nature of the items should influence which one the 
respondent would choose for self-description. 

On the basis of the phraseology used by Herzberg 
et al. to describe individual job factors, a large 
number of statements were gathered or written. 
These items were rated on a 9-interval scale for 
“extent of satisfaction indicated by that statement,” 
by a group of 30 male, introductory psychology 
students, all of whom had had some business experi- 
ence. The mean rating for each item was taken as the 
SI of that item. Another similar group of 30 students 
rated these same statements on a 9-interval scale 
according to “how well would you like other people 
to know you held that opinion of your job?” The 
mean rating for each item was taken as the PI of 
that item. Thus, a PI was obtained which reflected 
the extent to which the expression of a given state- 
ment would arouse, on the average, defensive proc- 
esses within individuals. 

It was decided to match an item from each of 
the five intrinsic factors of Herzberg et al. with an 
item from each of the five extrinsic factors, so that 
respondents could choose between a given intrinsic 
factor and each of the extrinsic factors, and vice 
versa. 

It was expected that different individuals, in 
specifying a satisfying or dissatisfying job situation, 
would mention situations which differed in the in- 
tensity of feelings produced. Since the intensity of 
a respondent’s feelings was unknown beforehand, 
and accounting for differing intensities would have 
required more elaborate procedures, it was decided 
to build pairs of items reflecting different degrees of 
satisfaction. Accordingly, the statements were divided 
into five categories on the basis of SIs: highly posi- 
tive, slightly positive, neutral, slightly negative, and 
highly negative. Thus, 5 (levels of satisfaction, within each of 
the 10 factors) × 5 (number of intrinsic factors) × 5 (number of 
extrinsic factors), or 125 pairs of statements were possible and 
were obtained. There was some duplication of individual items in 
these pairs, but all of the item pairs themselves were different. 
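
The count of 125 possible pairs follows from crossing the five satisfaction levels with the five intrinsic and five extrinsic factors. A minimal sketch of that enumeration is given below; the factor and level names are taken from the text, while the variable names and the tuple layout are assumptions made for illustration.

```python
from itertools import product

LEVELS = ("highly positive", "slightly positive", "neutral",
          "slightly negative", "highly negative")
INTRINSIC = ("Recognition", "Achievement", "Work Itself",
             "Advancement", "Responsibility")
EXTRINSIC = ("Salary", "Company Policies and Practices",
             "Technical Aspects of Supervision",
             "Interpersonal Relations in Supervision",
             "Working Conditions")

# One (level, intrinsic factor, extrinsic factor) triple per possible item pair.
pairs = list(product(LEVELS, INTRINSIC, EXTRINSIC))
pairs_per_level = len(pairs) // len(LEVELS)
```

Each satisfaction level therefore contributes 25 pairs, one for every combination of an intrinsic factor with an extrinsic factor.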

Derived sets of item pairs were then built, 5 of 
them being positive sets and 5 negative sets. Within 
each positive set, there were 5 pairs of highly posi- 
tive items, 15 pairs of slightly positive items, and 
5 pairs of neutral items. Within each negative set, 
there were 5 pairs of highly negative items, 15 pairs 
of slightly negative items, and 5 pairs of neutral 
items. 

Two criteria were used in selecting one positive set 
and one negative set of item pairs for inclusion in 
the forced choice instrument: the average PI and SI 
of intrinsic items in a set should be the same as the 
average PI and SI of extrinsic items in that set, and 
the average PI and SI of the items in the positive 
set should be the same distance above 5.00 (the 
midpoint of the rating scale) as the average PI and 
SI of the items in the negative set were below 5.00. 
In order to obtain a measure of reliability, another 
positive and another negative set was included, 
using the same criteria. Additional criteria of simi- 
larity of average scale values and standard devia- 
tions to those of the first two sets were also used. 
Each of the 10 job factors was represented by 10 
statements of varying intensity in the 2 positive sets, 
and each intrinsic factor was matched with each 
extrinsic job factor twice. The same was true within 
the 2 negative sets. 

The 50 positive item pairs were placed in booklet 
form, and instructions were added. The respondent 
was asked to describe briefly some situation that 
had occurred on his job which made him feel very 
happy with his job, and then to check the 1 state- 
ment in each of the 50 pairs of statements in the 
list which best described how he felt in that situa- 
tion. The same procedure was followed for the 50 
negative item pairs, except that past unhappy situa- 
tions were to be described. All 100 item pairs were 


TABLE 1 

RELIABILITY OF ENGINEERS’ RESPONSES 
TO FACTORS WITHIN SITUATIONS 

                             Satisfying    Dissatisfying 
Factor                       situation     situation 
Intrinsic 
  Recognition                [illegible]   .49 
  Achievement                .44           .70 
  Work Itself                .68           [illegible] 
  Advancement                .83           .47 
  Responsibility             [illegible]   .70 
  Total Intrinsic            .82           .90 
Extrinsic 
  Salary                     .65           .74 
  Company Policies           .65           .46 
  Technical Competence       [illegible]   [illegible] 
  Interpersonal Relations    .61           .86 
  Working Conditions         .70           .44 
  Total Extrinsic            .82           .90 

Note. N = 82. 


TABLE 2 

RELIABILITY OF ACCOUNTANTS’ RESPONSES 
TO FACTORS WITHIN SITUATIONS 

                             Satisfying    Dissatisfying 
Factor                       situation     situation 
Intrinsic 
  Recognition                [illegible]   [illegible] 
  Achievement                .23           .62 
  Work Itself                .44           [illegible] 
  Advancement                .81           [illegible] 
  Responsibility             .67           .71 
  Total Intrinsic            .81           .79 
Extrinsic 
  Salary                     [illegible]   [illegible] 
  Company Policies           .65           .65 
  Technical Competence       .62           .82 
  Interpersonal Relations    .80           .78 
  Working Conditions         .60           .40 
  Total Extrinsic            .81           .79 

Note. N = 50. 


placed in the same booklet, for each respondent. 
For half the item pairs, the intrinsic item was placed 
first, and the extrinsic item second. Half of the sub- 
jects responded to the satisfying situation first, and 
half responded to the dissatisfying situation first. 

In order to compare controlled responses (forced 
choice) with free-choice responses, additional instruc- 
tions followed the list of positive pairs of items. 
The respondent was asked to go back through the 
list, and to “check again the 10 items which best 
describe the satisfying situation on your job.” The 
same instructions were given after the list of nega- 
tive pairs, but with respect to past dissatisfying 
situations. Thus, respondents were free to choose 
all 10 statements pertaining to 1 job factor, or 1 
statement from 10 different job factors, or any 
proportion in between. 

In order to measure present feelings about each 
job factor, five items of varying degrees of satisfac- 
tion were selected for each factor and placed in 
groups on the rating forms. Instructions were given 
to check the two statements from each group of 
five, which best described how the respondent pres- 
ently felt about that factor. A separate 9-interval 
scale was also included, on which each respondent 
was instructed to indicate how he presently felt 
about his job in general. 

Respondents were 50 accountants and 82 engineers 
from a variety of midwestern companies. These 
companies operated in such industries as industrial 
and military manufacturer, manufacturer of farm 
machinery, flour milling, banking, grocery retailer, 
appliance manufacturer, retail mail order firm, and 
office copy machine manufacturer. The booklets were 
filled out and returned personally to the author at 
the company location in seven of the nine partici- 
pating companies. In two of the companies, the 
booklets were filled out and mailed directly to the 
investigator. 


TABLE 3 

PROPORTIONS OF INTRINSIC AND EXTRINSIC ITEM ENDORSEMENTS 
BY RESPONDENTS IN FORCED CHOICE SITUATIONS 

                                    Engineers^a                  Accountants^b 
                              Satisfying  Dissatisfying    Satisfying  Dissatisfying 
Proportions of intrinsic 
  endorsements                   .63          .61             .64          .60 
Proportions of extrinsic 
  endorsements                   .37          .39             .36          .40 

a N = 82. 
b N = 50. 

RESULTS 
Reliability 


An estimate of the reliability of the engi- 
neers’ responses can be obtained from the data 
presented in Table 1. The coefficients in 
Table 1 are the self-correlations between job- 
factor scores on the two forms of the ques- 
tionnaire, augmented by the Spearman- 
Brown formula for whole-test reliability as 
computed from the half-tests. The corre- 
sponding data for the sample of accountants 
are shown in Table 2. It can be seen that 
these reliability estimates were sufficiently 
high so that confidence could safely be placed 
in the results of this study, with respect to 
total intrinsic and total extrinsic factor 
relationships. 
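
The Spearman-Brown step-up mentioned above can be sketched as follows. This is the standard textbook formula; the input value 0.70 is a hypothetical half-test correlation chosen for illustration and is not a figure from this study.

```python
def spearman_brown(r_half: float, k: float = 2.0) -> float:
    """Estimate the reliability of a test k times as long as the part
    whose reliability (or half-test correlation) is r_half."""
    return k * r_half / (1.0 + (k - 1.0) * r_half)

# Hypothetical correlation between the two forms of a questionnaire.
print(round(spearman_brown(0.70), 2))  # 0.82
```

With k = 2 the function gives the usual whole-test estimate from two half-tests, which is the case described in the text.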


Tests of Hypotheses 


Table 3 presents the proportions of forced 
choice responses and Table 4 summarizes the 


tests of hypotheses for both engineers and 
accountants. A formula given by Guilford 
(1956, p. 195) was used in these computa- 
tions. For Hypotheses I and II, the obtained 
proportions were tested against a hypothetical 
chance proportion of .500. 
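
A test of an obtained proportion against .500 can be sketched as a one-sample z test. The article does not reproduce Guilford's formula or state the number of responses entering each test, so both the choice of standard error and the n used below (50 respondents times 50 item pairs) are assumptions made only for illustration.

```python
import math

def proportion_z(p_obs: float, n: int, p_null: float = 0.5,
                 use_null_se: bool = True) -> float:
    """z statistic for an observed proportion against a hypothesized one.

    use_null_se=True bases the standard error on p_null;
    use_null_se=False bases it on the observed proportion instead.
    """
    p_se = p_null if use_null_se else p_obs
    se = math.sqrt(p_se * (1.0 - p_se) / n)
    return (p_obs - p_null) / se

# Assumed n: 50 respondents x 50 forced-choice pairs = 2500 responses.
z_null = proportion_z(0.637, 2500)
z_observed = proportion_z(0.637, 2500, use_null_se=False)
print(round(z_null, 1))  # 13.7
```

The two standard-error conventions give somewhat different z values, so any comparison with tabled results depends on which convention and which n were actually used.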

For both occupational groups, Hypothesis I 
was accepted: both groups endorsed a higher 
proportion of intrinsic items when describing 
past satisfying situations. Hypothesis II was 
rejected for both groups: a larger proportion 
of intrinsic items was also endorsed when 
describing past dissatisfying situations. Hy- 
potheses III and IV were accepted for the 
accountant sample, but rejected for the engi- 
neer sample. However, for both groups, there 
was little difference between the proportion 
of intrinsic items endorsed in past satisfying 
situations and the proportion of intrinsic 
items endorsed in past dissatisfying situa- 
tions. The same was true of extrinsic item 
proportions. 


Comparisons with Herzberg et al. Data 


Herzberg et al. presented their data in the 
form of percentages of time that mention 
was made of job factors in free-response 
interview situations. The present data on 
individual factors do not lend themselves to 
direct comparisons with the corresponding 
data from their study, but rank-order tech- 
niques were available, and it was possible to 
carry out a transformation of their data to 
facilitate the comparison of findings. 

Herzberg et al. reported 228 “high” 


TABLE 4 

SUMMARY OF TESTS OF HYPOTHESES 

              Occupational 
Hypotheses    group          Comparison        Z        Conclusion^a 
I             Engineers      .628 vs. .500     18.21    Accept 
              Accountants    .637 vs. .500     14.24    Accept 
II            Engineers      .610 vs. .500     15.41    Reject 
              Accountants    .602 vs. .500     10.41    Reject 
III           Engineers      .628 vs. .610      1.80    Reject 
              Accountants    .637 vs. .602      2.36    Accept 
IV            Engineers      .390 vs. .372      1.80    Reject 
              Accountants    .363 vs. .398      2.36    Accept 

a p < .01. 


TABLE 5 

AVERAGE FREQUENCY OF ENDORSEMENT OF JOB FACTORS BY THREE SAMPLES 

                               Accountants             Engineers            Herzberg et al. 
Factors                    Satis-     Dissat-     Satis-       Dissat-     Satis-     Dissat- 
                           fying      isfying     fying        isfying     fying      isfying 
Achievement                 6.98       5.98       [illegible]   5.79       12.09       2.28 
Work Itself                 6.58       5.64       6.82          5.62        7.67       4.69 
Responsibility              6.76       6.06       6.76          6.23        6.76       2.01 
Recognition                 5.54       6.30       6.27          6.32        9.75       6.03 
Interpersonal Relations     4.68       3.64       5.02          3.65        1.17       4.96 
Advancement                 5.96       6.08       4.60          6.41        5.98       3.62 
Company Policies            3.16       4.00       3.76          4.12         .91      10.32 
Working Conditions          3.94       3.14       [illegible]   3.21         .26       3.62 
Salary                      3.26       4.52       [illegible]   4.10        4.42       5.63 
Technical Competence        3.02       4.54       2.90          4.38         .91       6.70 
All factors                49.88      49.90      49.93         49.83       49.92      49.86 


sequences (satisfying situations) and 248 
“low” sequences (dissatisfying situations) 
upon which their percentages were apparently 
based. Converting their percentages (for each 
job factor) to frequencies and setting them 
to a base of 50 resulted in the values in 
Table 5. The formula used to convert their 
percentages was 


\[
X = \frac{\text{Observed frequency of mention per job factor}}{\text{Total number of mentions in that situation}} \times 50
\]



The sum of the “mentions” in the Herz- 
berg et al. “high” sequences was 384, and 


372 in the “low” sequences, and these figures 
were used in the conversion formula. Of 
course, the present data were ipsative while 
their data were not. Table 5 also shows the 
average number of endorsements of each job 
factor by the subjects in this study. 
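
The conversion to a base of 50 can be sketched in a few lines. The totals 384 and 372 are the ones reported above; the count of 96 mentions used in the example is hypothetical and is not a figure taken from the Herzberg et al. data.

```python
def to_base_50(observed_mentions: float, total_mentions: float,
               base: float = 50.0) -> float:
    """Rescale a factor's frequency of mention so that all factors
    in one kind of situation sum to `base`."""
    return observed_mentions / total_mentions * base

HIGH_TOTAL = 384  # total mentions in "high" sequences, as reported in the text
LOW_TOTAL = 372   # total mentions in "low" sequences, as reported in the text

# Hypothetical factor mentioned 96 times in the "high" sequences.
print(to_base_50(96, HIGH_TOTAL))  # 12.5
```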

Table 6 presents the rank-order correla- 
tions among the job factor means of the 
engineers and accountants in this study, and 
the transformed values from the Herzberg 
et al. study. It may be seen that all three 
groups placed very nearly the same relative 
importance on individual factors in describ- 
ing satisfying job situations. The two groups 
in this study also placed about the same 
relative importance on individual job factors 
in describing very dissatisfying situations, 


TABLE 6 

RANK ORDER CORRELATIONS AMONG JOB FACTORS 

                        Accountants            Engineers              Herzberg et al. 
Situations           Satis-    Dissat-     Satis-    Dissat-        Satis-    Dissat- 
                     fying     isfying     fying     isfying        fying     isfying 
Accountants 
  Satisfying          1.00      .58         .92       .58            .82       -.81 
  Dissatisfying                 1.00        .49       .98            .78       -.25 
Engineers 
  Satisfying                                1.00      .49            .83       -.56 
  Dissatisfying                                       1.00           [illegible]  -.26 
Herzberg et al. 
  Satisfying                                                         1.00      -.39 
  Dissatisfying                                                                1.00 


TABLE 7 

PROPORTIONS OF INTRINSIC AND EXTRINSIC ITEMS 
DOUBLE-CHECKED BY RESPONDENTS 

                                    Engineers^a                  Accountants^b 
                              Satisfying  Dissatisfying    Satisfying  Dissatisfying 
Proportions of intrinsic 
  endorsements                   .76          .58             .71          .58 
Proportions of extrinsic 
  endorsements                   .24          .42             .29          .42 

a N = 82. 
b N = 50. 


although slightly different from those of the 
satisfying situations. However, the results 
from the present study were negatively re- 
lated to the Herzberg et al. findings in dis- 
satisfying situations. This difference in re- 
sults has already been indicated by the tests 
on proportions. 
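
A rank-order correlation of the kind reported in Table 6 can be sketched with the Spearman formula for untied ranks. The article does not say which rank-order coefficient was computed, so the formula is an assumption; the input values are the accountants' satisfying and dissatisfying means from Table 5.

```python
def ranks(values):
    """Rank values from 1 (largest) to n (smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank-order correlation for untied data."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Accountants' factor means from Table 5, in the table's row order.
acc_satisfying = [6.98, 6.58, 6.76, 5.54, 4.68, 5.96, 3.16, 3.94, 3.26, 3.02]
acc_dissatisfying = [5.98, 5.64, 6.06, 6.30, 3.64, 6.08, 4.00, 3.14, 4.52, 4.54]

rho = spearman_rho(acc_satisfying, acc_dissatisfying)
print(round(rho, 2))  # 0.58
```

Under these assumptions the result agrees with the .58 entered in Table 6 for the accountants' satisfying and dissatisfying situations.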


Free-Choice Results 


The respondents had been instructed to 
choose freely any 10 statements from the 
50 pairs, as being most descriptive of a 
satisfying and of a dissatisfying job situation, 
by placing a second check mark in front of 
each of those 10 statements. Table 7 shows 
the proportions of endorsements obtained. 

Comparing these findings with those ob- 
tained in the forced choice part of this study 
and those of the Herzberg et al. study, it 
appears that when measures were taken 
(forced choice) to eliminate biasing influ- 
ences, the results were somewhat different 
from both the free-choice results in this 
study and those of Herzberg et al. However, 
when no such measures were taken, as in the 
“double-check” situation here, the results 
much more nearly approximated those of the 
Herzberg et al. study. Given complete free- 
dom to endorse items, these engineers and 
accountants did choose intrinsic items much 
more often to describe satisfying job situa- 
tions. On the other hand, given such free- 
dom, they chose more intrinsic than extrinsic 
items to describe dissatisfying situations, but 
the differences were much smaller. In the free 
endorsement of items, these subjects endorsed 
nearly twice as many extrinsic items in dis- 
satisfying situations as they did in satisfying 
situations, a result which Herzberg et al. 
would have been happy to observe. It appears 
that the techniques used in this study for 
eliminating the suspected sources of bias 
had a substantial effect on the obtained 
results. 


Biographical Relationships 


The means and standard deviations of the 
biographical items on which data were ob- 
tained are shown in Table 8. These bio- 
graphical data were correlated with endorse- 
ment of job factors in both satisfying and 
dissatisfying situations. The obtained rela- 
tionships were small and insignificant, but 
with a tendency for the older, better paid 
individuals, who generally had been with the 
company for a long time, to endorse more 
intrinsic items in describing satisfying situa- 
tions, and extrinsic items in describing dis- 
satisfying situations. 


TABLE 8 

MEANS AND SDs OF BIOGRAPHICAL DATA OF RESPONDENTS 

                             Engineers^a                 Accountants^b 
                           M            SD             M          SD 
Age                        36.7         8.9            32.8       8.4 
Years of education         15.7         1.6            14.8       2 
Number of children         [illegible]  [illegible]    1.8        1.4 
Tenure with company        6.7          5.8            7.3        1 
Monthly income             $798.90      $271.00        $688.60    $264.08 


Present Satisfaction 


Some rather consistent relationships in the 
form of Pearson r’s, as shown in Tables 9 and 
10, were obtained between endorsement of 
factors in describing past extreme situations, 
and present feelings about those factors. 
Greater present satisfaction with factors was 
related to greater endorsement of those fac- 
tors in satisfying situations, but less frequent 
endorsement of those factors in dissatisfying 
situations. The more dissatisfied with a given 
aspect the person felt at present, the less he 
tended to attribute past satisfactions to that 
factor and the more frequently he endorsed 
that factor in describing a past dissatisfying 
situation. 


TABLE 9 

CORRELATIONS BETWEEN PRESENT SATISFACTION WITH 
A FACTOR, AND ENGINEERS’ ENDORSEMENT OF 
THAT FACTOR IN EXTREME SITUATIONS 

                             Satisfying    Dissatisfying 
Factor                       situation     situation 
Intrinsic 
  Recognition                .38           -.09 
  Achievement                .35           -.10 
  Work Itself                .26           -.14 
  Advancement                .31           -.08 
  Responsibility             .28           -.28 
  Total Intrinsic            [illegible]   -.15 
Extrinsic 
  Salary                     [illegible]   -.44 
  Company Policies           .25           -.20 
  Technical Competence       .03           -.21 
  Interpersonal Relations    [illegible]   -.15 
  Working Conditions         [illegible]   -.12 
  Total Extrinsic            -.08          -.32 

Note. N = 82. 


Overall Satisfaction 


Both occupational groups in this study 
said they were generally quite satisfied with 
their jobs at the present time. When asked 
to rate their overall satisfaction at present, 
on a 9-interval scale (1 = very dissatisfied; 
9 = very satisfied), the engineers gave them- 
selves a mean rating of 6.87, with a standard 
deviation of 1.74. For the accountants, the 
mean was 6.58, and the standard deviation, 
1.80. 

The correlations between self-rated overall 
present satisfaction and present feelings about 


TABLE 10 

CORRELATIONS BETWEEN PRESENT SATISFACTION WITH 
A FACTOR, AND ACCOUNTANTS’ ENDORSEMENT OF 
THAT FACTOR IN EXTREME SITUATIONS 

                             Satisfying    Dissatisfying 
Factor                       situation     situation 
Intrinsic 
  Recognition                .49           -.47 
  Achievement                .29           -.10 
  Work Itself                .01           -.06 
  Advancement                [illegible]   -.22 
  Responsibility             .25           -.04 
  Total Intrinsic            [illegible]   -.11 
Extrinsic 
  Salary                     .44           -.47 
  Company Policies           .39           -.19 
  Technical Competence       [illegible]   -.42 
  Interpersonal Relations    .31           -.02 
  Working Conditions         [illegible]   .03 
  Total Extrinsic            .09           -.09 

Note. N = 50. 


various job factors are shown in Table 11. 
These correlations may be contrasted with 
the findings and conclusions of Herzberg et 
al., who felt that extrinsic factors could not 
contribute much to an individual’s positive 
feeling about his job. 


DISCUSSION 


The results of this study were somewhat 
different from those of Herzberg et al. Table 
3 shows that the subjects in this sample 


TABLE 11 

CORRELATIONS BETWEEN PRESENT OVERALL SATISFACTION 
AND PRESENT SCORES ON JOB FACTORS 

Factor                       Engineers     Accountants 
Recognition                  .44           .36 
Achievement                  .56           .08 
Work Itself                  .63           .13 
Advancement                  .47           .58 
Responsibility               [illegible]   .36 
Total Intrinsic              [illegible]   .56 
Salary                       .19           -.16 
Company Policies             .38           .22 
Technical Competence         .53           .38 
Interpersonal Relations      [illegible]   .49 
Working Conditions           .08           .05 
Total Extrinsic              .58           .40 


endorsed intrinsic items about equally often 
when describing both satisfying and dissatis- 
fying job situations. They also endorsed in- 
trinsic items somewhat more frequently than 
they endorsed extrinsic items, in both kinds 
of situations. However, statements relating to 
extrinsic factors were endorsed nearly 40% 
of the time in both situations. Apparently, 
either extrinsic or intrinsic factors can cause 
both satisfied and dissatisfied feelings about 
the job. This finding might have been attrib- 
utable to a general biasing element affecting 
individuals’ responses on the whole rating 
form, but the fact that the total intrinsic 
scores in satisfying situations correlated .01 
with total intrinsic scores in dissatisfying 
situations for the engineers, and —.17 for 
accountants, negates this possibility. The cor- 
relations between individual job factors also 
fluctuated near zero. These subjects responded 
to one situation quite independently of their 
responses to the other situation. This par- 
ticular finding is also contributory evidence 
that the techniques used in this design for 
eliminating biasing factors were successful. 
Thus, the results of this study agree with 
the Herzberg et al. findings that intrinsic 
factors are important determiners of satisfied 
feelings about the job, but these results con- 
flict markedly with their claim that extrinsic 
factors contribute most to dissatisfied feelings 
about the job. 

As may be seen from Tables 5 and 6, 
Achievement, Work Itself, and Responsibility 
were most often endorsed in satisfying situa- 
tions by the subjects in this study. The rank 
order of individual factors was similar to that 
of Herzberg et al., although in their sample, 
Recognition was mentioned more often, and 
Interpersonal Relationships less often, than 
in the present study. 

The factor of Interpersonal Relationships 
with Superiors ranked fifth and sixth as con- 
tributors to satisfied feelings by the engineers 
and accountants, respectively, in this study, 
but was an infrequently mentioned seventh 
in the Herzberg et al. study. Of the five 
extrinsic job factors included in this study, 
good relationships with one’s boss would seem 
to be the strongest contributor to job satis- 
faction. In fact, the engineers in this study 
mentioned this factor more often than 


they did the factor of Advancement when 
describing very satisfying job situations. 

In describing dissatisfying situations, the 
subjects in this study endorsed the factors 
of Advancement and Recognition most often, 
followed closely by Responsibility, Achieve- 
ment, and Work Itself. The Herzberg et 
al. subjects, as shown in Tables 5 and 6, 
responded differently. In dissatisfying situa- 
tions, the rank order of factors for their 
sample was negatively correlated with the 
other sets of factor means. However, as 
shown by the correlation of .98, the rank 
order of individual factors in dissatisfying 
situations was virtually the same for the two 
groups in this study. The intrinsic factors 
were mentioned most often in both situations, 
and except for Interpersonal Relations, the 
rank order of job factors was completely 
dichotomized by the intrinsic-extrinsic dimen- 
sion. It is evident from these results that 
the persons in this study responded much 
differently from what the Herzberg et al. 
theory would have predicted. The five 
strongest dissatisfiers, for both accountants 
and engineers, were all intrinsic factors. 

Table 11 shows that the correlations be- 
tween present overall satisfaction and present 
satisfaction with individual factors were posi- 
tive and quite substantial. On the basis of 
the claims made by Herzberg et al., we 
should have expected much lower correlations 
with each of the extrinsic factors than with 
any one of the intrinsic factors, but the evi- 
dence shows this to be true for only two 
factors, Salary and Working Conditions. 
These two factors apparently were not so 
important to the satisfaction on the job of 
these high-level people. This finding does not 
mean that either of those factors could not 
be important—probably for these persons, 
expectations about both money and physical 
conditions were being quite well met. But 
they were just that—expectations—and these 
people, feeling that they were receiving no more 
than they earned or deserved in the way of 
pay and physical working conditions, were not 
affected in a positive manner by them. In 
this way, the factors of Salary and Working 
Conditions may actually operate in a manner 
similar to the Herzberg et al. theory, but 
for quite a different reason. It is not that 
“hygiene” factors must be satisfied before 
the “self-actualizing” factors can operate to 
increase job satisfaction. Expectations seem 
to be the important variable here. Individuals 
may have felt that the job aspects of good 
Salary and good Working Conditions, and 
possibly some other such factors not included 
in this study, were the promises and rewards 
that management had agreed to trade for 
their services. Thus, these individuals felt 
they had a right to expect desirable pay 
and working conditions, and the actual existence 
of such desirable levels failed to show up 
as an increase in job satisfaction. 


JOB SATISFACTION AND EXPECTATIONS 


From these results, it appears that expecta- 
tions with respect to what the “work con- 
tract” consists of are very important to the 
satisfaction or dissatisfaction of persons on 
their jobs. The individual approaches his job 
with culturally influenced views as to what 
the company and what management should 
be expected to contribute in return for his 
services, for his efforts, and for his costs. 
These returns from the company are the ex- 
trinsic factors. He also has a very definite 
view of what he is expected to contribute and 
of what he expects to obtain in return. What 
the company as a whole will obtain for his 
personal “work contract” with it is a rather 
vague picture in his mind, but is tied in with 
and dependent upon his being able to attain 
the things he wants and expects—his personal 
and individual goals. These goals are the in- 
trinsic factors. As with the accountants in 
this study, he is upwardly mobile and “get- 
ting ahead” is a strong possible source of 
satisfaction to him. 

This interpretation can account for the 
Herzberg et al. findings, as well as the results 
of this study. People approach their jobs with 
two different types of expectations. First, they 
desire and expect to have responsibilities, 
achievements, and an interest in their work. 
People generally like to be recognized for their 
efforts and to be praised not only in their 
work, but most other times and places as well. 
Many people also desire and expect to ad- 
vance formally in the company for which they 
work. The attainment of these aspirations and 
expectations produces feelings of satisfaction. 


The lack of attainment, or the frustration of 
these objectives, causes people to be dissatis- 
fied with their jobs. These factors act as re- 
wards or punishments to the person’s self- 
concept, and as such, are strong sources of 
satisfaction and dissatisfaction. Table 5 shows 
that these intrinsic factors were most often 
mentioned in forced choice situations as causes 
of extreme feelings about the job. However, 
as shown in Table 7, persons placed more 
importance on extrinsic factors for dissatis- 
factions, when they were given the opportu- 
nity to do so. 

Second, people bring to their jobs certain 
expectations regarding the amount of their 
salary, the quality of the working conditions, 
the fairness of company policies and practices, 
and the kind of person their supervisor 
should be. These are the extrinsic factors in- 
cluded in this study. Those are the factors 
that the company provides the individual, 
and he expects them to be of a certain level 
and quality. If these aspects measure up to 
his expectations, they are no longer a matter 
of concern to him. Particularly with respect to 
the factors of Salary and Working Condi- 
tions, the individual feels that the company 
is merely keeping its part of the bargain, and 
very high levels of these two factors alone 
produce little in the way of satisfaction with 
the job. Table 5 shows that these two factors 
were infrequently mentioned as sources of 
very satisfying experiences on the job, and 
Table 11 shows that these two factors bore 
only a small relationship to general job satis- 
faction. Good relationships with the boss have 
a somewhat greater impact on positive job 
satisfaction and would seem to be more of an 
intrinsic factor, except that dissatisfaction 
with the competence of the boss may cause 
personal relationships between his subordi- 
nates and himself to deteriorate. The correla- 
tions between dissatisfaction with the Techni- 
cal Competence of the boss and Interpersonal 
Relationships with the boss were .49 for ac- 
countants, and .61 for engineers. If, however, 
the levels of Salary and Working Conditions 
fall below that which the individual expects, 
dissatisfaction follows, and the individual does 
not hesitate to say so when asked. The indi- 
vidual feels that the company is reneging on 
a deal which it made with him, and he be- 
comes dissatisfied, especially as Herzberg et al. 
showed, with the fairness or the unfairness of 
the company and management. Of course, 
their results were obtained without controlling 
for defensive responses. 

This second type of expectation is different 
in nature from those associated with self- 
fulfillment. In this respect, a person may well 
be described as an economic man. He feels 
he has made a bargain with the company: 
his time, work, efforts, and energy in ex- 
change for a certain amount of money or 
other external rewards. Probably the specific 
amount of money expected is determined by 
a number of variables, foremost of which is 
the desire to support himself and his family. 
He also considers what others like himself are 
receiving, what his own costs (physical, mate- 
rial, and psychological) are likely to be, and, 
to some extent, what his long-range goals 
(aspirations) are. 

In summary, satisfaction with the job can 
be due to high levels of satisfaction with in- 
trinsic factors, and dissatisfaction can be due 
to low levels of satisfaction with intrinsic fac- 
tors. Extrinsic factors cause both satisfaction 
and dissatisfaction less readily than do the 
intrinsic factors, but individuals are more 
likely to say they have bad or dissatisfied 
feelings about these extrinsic factors. Meas- 
ures of satisfaction with Salary and Working 
Conditions may show these two factors to be 


dissatisfiers, as Herzberg et al. claim, but for 
very different reasons than those invoked in 
their interpretations. Two different sets of 
expectations were seen to be major determi- 
nants of how job-attitude factors affect over- 
all job satisfaction. 


This study contrasted the interests of 103 bankers, studied in 1934, with the 
103 bankers who today hold the identical jobs held by the early group, that is, 
each man in the 1934 study has been paired with the individual who in 1964 
held the identical job in the identical bank. The results show a substantial 
consistency in measured interests between the 2 groups. Data are also presented 
for a test-retest group (N =48) tested first at age 38 and again at age 68. 
Striking consistency in interests was noted here also. 


The Strong Vocational Interest Blank 
(SVIB) is one of the most widely used psy- 
chological measuring instruments. Its validity 
has been established in a variety of settings 
(see Berdie, 1960, for a review) and data col- 
lected over long periods of time have demon- 
strated the stability of measured interests over 
intervals as long as 22 years (Strong, 1955). 
The SVIB accomplishes this by providing an 
index of the similarity between an individual’s 
likes and dislikes and those of successful men 
in a wide range of occupations. The results 
are particularly useful in guidance situations 
where counselors are trying to help young 
people plan their future. 

The use of the SVIB, or any other em- 
pirically developed instrument, requires two 
assumptions of stability. The first is that the 
individual remains stable over time; the 
second is that the characteristics of the cri- 
terion groups remain constant. Specifically for 
the SVIB, the first assumption requires that 
the individual show some consistency over 
time in the activities that he finds interesting, 
thus allowing him to plan his future career on 
the basis of current likes and dislikes; the 
second requires that successful men in a spe- 
cific occupation, say bankers, have the same 
interests today as did the bankers who were 
studied in 1934 to establish the SVIB Bankers 
Scale. 

This study tests this second assumption. 

The validity of the first assumption, that of 
consistency within the individual over time, 
has been well-established. 

Several studies have shown that intraindi- 
vidual consistency is high though it does vary 
with both the age of the subjects (Ss) at first 
testing and the interval between testings 
(Hoyt, 1960; King, 1957; Powers, 1956; 
Stordahl, 1954; Strong, 1955). For example, 
the 1959 SVIB Manual reports that “the me- 
dian scale test-retest correlation for eleventh 
grade boys over two and a half years is .81; 
for college freshmen over one year .88, over 
nineteen years, .72; for college seniors over 
five years is .84, and over twenty-two years 
[illegible].” 

Thus the individual is consistent enough in 
his measured interests over several years, par- 
ticularly if he is tested during his college days, 
to make this information useful in laying 
plans for a career several years in the future. 

The second assumption mentioned above, 
that of constancy within a single occupation 
over several years, has received less attention. 
And it perhaps deserves less. Certainly any 
inventory successfully developed to distin- 
guish between occupations is going to be use- 
ful for many years after its development be- 
cause the membership of most occupations 
remains stable. Any inventory that distin- 
guishes between a specific occupation and 
men-in-general in 1934 will certainly separate 
the two groups in 1935. The question under 
study here is whether it will do so 30 years 
later in 1964. This point is becoming more 
crucial as it has been over 3 decades since 
the original SVIB standardization data were 
collected and there is some concern as to 
whether the scales are still relevant. 

Some data are available from earlier studies 
on this issue of occupational constancy. 
Strong and Tucker (1952), in their study of 
medical interests, reported mild differences 
between the original criterion group of physi- 
cians, tested around 1930, and a group of 
medical school seniors tested in 1951. Kriedt 
(1949) reported a comparison of the interests 
of a group of psychologists collected in 1948 
with the original criterion group collected 
from 1927 to 1935. His results indicated that 
the psychologists of 1948 had more interests 
in working with people and less in working 
in a laboratory setting than did the group of 
psychologists collected 15 years earlier. 

McCornack (1954), in revising the Social 
Worker Scale, found small but definite dif- 
ferences between his group of social workers 
and the groups studied some years earlier by 
Strong. Kriedt, Stone, and Paterson (1952) 
reported a comparison of Personnel Directors, 
tested in the late 1940s, with the original cri- 
terion group, tested from about 1925-1935, 
used to establish the 1938 key. Their conclu- 
sion was that the 1938 key was still appro- 
priate as their sample demonstrated the same 
distribution as the earlier group. 

Three of the four studies reviewed above 
identified some mild but detectable differences 
between the original criterion group and 
groups from the same occupation collected 
several years later. It is not clear whether 
the differences were due to changes of inter- 
est within the occupation, or whether they 
could instead be explained by sampling dif- 
ferences. 

What is needed is some method of holding 
the sampling technique constant over a sub- 
stantial period of time. One possible pro- 
cedure would be to study a group of indi- 


TABLE 1 

DISPOSITION OF ORIGINAL SAMPLE 

                                       N     Percentage 
Couldn’t be located                   20        10.6 
Deceased                              91        48.1 
Seriously ill                          6         3.2 
Refused to participate                18         9.5 
Filled out inventory incorrectly       6         3.2 
Usable respondents                    48        25.4 
Total                                189       100.0 


viduals who today hold the identical posi 
tions held by the men in an occupationa 
group studied years ago. This procedure ha 
been used in this study with the group o 
bankers who were used to establish the SVIE 
Bankers Scale in 1934. 

Essentially, this is a study of the interest 
of bankers who today hold the identical job 
held by the men in the banker criterion grouj 
30 years ago. 


METHOD 


The group originally used to establish the Banker Scale, collected in
1934, included 250 individuals. In the 1959 SVIB Manual, Strong
described the group as follows:


172 were members of the Minneapolis Federal Reserve System; of these
172, 95 were bankers from state banks in Minnesota which opened
immediately after the 1933 Bank Holiday and 77 were bankers from
national banks and designated as “good bankers” by a qualified expert.
The remaining blanks were obtained through The Psychological
Corporation, New York, and from miscellaneous sources. Average age
45.5; Education 12. grade.


Using a 1935 bank directory, it was possible to identify the position
held in 1934 by 189 individuals from the original criterion group.
First, an attempt was made to locate and retest these men to determine
the amount of change in interests within the individual over 30 years.
Second, the individuals who held these 189 jobs in 1964 were
approached and asked to fill in the SVIB.


Retesting the Original Group


The first phase of this study involved the followup study of the
original criterion group. Table 1 reports the results of attempts to
locate and retest these individuals.

As Table 1 indicates, most of the group (90%) were located or
accounted for, but death and illness greatly reduced the potential
sample size. The 48 usable respondents represented only about
one-fourth of the original group, though they constituted
approximately two-thirds of the current survivors.

Was this group of respondents a representative sample from the earlier
group of 189? Certainly they were younger; one would expect a bias in
age simply because the older ones would die sooner. These respondents
averaged 38 years old in 1934, compared with the average age of 46 of
the entire criterion group. In 1964, their average age was 68, with a
range from 54 to 81 and a median of 71.

However, this age difference should not affect the use of this sample
to represent the interests of the total group as prior research has
shown that interests are stable after about age 25. Further, a
comparison of the average 1934 profile of this subgroup with the 1934
profile of the entire original criterion group showed virtually no
differences between the measured interests of the two groups.


Testing the Current Bankers


The major part of this study involved the comparison of the SVIB
profiles of the original 1934 sample of bankers with the matched
profiles of the men who in 1964 held the identical jobs.

This job matching was accomplished by using Commercial West Bank
Directories. The 1935 directory provided the job titles of the men in
the original sample, and the 1964 directory provided the names of the
men holding those positions currently.

This latter group was contacted and asked to fill in the SVIB. For
example, if the President of the First State Bank of Duluth,
Minnesota, was one of the original sample, the current president of
that bank was asked to fill in the SVIB.

There were some problems with this matching. Occasionally the bank had
changed names or organizational affiliation, and sometimes the job had
been changed considerably. When in doubt, we erred in the direction of
including the current individual in the sample even though it was not
certain that he was working in precisely the same job as the earlier
individual. But this was true in only a small percentage of the cases.

The results of the attempt to collect matchmates for each of the 1934
participants are summarized in Table 2.

The total possible sample was 189. Disregarding those cases where the
bank was no longer in business, where the job was today filled by a
woman, and where the 1934 individual was still in the same job, the
sample shrank to 141. Of these, 103 (or 73%) completed the Strong
Blank.
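
The sample accounting in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and the dictionary labels are assumptions introduced here, while the counts are those given in the text and in Table 2.

```python
# Illustrative check of the sample accounting; all names are hypothetical.
def eligible_and_rate(total, exclusions, respondents):
    """Return the eligible sample size and the response rate among eligibles."""
    eligible = total - sum(exclusions.values())
    return eligible, respondents / eligible


# Counts taken from the text and Table 2.
exclusions = {
    "bank out of business": 26,
    "job held by a woman in 1964": 9,
    "1934 individual still in same job": 13,
}
eligible, rate = eligible_and_rate(189, exclusions, 103)
print(eligible)           # 141
print(round(100 * rate))  # 73
```

Computing the rate against the 141 eligible positions, rather than against all 189, is what yields the 73% completion figure; against the full 189 the same 103 respondents are the 54% shown in Table 2.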


RESULTS 


The results of the retesting of the 48 subjects over an interval of
30 years are listed in Table 3.

These figures show the same type of consistency that we have come to
expect in interest measurement, though the median test-retest
correlation of the occupational scales of .56 is lower than the .75
reported for college seniors over 22 years and the .72 for college
freshmen over 19 years reported by Strong in his followup study
(Strong, 1955). Though the characteristic interests of the group
remained very stable, the individuals moved around somewhat in their
rank orders. It is likely that these correlations were lowered
slightly by the relative homogeneity of this group. Strong’s data
(1955, p. 68) indicate that the average retest standard deviation for
14 scales over 18 years was 11.7. For the retested bankers, the
average standard deviation on the same 14 scales was 9.8.


TABLE 2

DISPOSITION OF MATCHED MATES OF BANKERS IN ORIGINAL GROUP

                                              N    Percentage
Bank had gone out of business                26        14
Woman held job in 1964                        9         5
1934 individual still in same job in 1964    13         7
1964 job-holder refused to participate       38        20
Usable respondents                          103        54
Total                                       189       100


The results of the comparison between the 
1934 bankers and their matched mates in 
1964 are listed in Table 4. 

While the profiles were essentially similar, 
there were some differences, some of them as 
large as one standard deviation. No apparent 
pattern appeared in these differences; some 
of them were on scales where the 1964 bank- 
ers might be expected to score higher because 
of the generally increasing level of education 
—scales such as Psychologist, Physician, and 
Senior CPA. But other scales usually associ- 
ated with technical, nonprofessional interests, 
such as Industrial Arts Teacher and Voca- 
tional Agriculture Teacher, also showed dif- 
ferences in favor of the 1964 bankers. 

In general, because the 1964 bankers scored 
lower on the Banker Scale than did the 1934 
bankers (46 to 52, respectively) and slightly 
higher on most of the other 44 scales (on the 
average, 27 to 24), it appears that the Banker 
Scale is slightly less effective in differentiating 
bankers from men in other occupations than 
it was in 1934. However, some of this regres- 
sion can be explained by cross-validation 
shrinkage as this comparison is essentially 
comparing the validation group with a cross- 
validation group collected 30 years later. 

Although some shrinkage did occur, the 
most noteworthy result shows the SVIB 
Banker Scale is still valid 30 years later— 
current bankers score higher on it than on 
any other scale. 
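
The shrinkage argument can be made concrete by comparing how far the Banker Scale mean stands above the average of the other scales in each year. The sketch below uses only the four means quoted in the text; the "separation" index and the function name are assumptions introduced for illustration and are not a statistic reported in the study.

```python
# Hypothetical index: Banker Scale mean minus the average of the other scales.
def separation(banker_scale_mean, other_scales_mean):
    """Difference, in standard-score points, between the two means."""
    return banker_scale_mean - other_scales_mean


sep_1934 = separation(52, 24)  # original criterion group
sep_1964 = separation(46, 27)  # matched 1964 job-holders
print(sep_1934, sep_1964, sep_1934 - sep_1964)  # 28 19 9
```

On these figures the gap narrows by 9 points but remains large, which is consistent with the statement that the scale is slightly less effective yet still valid.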


Table 3
Test-Retest Results Over 30 Years from 48 Bankers

[Profile chart: test and retest means (M) and standard deviations
(SD) and test-retest correlations (r) for each SVIB scale, with the
test and retest profiles plotted on a standard-score axis. The
individual entries are not recoverable.]

* Mean Difference Significant at .05 Level
+ Mean Difference Significant at .01 Level


IMPLICATIONS


The results of this study indicated there was substantial similarity
between the likes and dislikes of the bankers in the original SVIB
criterion group and a group of bankers in identical jobs 30 years
later. This finding clearly upheld the continued use of the SVIB
Banker Scale for counseling purposes.

Can these results be generalized to other occupational scales of the
SVIB? The answer to that question depends on whether the banking
profession is more or less stable over time than other occupational
groups, and this can only be conjecture. It would certainly seem that
the changes in the banking business over the last 30 years have been
as substantial as those in almost any other occupation. The original
criterion group was collected in 1934, just after the banks reopened
after the national bank holiday; in many ways, a new era began.
Accounting procedures have evolved from handwritten records to the
most elaborate electronic computer systems. To the naive observer, it
appears that the banking business has attempted to change its public
image from that of a somber, staid old patriarch to that of an
aggressive, hard-driving public servant. This is particularly
reflected in the architecture of bank buildings; frequently the bank
is the most modern building in the small midwestern town, the setting
that most of these bankers were drawn from.


Table 4
Comparison of 1934 Bankers vs 1964 Bankers Holding Identical Jobs
(103 in each group)

                             1964           1934
Scales                      M     SD       M     SD
Artist                     15    8.9      14    ...
Psychologist              ...    ...     ...    ...
Architect                  18   10.0      14*  10.9
Physician (Rev)            19    ...     ...   10.0
Osteopath                  25   10.4      18+   ...
Dentist                   ...   10.3      19   10.6
Veterinarian              ...    ...     ...   10.8
Mathematician              18    ...     ...    ...
Physicist                 ...    ...     ...    ...
Engineer                  ...    ...     ...    ...
Chemist                   ...    ...     ...    ...
Production Manager        ...    8.7     ...    ...
Farmer                     34    ...     ...    ...
Aviator                   ...    ...     ...    ...
Carpenter                 ...    ...     ...    ...
Printer                    30   10.9     ...    ...
Math Phys Sc Teacher       32   10.9      24+   ...
Ind Arts Teacher          ...    ...     ...    ...
Voc Agric Teacher         ...    ...     ...    ...
Policeman                  30    9.8      28    ...
Forest Service Man         24   11.4      20*   ...
YMCA Physical Dir         ...    ...     ...    ...
Personnel Director         28   12.9      24*   ...
Public Administrator       36    9.8      29+   ...
YMCA Secretary             24    ...      23    ...
Soc Sc HS Teacher          31   11.0      28*   ...
City School Supt          ...    ...     ...    ...
Social Worker              23   11.9      16+   ...
Minister                  ...    ...     ...    ...
Musician (Performer)      ...    ...     ...    ...
C.P.A.                    ...    ...     ...    ...
Senior C.P.A.              41   10.3      31+   ...
Accountant                ...   10.8      38*   ...
Office Man                 43    9.2      42    ...
Purchasing Agent           39    9.9      40    ...
Banker                     46    9.3      52+   ...
Mortician                  36    ...      33*   ...
Pharmacist                ...    9.3      24+   ...
Sales Manager             ...    ...      36    ...
Real Estate Salesman       38    8.2      41*   ...
Life Insurance            ...    9.7      34    ...
Advertising Man           ...    ...     ...    ...
Lawyer                    ...    ...     ...    ...
Author-Journalist         ...    ...     ...    ...
President-Mfg Conc         34    8.8      36    ...
Specialization Level      ...    9.9      32*   ...
Interest Maturity          52    6.6      52    ...
Occupational Level         56    ...      58*   ...
Masculinity-Femininity    ...    ...     ...    ...

* Mean Difference Significant at .05 Level
+ Mean Difference Significant at .01 Level

In spite of these many changes, consider- 
able stability was found in the characteristic 
interests of bankers over 30 years. If the 
above speculations have any veracity at all, 
if the banking profession is changing at least 
as fast as other occupations, then it seems 
safe to generalize to the other SVIB scales 
and assume that they also remain appropriate 
for use today. It would, of course, be com- 
forting to have comparable followup data on 
other scales to buttress this generalization. 

The test-retest results from the 48 indi- 
viduals over 30 years deserve brief comment. 
Strong has earlier said that interests remain 
very stable from age 25 until about age 55, 
the upper limit of the age of men in his 
followup studies (Strong, 1955). Based on the 
sample in this study, that is, men tested once 
at about age 40 and again at age 70, it seems 
safe to conclude that measured interests 
remain stable well into old age. 


COMPLEX MANUAL PERFORMANCE 


12 United States Army enlisted men were tested on 3 manual tasks,
knot-tying (KT), block-stringing (BS), and block-packing (BP), under
4 conditions: (a) Control: Mean Weighted Skin Temperature (MWST)
90.0° F., Hand Skin Temperature (HST) 93.0° F. (b) Cold Body: MWST
69.0° F., HST 90.4° F. (c) Cold Hand: MWST 85.8° F., HST 45.7° F. and
(d) Cold Hand-Body: MWST 68.5° F., HST 45.8° F. The 3 cooling
conditions had a differential effect across the 3 tasks. Cold Body
was the only condition that did not result in significant decrements
for all tasks. Knot-tying was unaffected by body cooling. The results
were interpreted in terms of the differential effect of cooling the
hand or body upon various aspects of complex manual performance.


Gaydos (1958), in investigating the effect 
on complex manual performance of lowering 
body surface temperature (BST) and either 
lowering hand skin temperature (HST) or 
maintaining normal HST, found significant 
manual performance decrements when HST 
was lowered. He concluded that during cold 
exposure HST, but not BST, is a vital factor 
in maintaining maximum efficiency in the per- 
formance of a complex manual task. Gaydos 
and other investigators (Gaydos & Dusek, 
1958; Clark, 1961) have found that manual 
performance is impaired during cold exposure 
when HST drops to 55 degrees F. and below. 

The purpose of this study was to compare 
the roles of HST and BST in complex manual 
performance using a lower BST than that 
tested by Gaydos. It was expected that lower- 
ing either BST or HST would affect manual 
performance, but that each of these two cool- 
ing conditions would affect different aspects of 
manual performance. 


METHOD 


Subjects. Twelve Army enlisted men were divided 
into four groups of three subjects (Ss) each with 
each group receiving all four treatment conditions in 
a different sequence. One S was unable to complete 
the study. 
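
The counterbalancing described above, four groups each receiving the four conditions in a different sequence, can be sketched as a 4 x 4 Latin square. The sequences actually used are not reported, so the cyclic construction below is an assumption made for illustration, as are the function names.

```python
# The four skin temperature conditions named in the study.
CONDITIONS = ["Control", "Cold Body", "Cold Hand", "Cold Hand-Body"]


def cyclic_latin_square(items):
    """Build a Latin square by rotating the item list one place per row."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """True if every symbol appears exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


# One row per group of three Ss; one column per test day.
for group, sequence in enumerate(cyclic_latin_square(CONDITIONS), start=1):
    print(f"Group {group}: {sequence}")
```

In any such square each condition falls once on each test day and once in each group's sequence, which is what allows condition effects to be separated from day and sequence effects in a Latin-square analysis.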

Apparatus and procedure. All tasks were performed inside a 3 feet
6 inches × 3 feet × 2 feet 8 inches thermostatically controlled hand
box with a temperature range of −30 degrees F. to 140 degrees F. The
Ss, seated in front of the box, reached the tasks through two arm
holes and viewed the tasks through a window. The hand box was located
inside a temperature controlled chamber with a range of 0 degrees F.
to 160 degrees F.

Three manual performance tasks were used: knot- 
tying (KT), block-stringing (BS), and block-pack- 
ing (BP). For the KT task S was required to tie 
one “overhand knot and bight” on each of a num- 
ber of strings hanging from the edge of a horizontal, 
rotatable disk. The BS task involved stringing 
small blocks with holes through each face onto a 
needle and string. The BP task consisted of packing 
small blocks in rows along the floor of a box with 
one hand while holding the box in the other. 

All Ss received 10 training trials per day on each 
of the 3 tasks for 5 consecutive days. A trial was 
defined as stringing 20 blocks, tying 15 knots, or 
packing 30 blocks. Each day during training the 
order of task presentation was determined randomly 
for each S. After training, each S was tested on 
4 consecutive days, that is, a control day and 3 
experimental days. The order of task presentation 
was counterbalanced daily across the three Ss in 
each sequence. The performance measure was the 
mean number of units completed over four 30-second 
trials. A 30-second intertrial interval was used 
throughout both phases. 

On all test days Ss wore only shorts, socks, and 
boots. The following test conditions were used: 


Control—Mean Weighted Skin Temperature 
(MWST) of 90.0 degrees F. and HST of 93.0 de- 
grees F. 

Cold Body—MWST of 69.0 degrees F. and HST 
of 90.4 degrees F. 

Cold Hand—MWST of 85.8 degrees F. and HST 
of 45.7 degrees F. 

Cold Hand-Body—MWST of 68.5 degrees F. and 
HST of 45.8 degrees F. 


Ambient temperatures of 40 degrees and 80 degrees F. and box
temperatures of 0 to 10 degrees F. and 90 to 110 degrees F. were used
to obtain the above conditions. Air movement at speeds up to 15 miles
per hour was used during all cooling conditions to help stabilize
hand- and body-surface temperatures and to obtain the appropriate
MWST and HST within a 20-minute exposure period.

Skin temperature measured with copper-constantan thermocouples and
rectal temperatures measured with thermistor catheters were recorded
by Leeds-Northrup speedomax recording systems. The MWST was recorded
simultaneously from 10 points on the body by a multipoint
temperature-recording system containing an automatic integrator. The
skin temperature from the little finger of the right hand (HST) was
recorded by a separate temperature-recording system. All recordings
were taken throughout each test period.


TABLE 1

EFFECT OF COLD HAND, BODY CONDITIONS ON MANUAL PERFORMANCE

Task    Control    Cold body    Cold hand    Cold hand-body
KT       20.32       19.64        14.75          11.77
BS       18.68       16.27        14.30          12.82
BP       29.77       27.16        25.34          22.00

Note. The scores are the mean number of task components completed in
30 seconds. The Tukey multiple-comparison test (Ryan, 1959) was used
in the above comparisons. The differences between scores within each
task that are not underlined are significant (p = .05).


RESULTS AND DISCUSSION


The data for each task were analyzed according to Latin-square
analysis of variance designs with the four skin temperature
conditions being assigned Latin letters. Subjects/Sequence (KT:
F = 4.46, p < .005; BS: F = 35.81, p < ...; BP: F = ..., p < ...) and
skin temperature conditions (KT: F = 46.20, p < .001; BS: F = 30.87,
p < .001; BP: F = 20.30, p < .001) were significant main effects for
all three tasks. Sequence was a significant effect (F = 5.30,
p < .05) only in the analysis of the BP data. The mean scores for
each task and for each condition are presented in Table 1.

Gaydos (1958), using a MWST of 78 degrees F., failed to find an
effect of body cooling on KT and BS performance. The Cold Body
condition of the present study (MWST 69.0 degrees F., HST 90.4
degrees F.), however, did result in significant performance
decrements in BS and BP.

The Cold Hand condition (MWST 85.8 degrees F., HST 45.7 degrees F.)
resulted in KT and BS scores significantly lower than
those found under the Cold Body condition. 
Therefore, although body cooling was found 
to affect certain manual performance tasks, 
the role of HST in manual performance is 
emphasized once again. 

The KT and BP scores for the Cold Hand- 
Body condition (MWST 68.5 degrees F., HST 
45.8 degrees F.) were found to be significantly 
lower than the KT and BP scores found for 
the Cold Hand condition. 

In exception to the above analysis, an ex- 
amination of Table 1 reveals that, for all 
tasks, manual proficiency decreased as BST, 
HST, and then BST and HST were lowered. 
It is felt, however, that the Cold Body condi- 
tion affected only the BS and BP tasks, even 
though this finding is based on a small sample. 
Thus, together with the conclusion that the 
hands must be protected from cold exposure 
for successful manual performance, it is sug- 
gested that if the hands are kept warm and 
the body is allowed to cool to a MWST of 69 
degrees F., performance decrements will occur 
in manual tasks requiring accurate placement 
or threading of objects (BP and BS). It is 
suggested, also, that cooling the body while 
maintaining normal HST does not affect per- 
formance of tasks involving only wrist-finger 
speed and dexterity (KT). 
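
The size of these decrements can be read off Table 1 by expressing each cooled-condition mean as a percentage loss from the control mean. The sketch below does that arithmetic only; the function name is an assumption, and the calculation says nothing about statistical significance, which rests on the Tukey comparisons reported with the table.

```python
# Mean units completed in 30 seconds, copied from Table 1.
TABLE_1 = {
    "KT": {"Control": 20.32, "Cold body": 19.64,
           "Cold hand": 14.75, "Cold hand-body": 11.77},
    "BS": {"Control": 18.68, "Cold body": 16.27,
           "Cold hand": 14.30, "Cold hand-body": 12.82},
    "BP": {"Control": 29.77, "Cold body": 27.16,
           "Cold hand": 25.34, "Cold hand-body": 22.00},
}


def percent_decrement(control, score):
    """Loss relative to the control mean, as a percentage."""
    return 100.0 * (control - score) / control


for task, means in TABLE_1.items():
    control = means["Control"]
    for condition, score in means.items():
        if condition != "Control":
            loss = percent_decrement(control, score)
            print(f"{task}  {condition:<15} {loss:5.1f}%")
```

Under Cold Body the knot-tying loss is about 3% of the control mean, against roughly 13% for block-stringing and 9% for block-packing, which matches the pattern described in the text.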

The present study emphasizes the need for 
further studies to investigate the effect of cold 
exposure upon complex manual performance 
by differentiating among the psychomotor 
components essential to successful manual 
performance and by testing the differential 
effects of body surface cooling, hand surface 
cooling and deep hand cooling upon those 
components of manual performance. 


THE RELATIONSHIP OF INTENTIONS TO LEVEL 
OF PERFORMANCE 


EDWIN A. LOCKE 1 


American Institutes for Research, Washington Office 


3 laboratory experiments are reported which stem from Ryan’s approach
to motivation. The fundamental unit is the “intention.” The
experiments examined the relationship between intended level of
achievement and actual level of performance. A significant linear
relationship was obtained in all 3 experiments: the higher the level
of intention, the higher the level of performance. The findings held
both between and within Ss and across different tasks. The
implications for the explanation of behavior are discussed.


There has been considerable research on the relationship of various
demographic, social, psychological, and personality variables to
productivity or level of performance. Likert (1961) and Parker (1963)
have emphasized human relations and supervisor variables. Katzell,
Barrett, and Parker (1961) and Parker (1963) have examined the
effects of situational (e.g., city size) variables on performance.
Dunnette, Campbell, and Jaastad (1963) among others have studied the
effects of group structure on output quantity. Atkinson (1958) and
McClelland (1961) have explored the relationship between the need for
achievement and quantity of output.

These approaches have in common the fact that they do not specify
what it is the individual is consciously trying to do in these
situations. The process by which situational and supervisory
variables affect performance is usually left unspecified or is
assumed to involve some complex conscious or unconscious reasoning
process on the part of the individual. The “need for achievement” is
specifically acknowledged not to be a part of the individual’s
conscious experience in spite of its apparent influence on his
behavior (McClelland, Atkinson, Clark, & Lowell, 1953).

Recently Ryan² has suggested that a considerable part of human
behavior is controlled by the individual’s conscious intentions,³
that is, by what the individual is trying to do. As Ryan⁴ notes “It
is impossible to perform a psychological experiment upon a human
subject without manipulating and controlling his intention. . . . In
spite of this fact, the experimental study of . . . [intentions] has
been relatively neglected in modern psychology.”


1 This paper was based on the author’s doctoral dissertation (Locke,
1964) done at Cornell University.

2 Unpublished mimeos, 1964. Chapter I: Explaining behavior; Chapter
II: Explanatory concepts; Chapter V: Experiments on intention, task,
and set; Chapter VI: Intentional learning; Chapter VII: Unintentional
learning. Ithaca: Department of Psychology, Cornell University.

Ryan (see Footnote 2) expanding on 
earlier discussions of this topic (Ryan, 1958; 
Ryan & Smith, 1954) has suggested that a 
fruitful approach to the prediction and expla- 
nation of human behavior would be to ex- 
amine the way in which intentions (see 
Footnote 3) are related to actual behavior. 
Ryan argues that these lower level (immedi- 
ate, specific) explanations should precede the 
more abstract (higher level) explanations 
(e.g., in terms of general needs, drives, and 
presses etc.). 

The present research, a series of three 
laboratory experiments, will examine the way 
in which intentions affect level of perform- 
ance. More specifically the purpose will be to 
see how the level of intended achievement is 
related to actual level of achievement. The 
term “intended level of achievement” is very close in meaning to the
term “level of aspiration” coined back in the 1930s, and means the
future level of performance an individual will try for. Interestingly
a close examination of the old level of aspiration literature reveals
that almost in no case was it used as an independent variable. Rather
the effects of such variables as previous success and failure, amount
of experience with the task, age, and selected personality variables
on level of aspiration were the focus of interest (for reviews see
Festinger, 1942; Frank, 1941; Lewin, 1958; Rotter, 1942).


3 Ryan actually uses the term “task” to designate what the writer
means by “intention” (Ryan uses the latter as a synonym). To prevent
confusion, the author will use “intention” as Ryan uses the word
“task” and will reserve “task” for its traditional meaning as “a
piece of work to be accomplished.”

4 Unpublished mimeos, 1964. Chapter V, p. 1.

However, a series of studies by Mace 
(1935), though not using the level of aspira- 
tion terminology, did use it as an independent 
variable. On one complex arithmetic task he 
told one group of Ss to “do their best,” an- 
other to try to get at least “70 correct an- 
swers” in 20 minutes, another to beat a score 
representative of their best previous perform- 
ance, and a fourth group to beat a specific 
standard each day which was based on the 
skill of the individual S. In this experiment 
the last group learned much faster than the 
others. Unfortunately Mace did not vary the 
“intended level of achievement” along a 
single dimension so we have no means of 
making any predictions as to the shape of 
the relationship from his findings. We may 
conclude, however, that manipulating the 
intentions had a considerable effect on the 
learning rate of the Ss. 

In a more recent experiment Eason and 
White (1961) found that when Ss were in- 
structed to stay on the target in a pursuit 
rotor task for 0%, 50%, or 100% of the 
time, respectively, their performance matched 
their intentions quite well. In a second experi- 
ment reported in the same article there was 
a less direct manipulation of intentions. 
The task was a pursuit rotor task, and the 
targets were a series of seven concentric 
copper rings mounted in a rotating turntable 
and separated from each other by gaps. The 
Ss in different conditions were told to try and 
stay within different rings. For instance, Ss 
would be told to stay within Ring X, and 
that all time spent within Ring X including 
rings inside this ring would be counted. (Ss 
were not told, however, to stay as close to the 
center ring as possible.) The amount of time 
spent on each of the rings was computed 
separately. A “performance quality” score 


was obtained by multiplying the distance of 
each ring from the center ring by the amount 
of time spent on that ring, summed over all 
rings and divided by the total time spent on 
all rings. The scores could thus be described 
as “average distance from center” scores. 
Eason and White found that the smaller the 
target complex (i.e., the smaller the diameter 
of the ring within which they were told to 
try to remain) the higher the performance 
quality, or the less the average distance from 
center score. Eason (1963) replicated this 
finding with more Ss and greater variation 
in target size. 
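
The “performance quality” score described above is a time-weighted mean of ring distances. A minimal sketch follows, assuming each ring is represented by its distance from the center ring and the time spent on it; the function and argument names are illustrative, and the sample numbers are invented for the demonstration and are not data from Eason and White.

```python
def average_distance_from_center(ring_distances, ring_times):
    """Time-weighted mean distance from the center ring.

    ring_distances: distance of each ring from the center ring.
    ring_times: time spent on the corresponding ring.
    """
    if len(ring_distances) != len(ring_times):
        raise ValueError("one time value is needed per ring")
    total_time = sum(ring_times)
    if total_time <= 0:
        raise ValueError("total time on the rings must be positive")
    weighted = sum(d * t for d, t in zip(ring_distances, ring_times))
    return weighted / total_time


# Hypothetical example: three rings at distances 0, 1, and 2 units,
# with 10, 5, and 5 seconds spent on them.
print(average_distance_from_center([0, 1, 2], [10, 5, 5]))  # 0.75
```

A lower score means the S stayed closer to the center, so higher performance quality corresponds to a smaller value, as the text notes.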

Again there is little basis for making gen- 
eralizations about the relationship between 
level of intended achievement and level of 
performance though the results of Eason and 
White and Eason at least suggest a linear 
function, that is, the higher the level of 
control called for, the higher the level of 
control attained. 


EXPERIMENT I 


Method. The task in this experiment involved 
listing objects or things that could be described by 
a given adjective (e.g., “heavy”). There were 15 
trials and Ss were given a different adjective on 
each trial and told to list things or objects that 
could be described by the adjective for 1 minute. 
The E told Ss how to score their own protocols. 
Generally any answer was acceptable that did not 
repeat things in the same category (e.g., for “hot,” 
the responses “coffee,” “tea,” and “soup” would 
all be considered “beverages”). The Ss were also 
told that E would check their answers at a later 
date. In scoring, however, E simply counted up the 
number of responses given by each S on each trial 
regardless of quality. The scores then are simply 
performance quantity scores. 

The Ss (paid summer school volunteers) were 
divided at random into three groups. Each group 
had a different “standard of success” to beat on 
each trial. In the Easy group (N = 26) the standard 
of success was 4 things or objects on each trial. 
Thus to be “successful” on a trial Ss had to give at 
least 5 things or objects. In the Medium group (N 
= 22) the standard of success was 9 objects on 
each trial. In the Hard group (N = 23) the standard 
of success was 14 objects on each trial. A successful 
trial was defined as one on which a S beat his 
standard. The Ss in all groups were told that this 
was a test of creativity and that the standards were 
“what E considered to be a successful performance 
on the basis of his experience with the task” and 
represented “slightly above the average perform- 
ance.” 

The “levels of intended achievement” in this experiment, therefore,
were taken to be the standards of success set by the E.


FIG. 1. Mean productivity per trial by group, Experiment I.
(Vertical axis: mean productivity over 15 trials; horizontal axis:
standard.)


The Ss kept track of their successes by counting 
up the number of acceptable responses given on 
each trial and indicating on a score sheet whether 
or not they had beaten their standard after each 
trial. Before each trial Ss indicated their subjective 
probability of beating the standard on the forth- 
coming trial by circling the appropriate number on 
a scale from .05 to 1.00 graded in steps of .05. 

Results. As indicated previously the level of performance scores was
simply the total number of responses given by each S on each trial.
To check for equal ability all Ss began with a practice trial on
which they were told to “do their best”; t tests on the mean scores
for each group on this trial were computed. The Easy group had a
significantly higher mean output than each of the other two groups on
the practice trial; however, no corrections were made in the
experimental data due to the substantial differences that emerged
between the Easy group and the other two anyway. Thus the differences
between the Easy group and the other two are slight underestimates of
the true differences.

Figure 1 shows the results for each group 
combined for all trials; mean output per 
trial is shown as a function of the standard. 
Table 1 shows the results of a trend analy- 
sis⁵ and t tests performed on these data. 
There was a clear, significant linear trend, 
and it was accounted for almost entirely by 
the difference between the Easy group and 
the other two. 
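
As a rough arithmetic check on the trend analysis, the F ratios can be recomputed from the sums of squares printed in Table 1. The sketch below assumes the usual definition (each component mean square divided by the within-groups mean square); the variable names are illustrative only.

```python
# Sums of squares and degrees of freedom as printed in Table 1.
ss_linear, df_linear = 43.50, 1
ss_quadratic, df_quadratic = 2.70, 1
ss_within, df_within = 273.07, 68

ms_within = ss_within / df_within                      # error mean square
f_linear = (ss_linear / df_linear) / ms_within         # linear trend component
f_quadratic = (ss_quadratic / df_quadratic) / ms_within

# The two trend components should add up to the between-groups sum of squares.
ss_between = ss_linear + ss_quadratic
```

Under these assumptions the linear F agrees with the tabled 10.82 to within rounding, and the quadratic F is below 1, as reported.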

5 Equal intervals were assumed for the trend 
analysis. This assumption appears justified as the 
intervals were equal both in terms of the standards 
and the mean objective degrees of success. 

Lest these results seem in some way 
“obvious” (because the Ss were presumably 
trying to do what they were told), the 
number of trials (for all Ss in each group 
combined) on which the Easy, Medium, and 
Hard groups actually beat the Easy group 
standard (i.e., 4) was also computed. The Ss 
in the Easy group, whose task was to beat 
a standard of 4, actually beat it compara- 
tively less often than did the Ss in the Me- 
dium and Hard groups whose standards were 
higher. The overall chi-square value for the 
frequency data was 10.75 (p < .01). 

Discussion. These results clearly support 
the notion of a linear function relating level 
of intended achievement and level of per- 
formance. Due to limitations on the ability of 
the Ss there is no exact correspondence be- 
tween the two variables but the shape of the 
function is the important thing. This rela- 
tionship held even though the Hard standard 
was so difficult to reach that the objective 
(proportion of successes) and subjective 
(mean ratings) probabilities of being success- 
ful at it were, respectively, only .13 and .17. 


EXPERIMENT II 


The previous experiment was designed sim- 
ply to determine the shape of the relationship 
between level of intended achievement and 
level of performance. This experiment was 
designed with four additional considerations 
in mind: (a) How high a level of aspiration 
would Ss set if allowed to set it themselves? 
(b) What would be the effect of changing 
the standards for some Ss during the experi- 
ment? (c) Would the same difference be- 
tween the Easy and Hard conditions emerge 
if the experiment were continued for an addi- 
tional five trials? (d) Would the previous 
findings replicate using a slightly different 
task? 


TABLE 1 

TREND ANALYSIS AND t TEST RESULTS 
FOR PRODUCTIVITY: EXPERIMENT I 

Source                  SS       df    F        p 
Between                 46.20     2    — 
  Linear                43.50     1    10.82    .01 
  Quadratic              2.70     1    <1       ns 
Within                 273.07    68    — 

Comparison              df    t       p 
Easy versus Hard        47    2.96    .01 
Easy versus Medium      46    2.33    .05 
Medium versus Hard      43    1315 (decimal point illegible)    ns 


Method. The task in this experiment involved 
giving uses for objects (e.g., “an ash tray”). There 
were 20 trials and Ss were given a different ob- 
ject on each trial and told to list possible uses for 
each object for 1 minute. Again E told the Ss how 
to score their own protocols and again almost any 
answer was considered acceptable that did not 
repeat responses in the same category. Again E 
also scored the protocols himself by merely counting 
up the number of responses given by each S without 
regard to quality. 

The Ss (from the introductory psychology S pool) 
were divided at random into four groups. As in the 
previous experiment each group had a different 
standard of success to beat on each trial. Two 
conditions were identical to conditions in the 
previous experiment. In the Easy group (N = 27) 
Ss had to beat a standard of 4 uses on each trial to 
be successful. In the Hard group (N = 29) Ss had to 
beat a standard of 14 uses. 

In a third group, however, the Self-Set group 
(N = 27), Ss were allowed to set their own stand- 
ards on each trial. They were told to set their stand- 
ards anywhere they wished but to try and give as 
many uses as they could. The fourth group, the 
“Progressive” group (N = 29), was given a different 
standard on each trial. The standard at first was the 
same as that for the Easy group (i.e., 4) but 
gradually got harder until on the last trial it was 
slightly higher than the standard of the Hard group 
(i.e., 15). The Ss in all groups (except the Self-Set 
group) were again told that this was a test of 
creativity and that the standards “were what E 
considered to be a successful performance on the 
basis of his experience with the task” and that they 
represented “slightly above the average performance.” 

The Ss again kept track of their successes by 
counting the number of acceptable responses after 
each trial. 


Results. This experiment began with a 
practice trial, as did the previous one, in 
which all Ss were asked to give “as many uses 
as possible” for the same object. The t tests 
on the mean output of the four groups yielded 
no significant differences, indicating equal 
initial ability of the groups. Again the per- 
formance level scores were simply the total 
number of responses as computed by E. 


TABLE 2 

THE t TEST RESULTS FOR PRODUCTIVITY: 
EXPERIMENT II 

Comparison                     df    t       p 
Easy versus Hard               54    6.20    .001 
Easy versus Self-Set           52    4.50    .001 
Easy versus Progressive        54    3.30    .01 
Self-Set versus Hard           54    2.53    .05 
Self-Set versus Progressive    54    <1      ns 
Progressive versus Hard        56    3.14    .01 

In Figure 2 mean output per trial is shown 
as a function of the mean standard. The 
Progressive group is broken down into the 
first 10 and last 10 trials since the standards 
increased so much from the first to the last 
trial. The mean standard of the Self-Set 
group was obtained by averaging the stand- 
ards the Ss set for themselves and which 
they had indicated in writing before each 
trial. Figure 2 suggests a clear linear re- 
lationship between the level of intended 
achievement and the level of performance. 

Since unequal intervals between the groups 
precluded a standard trend analysis, multiple 
t tests were performed on the mean produc- 
tivity scores of each group. For the t tests 
the Progressive group was not broken down 
into two halves in order to avoid having to 
make an even more excessive number of t 
tests. The t test results in Table 2 show the 
Easy and Hard groups to be significantly 
different from each other and from each of 
the other two groups, which again argues 
strongly for a significant linear trend. 

It is interesting that the objective degree 
of success for the Progressive group dropped 
from .35 on the first 10 trials to .02 on the 
last 10 trials, yet the t value for the increase 
in output of this group in the last 10 trials 
is 7.02, which is significant at the .001 level. 

The number of trials on which Ss in each 
group failed to beat the Easy standard 
(i.e., 4) was again computed. The Easy group, 
whose task was to beat a standard of 4 uses, 
and the Progressive group on the first 10 
trials, whose mean standard was 7.6, failed 
to beat it comparatively more often than 
the other groups, which had higher standards. 
The chi-square value for the frequency data 
as a whole is 43.2 (p < .001). 

Discussion. This experiment yielded the 
following answers to the questions posed 
earlier: (a) Ss set moderate (objective prob- 
ability of success = .53) levels of intended 
achievement if given the choice and told to 
do “as well as possible.” (b) Raising the 
standards within the same group resulted in 
a marked increase in output as the standards 
increased. (c) The Hard group continued their 
high output through 20 trials even though 
they were rarely able to beat the standard 
(objective probability of success = .07). (d) 
The difference between the Easy and Hard 
groups was replicated with a different though 
similar task. 

Again the linear model relating level of 
intended achievement and level of perform- 
ance was supported; thus some basis was 
established for the generality of the original finding. 


EXPERIMENT III 


In the previous experiment the Progressive 
group showed a highly significant (p < .001) 
increase in output as the standards increased. 
The purpose of this experiment was to repli- 
cate the findings for the Progressive group 
in a more systematic manner. One problem 
with this group was that it never had time 
to become “adapted” to any given standard 
as it changed on every trial or every other 
trial. In addition there was no control for 
item difficulty and practice effects in that 
experiment. The present experiment was 
designed to remove these drawbacks. 


Method. The task (uses) and method of scoring 
were identical to those in the previous experiment. 
Again the task was introduced as a test of 
creativity. 

The subjects (23 members of an introductory 
course in motivation) began with a practice trial 
on which they were told to “do their best.” Then they 
were told that on the first six trials their task would 
be to beat a standard of four uses in 1 minute on 
each trial (Easy condition). No reason was given 
for choosing this standard. Again Ss rated their 
subjective probability before each trial and calculated 
their score after each trial.⁶ 

After the sixth trial Ss were told that on the next 
six trials they could choose their own standards 
(Self-Set condition) and were to try and beat them 
but also to try and give as many uses as possible.⁷ 
In addition to the probability ratings and scoring, 
Ss wrote down the standard they had chosen be- 
fore each trial. 

After the twelfth trial Ss were told that the 
standard on the first six trials had been only the 
national mean for college freshmen, but that now 
they were to try and beat a standard equivalent 
to the eightieth percentile for Ivy League graduate 
students (Hard condition). These instructions were 
to motivate them to accept the new and harder task 
which was to beat a standard of 14 uses on each 
trial. 

Because not all objects on the test are equally 
difficult some control had to be made for item 
difficulty. This was done by choosing three groups 
of six objects each which were of equal difficulty 
according to mean scores obtained on each object 
by all Ss combined in the previous experiment. 

Results. The method of scoring was the 
same as in the previous experiments. How- 
ever, the raw scores in this experiment were 
corrected for practice effects as follows: in the 
previous experiment Ss in the Easy condition 
improved by an average total of 3.6 uses from 
the first six to the second six trials. Thus 
the expected increase in output from the Easy 
to the Self-Set condition exclusive of experi- 
mental effects was 3.6 and this was subtracted 
from each individual’s total improvement 
score from the first six to the second six trials. 
Similar logic was followed to obtain correc- 
tion factors for the Easy-Hard and Self-Set- 
Hard differences. 
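
The practice-effect correction described above amounts to subtracting a fixed expected gain from each S's raw improvement between two six-trial blocks. A minimal sketch follows; only the 3.6 figure comes from the text, while the function name and the example totals are hypothetical.

```python
# Mean practice gain from the first six to the second six trials, as reported
# for the Easy condition of the previous experiment.
PRACTICE_GAIN_EASY_TO_SELF_SET = 3.6


def corrected_improvement(first_block_total, second_block_total,
                          practice_gain=PRACTICE_GAIN_EASY_TO_SELF_SET):
    """Raw improvement between two six-trial blocks minus the expected practice gain."""
    return (second_block_total - first_block_total) - practice_gain


# Hypothetical totals for one S (not data from the study): 40 uses, then 47 uses.
example = corrected_improvement(first_block_total=40, second_block_total=47)
```

The correction factors for the Easy-Hard and Self-Set-Hard differences are not given in the text, so they would have to be passed in through `practice_gain`.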


6 In this experiment subjects actually wrote down 
the number of (acceptable) responses they had 
given. The experimenter’s own count later cor- 
responded almost perfectly with the subjects’. Again, 
of course, this says nothing about the quality of any 
of the responses. 

7 At this point one student asked whether they were 
supposed to set the standards so that they could 
beat them as much as possible; the experimenter re- 
plied that they were supposed to “give as many uses 
as possible.” It is possible, however, that some sub- 
jects still thought they were supposed to set the 
standards low enough so that they could easily beat 
them. Unlike the other conditions “success” for this 
group was defined by the subjects (since they set 
their own goals), not by the experimenter. Thus it 
was possible to try and reach one goal (e.g., success) 
at the expense of the other (e.g., productivity). 
This is also true of the Self-Set group in Experi- 
ment II, but the latter did not sacrifice productivity 
to success to the extent this group did (see text). 
TABLE 3 

RESULTS OF TUKEY MULTIPLE-COMPARISONS TEST 
FOR PRODUCTIVITY: EXPERIMENT III 

                        Mean difference 
                        of totals                  Significant 
Comparison              (corrected)        WSD     (p < .05) 
Easy versus Hard        9.82               3.90    yes 
Hard versus Self-Set    6.09               3.28    yes 
Easy versus Self-Set    1.93               3.28    no 

Note.—SSres = 1,297.69, df = 44, MSerror = 29.49, S = 1.13, 
SD = 3.22, WSD = 3.90 (3 means), WSD = 3.28 (2 within 3). 
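
The quantities in the note to Table 3 can be checked against one another. The sketch below assumes that MSerror is the residual sum of squares divided by its degrees of freedom and that S is the standard error of a condition mean, sqrt(MSerror / n) with n = 23; the second reading is an assumption, although it does reproduce the printed 1.13. Each corrected difference is then compared with its WSD.

```python
import math

# Values as printed in the note to Table 3.
ss_res, df_res = 1297.69, 44
n_subjects = 23  # the same Ss served in all three conditions

ms_error = ss_res / df_res                   # printed as 29.49
se_mean = math.sqrt(ms_error / n_subjects)   # printed as S = 1.13 (assumed meaning)

# (corrected mean difference, WSD) for each comparison in Table 3.
comparisons = {
    "Easy versus Hard": (9.82, 3.90),
    "Hard versus Self-Set": (6.09, 3.28),
    "Easy versus Self-Set": (1.93, 3.28),
}
significant = {name: diff > wsd for name, (diff, wsd) in comparisons.items()}
```

Under these assumptions the first two comparisons exceed their WSD and the third does not, in line with the yes/yes/no column of the table.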


Figure 3 shows the total uses per trial (un- 
corrected) for each of the 18 trials. It is clear 
that the output increased as the standards 
increased. The equal Ns in the different con- 
ditions (they were the same Ss) made a Tukey 
(1953) multiple-comparisons test possible on 
the output data. The results of this test are 
shown in Table 3. After correction only the 
Easy-Self-Set difference failed (barely) to 
reach significance. 

The reason why the Easy and Self-Set con- 
ditions did not yield a greater difference is 
suggested by an examination of the mean 
standards set by the latter group (see Figure 
[?]). The Self-Set group set their standards 
low enough so that they were able to beat 
them 70% of the time, almost as often as the 
Easy condition (78% of the time). Either 
the instructions to the Ss in the Self-Set 
trials were not clear (see Footnote 7) or the 
Ss had some sort of “set to succeed” carried 
over from the first six trials. 

Another interesting finding related to the 
shape of the output curve in each condition. 
The objects were given in order of difficulty 
in each condition, with the easiest first and 
hardest last. It is evident from Figure 3 that 
the output in the Easy condition was a direct 
function of the difficulty of the items, that is, 
the curve shows a steady downward trend. 
However in the Self-Set and especially in the 
Hard condition the trend is modified, becom- 
ing U shaped in the latter. Mace (1935) 
found that one result of setting standards for 
Ss to aim at was to prolong effort towards 
the end of the work period. Mace (1935, p. 
23) presents output curves of groups with 
and without specific standards that are very 
similar in shape to the Hard and Easy output 
curves, respectively, obtained here. 

Finally the number of times Ss failed to 
beat a standard of four in each condition was 
computed. (These data were not corrected 
because of the difficulty of correcting for single 
trials.) Again the Self-Set and Hard condi- 
tions resulted in fewer failures to beat a standard 
of four than did the Easy condition, where this 
was the actual standard. The chi-square value 
for the frequency data is 6.95 (p < .05). 

Discussion. This experiment again sup- 
ported a linear relationship between level of 
intended achievement and level of perform- 
ance. Most important, it appears that the re- 
lationship holds both between and within Ss. 

The data on the shape of the output curves 
are suggestive and warrant further research 
on the precise effects of standards of per- 
formance. 


CONCLUSION 


On the basis of the experiments reported 
here we can make one major generalization: 
the higher the level of intended achievement 
the higher the level of performance. This in- 
cludes levels of intention so high that Ss can 
reach them less than 10% of the time. 

If we were to take task difficulty or proba- 
bility of success as the independent variable 
(although in these experiments difficulty was 
clearly dependent upon the level of intended 
achievement) these short-term results do not 
support Atkinson’s theory (1957) which pre- 
dicts that a maximum level of performance 
will be obtained when the probability of suc- 
cess is moderate (.50) and that performance will 
fall off as the probability departs from this value 
in either direction. Although one experiment by 
Atkinson (1958) supports his theory, other 
data reported by McClelland (1961, p. 216) 
appear to support the linear model obtained 
here. The important difference between Ry- 
an’s (see Footnote 2) approach and theories 
such as those of Atkinson is that the latter 
attempt to go directly from aspects of the 
task or situation to behavior without taking 
account of the intentions of the Ss. Unless 
one assumes that man responds automatically, 
like a robot, to situational pressures, then it 
would seem unwise to expect theories which 
do not account for intentions to explain all of 
behavior. Although Atkinson’s complete model 
uses an individual variable (need for achieve- 
ment) even this is asserted not to be part of 
the individual’s conscious experience. 
PREDICTIVE VALIDITY OF THE INTERVIEW ¹ 

This paper describes the use of the interview technique as a valid and reliable 
instrument for predicting job placement and vocational success. The inter- 
views of 144 blind adults were objectively and quantitatively scored, making 
full use of all responses elicited by the S. The results indicated that job 
success and vocational placement are significantly related to a number of 
variables tapped by the interview, such as perception of blindness, learned ways 
of dealing with tension, interpersonal interaction, and employment potential. 


In dealing with human beings, it is fre- 
quently necessary to secure information which 
can be obtained only by using an interview 
technique. The reliability of such procedures 
is often open to serious question (Bingham, 
Moore, & Gustad, 1959). The literature is 
replete with studies which indicate that the 
interview, in general, is sometimes less than 
a useful and objective means of securing in- 
formation (Anderson, 1960; Campbell, Prien, 
& Brailey, 1960; Hinrichs, 1960; Trites, 
1960). The attitude of many researchers has 
been succinctly summarized by Dunnette 
(1962): 


The continued uncritical use of the personal inter- 
view offers a clear illustration of excessive delay in 
undertaking needed research. Nearly everyone uses 
this costly, inefficient, and usually nonvalid selec- 
tion procedure. Yet, practically no one performs or 
reports on interviewing research. I could not agree 
more strongly with the suggestion by England and 
Paterson (1960) that there be a moratorium on 
books, articles, and other writings about ‘how to 
interview,’ ‘do’s and don’ts’ about interviewing, and 
the like, until there is sufficient research evidence 
about the reliability and validity of the interview 
as an assessment device to warrant its use in such 
work [p. 291]. 


1 This study is part of a project titled, Vocational 
Success of Blind Adults, supported by a grant from 
the Office of Vocational Rehabilitation, Department 
of Health, Education and Welfare. 

2 Now at Ball State University. 

3 Appreciation is extended to L. M. Baker, John 
[?]. Palacios, and H. C. Hart for their editorial 
assistance. 


The general subjectivity of the interview 
has been emphasized by Burroughs (1958), 
Dudek (1963), Sydiaha (1961), Webster 
(1959), and others. The general opinion 
seems to be that voice qualities, personal ap- 
pearance, mode of attire, and other qualities 
of the individual are just as important to deci- 
sion making in the interview as is the infor- 
mation secured from the interviewee. The 
other extreme in interview scoring and analy- 
sis is well illustrated by Starkweather and 
Decker’s (1964) word-count computer pro- 
gram and by Stone, Bales, Namenwirth, and 
Ogilvie’s (1962) and Holsti’s (in press) auto- 
mated procedure to develop dictionaries for 
different types of interview content. 

This paper describes an approach to inter- 
view analysis which lies between the clinical 
global interpretation and the word-count, 
dictionary-type approaches. Any interview is 
usually designed to elicit information in such 
areas as interpersonal interaction, job satis- 
faction, employment potential, social compe- 
tency, and the like. This approach makes it 
possible to rank objectively and quantita- 
tively the interviewees on various continua 
tapped by the interview, making full use of 
the verbal response elicited by each question. 
It is possible to adapt this scoring technique 
to a variety of structured or semistructured 
interviews, that is, personnel selection and 
clinical procedures involving diagnosis, psy- 
chotherapy, and vocational counseling. 


METHOD 
Instrument and Administration 


The interview consisted of 43 open-ended ques- 
tions designed to assess adjustment variables in a 
blind-adult population. It sought information in the 
areas of (a) perception of blindness, (b) religiosity, 
(c) learned ways of dealing with tension, (d) ability 
to travel and locomote, (e) family adjustment, 
(f) work history and employment potential, and 
(g) interpersonal interaction. 

The interview was tape-recorded and later tran- 
scribed to typescript. The Ss were made aware of 
the presence of the recorder either by touching it, 
or by hearing their voices played on it; and in some 
instances both techniques were utilized to familiarize 
the Ss with the tape recorder. The interview was 
administered as part of a large battery of tests 
designed to assess the employment success of blind 
adults. 


Subjects 


The Ss were all legally blind, between the ages of 
20 and 50, and affected by no other major handicap. 
The 144 Ss were secured through state and private 
agencies servicing the blind. All Ss met the basic 
requirements of age, blindness, and employment in 
a variety of work situations or had been unem- 
ployed for 6 months or more prior to the date of 
testing. A token payment of $5.00 was made to 
each S for his cooperation. The sample breakdown 
is as follows: (a) 32 Ss worked in competitive in- 
dustry, (b) 51 in blind workshops, (c) 22 in re- 
habilitation agencies, (d) 7 in vending stands, and 
(e) 32 were unemployed. 


Development of a Scoring Technique 


In line with past research using interviews, a 
scoring system was devised in which global clinical 
judgments were used to rate the Ss on Likert-type 
5-point scales for each continuum. The judges read 
all the responses to the questions that defined a cate- 
gory and rated the S according to the general opera- 
tional definitions set up for each point on the scale. 
Interrater reliability ranged from .10 to .62 indi- 
cating very low reliability. Thus, the scoring system 
had to be completely revised. 

First, specific criteria were set up for each ques- 
tion, rather than depending upon global judgments 
for each continuum. Second, the rigid adherence to 
a 5-point scale was eliminated. Responses to ques- 
tions that could not be reliably divided into five 
categories were dichotomized or trichotomized. As a 
result, the range of response categories for any one 
question reflected a content analysis compiled from 
the responses of 90 randomly selected interview 
protocols. The response categories for each question 
were weighted so that the more maladjusted a 
response, the larger the weight. The clinical judg- 
ments of the authors were the criteria for “malad- 
justment” upon which the original weighting was 
done. The score for any one continuum, then, was 
simply the sum of the weights assigned to each 
response subsumed under the continuum. 


Interrater Reliability 


Two judges independently scored 10 complete 
protocols. The average interrater reliabilities as ob- 
tained from analysis of variance formulae were 
(a) perception of blindness r = .88, (b) religiosity 
r = .91, (c) learned ways of dealing with tension 
r = .75, (d) ability to travel r = .77, (e) family ad- 
justment r = .91, (f) work history r = .79, (g) inter- 
personal interaction r = .89. These coefficients, which 
are unusually high for rating scales, were particularly 
encouraging because of the small sample and the 
necessarily restricted range. 
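
The text does not say which analysis-of-variance formula was used for these reliabilities. One common choice is the one-way intraclass correlation for a single rating; the sketch below assumes that form purely for illustration, and the ratings in it are hypothetical (10 protocols, 2 judges), not data from the study.

```python
def one_way_icc(ratings):
    """One-way intraclass correlation for a single rating.

    ratings: list of per-protocol tuples, one score per judge.
    Assumes the form (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    n = len(ratings)       # number of protocols
    k = len(ratings[0])    # number of judges
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for r, m in zip(ratings, row_means) for x in r)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical continuum scores from two judges on 10 protocols.
example_ratings = [(12, 13), (8, 8), (15, 14), (10, 12), (7, 7),
                   (14, 15), (9, 9), (11, 10), (13, 13), (6, 8)]
example_icc = one_way_icc(example_ratings)
```

When the two judges agree exactly on every protocol the within-protocol mean square is zero and the coefficient equals 1; disagreement lowers it.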


Interview Questions and Scoring Criteria 


The following questions and scoring criteria are 
subsumed under the “learned ways of dealing with 
tension” continuum. They will serve as illustrations 
of the open-ended questions and weighting system 
utilized in scoring. 

Question. What do you consider to be your strong- 
est fear? 

1 point. (a) A response which indicates that the 
individual has no major source of fear (this is not 
a denial of fear, per se). Examples: “I suppose there 
are things I’m afraid of, but I can’t think of any 
single thing which frightens me most.” 

2 points. (a) Naming of any specific fear or fears. 
Examples: spiders, lightning, thunder, snakes, fire. 
(b) Any concrete fear relating to blindness. Ex- 
amples: Fear of traffic, of crossing streets alone, fear 
of losing remaining vision, etc. (c) Denial of any 
fears at all. Examples: “I’m not afraid of anything”; 
“There ain’t nothing I’m afraid of”; “Fear is a 
weakness I don’t have.” (d) Rejection of question 
or refusal to answer. Examples: “I don’t know”; 
“I can’t answer that one”; “I think that’s too 
personal”; etc. 

3 points. (a) Any response which is indicative of 
feelings of personal inadequacy. Examples: “Being 
humiliated”; “I’m most afraid of being alone”; “I 
guess it would be that I just wouldn’t be able to go 
on”; etc. 

4 points. (a) Any fear that appears to be of an 
extreme or phobic nature. Examples: “I’m most 
afraid of losing my vision in the middle of the street. 
I think about it every time I go somewhere. I just 
can’t get it out of my mind”; “I have claustro- 
phobia—I just can’t stand being cooped up in a 
small place somewhere.” 

Question. Do you ever get mad or angry? 

1 point. (a) Any positive response indicating that 
anger is experienced within normal limits. Examples: 
“Yes, I wouldn’t be normal if I didn’t”; “Sure, 
everyone gets mad now and then,” etc. 

2 points. (a) Obvious reluctance in admitting 
anger. Examples: “Not very often—in fact, I almost 
never get mad.” “For all practical purposes, I would 
say I don’t get mad.” 

3 points. (a) Complete denial of temper, feelings 
of hostility or anger. Examples: “No, I never lose 
my temper.” “No, I never get mad.” (b) Admission 
of very severe temper. Examples: “Frankly, just 
about everything makes me angry. I lose my temper 
all the time.” “Yes, I’m afraid I have a violent 
temper.” 

Question. What kind of recreation do you enjoy 
most? 

1 point. (a) The subject actively participates in 
some form of recreation, or any combination of 
active and passive participation. Examples: “I like 
to swim and to bowl.” “I prefer to play cards, read, 
and talk to friends.” 

2 points. (a) Participation in passive form of 
recreation only. Examples: “I like to watch tele- 
vision more than anything else.” “Reading is my 
hobby, and I also like to watch sporting events.” 

3 points. (a) No participation in any form of 
recreational activity and guilt about recreation. Ex- 
amples: “I don’t do much of anything—just sit 
around.” “There’s not much I can do as a blind 
man.” 
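
The weighting scheme illustrated by these three questions can be sketched as a simple sum of category weights. This is a minimal illustration, not the authors' scoring program: the question keys and the example S are hypothetical, the maximum weights are the ones shown in the criteria above, and the full continuum presumably included further questions not reproduced here.

```python
# Highest weight defined for each illustrated question (from the criteria above).
MAX_WEIGHT = {"strongest_fear": 4, "anger": 3, "recreation": 3}


def continuum_score(weights):
    """Sum the weights assigned to each scored response.

    Higher totals indicate more maladjustment on the continuum.
    """
    for question, w in weights.items():
        if not 1 <= w <= MAX_WEIGHT[question]:
            raise ValueError(f"weight {w} out of range for {question}")
    return sum(weights.values())


# Hypothetical S: names a specific fear (2 points), reports anger within
# normal limits (1 point), and takes part in passive recreation only (2 points).
example_score = continuum_score({"strongest_fear": 2, "anger": 1, "recreation": 2})
```

Because each question may have a different number of response categories, the range check is done per question rather than against a single 5-point scale.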

Validity 

Job hierarchy. It was hypothesized that scores on 
the Diagnostic Interview would be inversely related 
to occupational achievement. The more maladjusted 
the Ss were in each area of the interview, the less 
likely it would be for the S to be in a complex or 
responsible position. The Job Hierarchy Scale 
(Jones, 1960) was used as the criterion for job 
success. The scale lists 100 job descriptions which 
are grouped into 31 categories. Scale values were 
determined from paired-comparisons made by 8 
judges and assigned to the categories in order to 
reflect vocational achievement. Unemployed Ss were 
not included in this analysis. 

Employment location. In addition to vocational 
achievement, certain hypotheses were made with 
regard to the location of employment. The employ- 
ment environment and its unique demands would 
be expected to reflect differences in adjustment. Ac- 
cordingly, the sample was divided into five employ- 
ment groups based upon location. 

1. Competitive Employment—this category refers 
to those jobs which are typically performed in work 
environments where the sighted person predominates. 

2. Shop Employment—these jobs are traditionally 
held by the visually disabled in what may be called 
“sheltered” or subsidized work situations. In this 
employment area, the sighted worker is in the 
minority or is nonexistent. 

3. Agency Employment—jobs in this category 
typically involve administrative or service positions 
within the vocational rehabilitation framework. 

4. Vending Stand Employment—these jobs involve 
the managing of retail outlets established by law 
in public buildings. Although the employee is typi- 
cally disabled, these are not really “sheltered” posi- 
tions. The individual is essentially self-employed and 
maintains direct contact with the public. 

5. Unemployment—this category consists of those 
individuals who were available for employment but 
who had not been employed for 6 months or longer 
prior to the time of the study. 

It was expected that the unemployed Ss would be 
the least well-adjusted. Shop employees, working in 
a sheltered atmosphere, were expected to be more 
adjusted than the unemployed Ss, but less adjusted 
than the other employment groups. No major dif- 
ferences were hypothesized between the remaining 
three groups since each required personal initiative 
and contact with sighted people. 


RESULTS 
Job Hierarchy 


The intercorrelation matrix for the adjust- 
ment continua and Job Hierarchy Scale is 
presented in Table 1. A multiple R of .484 
was obtained. Realistic acceptance of blind- 
ness, good work history and potential, and 
adjusted interpersonal relations were signifi- 
cantly related to occupational achievement. 


Employment Location 


Table 2 presents the results from the analy- 
sis of variance using the competitive, shop, 
agency, vending, and unemployed groups. Of 
the seven continua, perception of blindness, 
ways of dealing with tension, and interpersonal 
relations were significant beyond the .01 level; 
work history and potential were significant be- 
yond the .05 level; religious conflict, ability 
to travel, and family adjustment were 
nonsignificant. 

TABLE 1 

INTERCORRELATION MATRIX OF THE INTERVIEW PROFILE ANALYSIS AND JOB HIERARCHY 

Continua                                    I     II     III    IV     V      VI     VII    JH 
I.   Perception of blindness                —     .14    .46    .20    .30?   .30    .43    -.38 
II.  Religious conflicts                          —      .03    -.01   .01    -.03   .?7    -.20 
III. Learned ways of dealing with tension                —      .26    .24    .40    .58    -.23 
IV.  Travel                                                     —      -.01   .25    .41?   -.18 
V.   Family adjustment                                                 —      .22    .?2    .00 
VI.  Work potential                                                           —      .30    -.31 
VII. Interpersonal interaction                                                       —      -.25 

Note.—N = 112. 


TABLE 2 

ANOV OF THE INTERVIEW PROFILE ANALYSIS AND 
THE EMPLOYMENT GROUPS 

Continua                                Source    df     MS         F 
Perception of blindness                 Groups      4    167.190    11.125** 
                                        Error     139     15.028 
Religious conflict                      Groups      4      6.426     1.131 
                                        Error     139      5.678 
Learned ways of dealing with tension    Groups      4    128.318     6.658** 
                                        Error     139     19.272 
Travel                                  Groups      4      2.084     1.084 
                                        Error     139      1.921 
Family adjustment                       Groups      4      2.214      .649 
                                        Error     139      3.408 
Work potential                          Groups      4     39.900     3.305* 
                                        Error     139     12.069 
Interpersonal interaction               Groups      4    288.534    28.502** 
                                        Error     139     10.123 

* p < .05. 
** p < .01. 


It was expected that all seven of the ad- 
justment continua would differentiate among 
groups. The obtained results indicated either 
that religious conflict, ability to travel, and 
family adjustment were not related to em- 
ployment location, or that the interview was 
not measuring validly these three areas. These 
three scales had the least number of questions 
defining them, and, therefore, the smallest 
range of scores. 
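
Each F in Table 2 should equal the groups mean square divided by the error mean square for that continuum, which gives a quick consistency check on the table. In the sketch below the values are transcribed from Table 2; the groups mean square for perception of blindness is read as 167.190, which is an assumption about a poorly legible entry, so no printed F is listed for that row.

```python
# (groups MS, error MS) per continuum, as read from Table 2.
mean_squares = {
    "Perception of blindness": (167.190, 15.028),  # groups MS is an assumed reading
    "Religious conflict": (6.426, 5.678),
    "Learned ways of dealing with tension": (128.318, 19.272),
    "Travel": (2.084, 1.921),
    "Family adjustment": (2.214, 3.408),
    "Work potential": (39.900, 12.069),
    "Interpersonal interaction": (288.534, 10.123),
}

# F values as printed for the clearly legible rows.
printed_f = {
    "Religious conflict": 1.131,
    "Learned ways of dealing with tension": 6.658,
    "Travel": 1.084,
    "Family adjustment": 0.649,
    "Work potential": 3.305,
    "Interpersonal interaction": 28.502,
}

f_ratios = {name: ms_g / ms_e for name, (ms_g, ms_e) in mean_squares.items()}
largest_gap = max(abs(f_ratios[name] - f) for name, f in printed_f.items())
```

Under the assumed reading, the ratio for perception of blindness comes out near 11.1, which is in keeping with the statement that this continuum was significant beyond the .01 level.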

Table 1 shows that the four continua 
which were significantly related to vocational 
achievement were intercorrelated and that 
correlation coefficients ranged from .300 to 
.579. The positive intercorrelations were 
further support of the hypothesis that ad- 
justment in all validly measured areas of the 
interview would be related to job environ- 
ment. 

Table 3 presents the means and standard 
deviations of the four significant continua. 
The F tests, applied to all possible paired- 
comparisons between agency, vending stand, 
and competitive groups, demonstrated no 
significant differences (p > .10). However, 
the mean of these three groups differed sig- 
nificantly from both shop employees (F = 
15.55, p< .01) and unemployed Ss (F = 
32.89, p< .01) in perception of blindness. 
Unemployed Ss were significantly more maladjusted than shop
employees (F = 6.09, p < .025). The results are interpreted to
indicate that the unemployed group perceived their blindness as
a crippling handicap, and that their self-concept was markedly
dominated by feelings of inferiority. The shop group was more
adjusted but exhibited a passive attitude toward their
environment. They expected and demanded special privileges
because of their blindness. The agency, vending stand, and
competitive Ss perceived their blindness in a more realistic
manner, were not overwhelmed by guilt or feelings of inadequacy,
and felt that they could live a full and enjoyable life.

Similar results were obtained with the “learned ways of dealing
with tension” continuum. The mean of agency, vending stand, and
competitive Ss differed significantly from shop employees
(F = 7.47, p < .01) and unemployed Ss (F = 23.95, p < .01). The
unemployed group was significantly inferior to the shop group in
the use of tension-reduction mechanisms leading to adjustment
(F = 7.11, p < .01). In other words, agency, vending stand, and
competitive Ss seemed to be able to resolve conflicts more
efficiently, enjoyed life more fully, had realistic and
attainable ambitions, and expressed a positive self-concept. The
unemployed, and to a lesser extent the shop group, expressed
unattainable goals; exhibited feelings of guilt concerning
recreation and play; and were encumbered with fears, anxieties,
worries, and conflicts in dealing with everyday situations.

Most of the variance in work history and potential was due to
the very high score of the unemployed Ss. They differed
significantly from the other groups beyond the .01 level; the
remaining groups did not differ from one another although the
differences were in the expected direction. Unemployed Ss
indicated their difficulty in securing and holding a job,
negative attitudes toward work, dependency needs, and feelings
of inadequacy in achieving economic security and independence.
In contrast, the other Ss expressed confidence in their
abilities for gainful employment, derived a great deal of
satisfaction from their jobs, and felt that blind people should
be self-supporting and self-sufficient.

The mean of agency, vending stand, and competitive Ss for
interpersonal interaction differed significantly from shop
employees (F = 5.82, p < .025) and from unemployed Ss
(F = 10.47, p < .01). The shop group did not differ
significantly from the unemployed group (F = 1.50). The agency,
vending stand, and competitive Ss seemed to exhibit confidence
and pleasure in interpersonal relations. They belonged to and
actively participated in social organizations and did not
experience feelings of social isolation and rejection as a
result of their blindness. The unemployed and shop Ss felt
inadequate in social interaction, were social isolates as a
result of their visual handicaps, and experienced frustration
and anger when dealing directly with people.


TABLE 3

MEANS AND STANDARD DEVIATIONS OF THE INTERVIEW PROFILES AND THE EMPLOYMENT GROUPS

                                        Competitive  Shop      Agency    Vending   Unemployed
Continua                                (N = 32)     (N = 51)  (N = 22)  (N = 7)   (N = 32)
Perception of blindness            M    [?]          17.61     13.36     14.86     19.75
                                   SD   2.97         4.93      [?]       2.15      4.22
Religious conflicts                M    6.56         6.51      5.45      7.00      6.66
                                   SD   2.47         2.43      [?]       [?]       2.20
Learned ways of dealing            M    19.19        21.06     18.55     18.14     23.69
  with tension                     SD   4.30         4.89      4.86      4.77      3.46
Travel                             M    5.44         5.41      5.00      4.71      5.63
                                   SD   1.24         1.46      1.18      .82       1.69
Family adjustment                  M    3.03         [?]       [?]       3.43      [?]
                                   SD   1.56         2.06      1.89      .85       1.97
Work potential                     M    10.00        10.55     9.32      9.28      16.81
                                   SD   1.67         [?]       1.46      1.94      2.34
Interpersonal interaction          M    [?]          11.94     9.64      10.14     12.81
                                   SD   [?]          3.38      3.44      2.89      2.24

Note. [?] marks an entry that is illegible in the source.


DISCUSSION 


All four of the significant continua, that is, perception of
blindness, learned ways of dealing with tension, employment
potential and work history, and interpersonal interaction,
differentiated among groups, both as a function of position in
the job hierarchy and of location of employment. In the
perception of blindness continuum, the unemployed Ss were
significantly more maladjusted than the Ss who worked in blind
workshops, and the shop employees were significantly more
maladjusted than the members of the other three groups
(competitive, agency workers, and vending stand personnel).
Exactly the same results were determined as a function of the
category, “learned ways of dealing with tension.” In the
employment history and work potential continuum, only the
unemployed group was significantly differentiated from the
other four groups. However, the remainder of the results were
in the hypothesized direction (that is, that the shop employees
would be more maladjusted, employmentwise, than the other three
groups), and it is possible that a more rigorous definition of
this continuum would result in the appearance of statistically
significant differences. There were no differences between the
shop workers and the unemployed Ss as a function of the
“interpersonal interaction” category. Both of these groups,
however, were significantly different from the agency, vending
stand, and competitive groups. This indicated that the
unemployed and the shop groups have equal degrees of difficulty
in interpersonal relations, and that the remaining three groups
are equally facile in interacting with other people, at least
to an approximate degree.

Although three of the continua (family ad- 
justment, travel, and religiosity) did not dif- 
ferentiate among the groups, all of the statis- 
tical results were in the predicted direction. 
The authors feel that the inclusion of a larger 
number of items in these categories, and a 
more rigorous definition of these continua, 
would alter them sufficiently to bring about 
results at an acceptable level of significance. 

The heuristic value, however, of these re- 
sults may lie not so much in the construction 
of one more instrument which measures a 
particular type of content, but rather in the 
development of an objective scoring tech- 
nique. The modification of a scoring scheme 
from one made by experienced clinicians (who 
recorded their global judgments after reading 
all of the responses to questions which fell 
into a particular continuum) to one in which 
specific and objective scoring criteria were 
set up for each question in the interview, 
markedly changed the results of this project. 
The initially low reliability of the first scor- 
ing system was raised by the objective scoring 
system to indices which are outstandingly 
high for this type of instrument. In addition, 
the Diagnostic Interview was found to differ- 
entiate among employment groups of legally 
blind adults at a high level of significance, 
and was significantly correlated with voca- 
tional achievement. This indicates that the 
device is not without validity. 

The revised and more objective scoring 
system might be compared to that utilized in 
individual intelligence tests (Wechsler, 1958) 
and Rotter’s Sentence Completion test (1949), 
the verbal portions of which are really noth- 
ing more than highly structured interviews. 
The pre-established questions of the inter- 
view were presented to the Ss, responses were 
tape-recorded and converted to typescripts, 
and each question in the protocol was sub- 
jected to individual scoring on the basis of 
weighted scoring criteria. Except that responses
are manually recorded by the tester, the process
is entirely analogous to the system utilized in
the above-mentioned techniques. The process of
adapting the same general system to other forms
of interviews, in an effort to achieve objective
scoring, can be seen readily.


INFLUENCE OF DISPLAY, RESPONSE, AND RESPONSE
SET FACTORS UPON THE STORAGE OF SPATIAL
INFORMATION IN COMPLEX DISPLAYS ¹


WILLIAM C. HOWELL AND JERRY D. TATE

Ohio State University


Immediate recall for spatial information was studied as a function of stimulus 
load under 2 display formats, 2 response formats, and 2 response set conditions. 
4 groups of 10 Ss each served under 15 replications of all response-format, 
stimulus-load conditions; groups were distinguished on the basis of display 
format and set. Each S viewed either a spatial or tabular display of 14-26 
geometrical stimuli for 16 sec.; he was then required to report—on either a 
tabular or spatial response form—the location of relevant stimuli. Correct 
responses and misplacement errors increased more rapidly for the spatial format 
as more stimuli were presented. Recoding from 1 display to the other response 
format did not yield serious decrements. Contrary to expectation, response set 
enhanced all conditions to a nearly equivalent degree. Results are interpreted
in terms of the “chunking” hypothesis.


In most advanced information-processing systems the bulk of the
storage operations, and particularly those of extended duration,
have been shifted from men to machines. Still, however, there
remains a significant number of operations in which some portion
of the short-term storage load either can or must be handled by
the human. Some of these operations require continuous memory,
or the retention of recent items in an ongoing sequence of
stimuli: telegraphers and language interpreters, for example,
usually lag considerably behind the input in their output
performance. Other tasks require storage in a spatial context:
viewers of large-scale group displays, for example, must retain
momentarily information pertaining to one location while viewing
another.

Classical research in the area of short-term retention has
focused upon measurement of immediate memory span and upon the
mechanisms responsible for forgetting, particularly as related
to traditional concepts such as fading of memory traces and
interference (Brown, 1958; Murdock, 1964; Sperling, 1963). As
Postman (1964) has pointed out, however, methodological problems
seriously limit conclusions which may be drawn from much of this
research: at present there even appears to be little hope of
determining whether one or two basic processes are involved. In
addition, it is doubtful that the variables explored in the
classical situation are the ones of greatest importance for many
applied storage problems. A realization of this point has led to
the investigation of more complex task situations which, in
turn, has drawn attention to a number of new variables. Miller
(1956), for example, has suggested that storage capacity is a
function of variables which promote organization of stimuli for
recoding (that is, “chunking”); Murdock (1961), Schaub and
Lindley (1964), and Lindley (1963) have provided empirical
support for this notion. Other investigators have demonstrated
the importance of variables such as context (Mackworth &
Mackworth, 1959), average storage load (Lloyd, Reid, & Feallock,
1960; Yntema & Mueser, 1960) and average load reduction (Reid,
Lloyd, Brackett, & Hawkins, 1961) upon short-term retention in
the complex continuous-memory situation. Finally, a series of
studies by Teichner (Teichner, 1963; Teichner, Reilly & Sadler,
1961; Teichner & Sadler, 1962) has shown that number of stimulus
categories, exposure time, and kind of response required all
play an important role in retention scores for a complex spatial
display.


¹ This research was carried out in the Laboratory of Aviation
Psychology and was supported by the Air Force Systems Command,
Rome Air Development Center, Griffiss Air Force Base, New York,
under Contract No. AF 30(602)-3066. Permission is granted for
reproduction, translation, publication, use, and disposal in
whole or in part for any purpose of the United States
Government.


TABLE 1

A REPRESENTATION OF THE CONDITIONS DIFFERENTIATING THE FOUR GROUPS

Groupᵃ     Display    Response set    Response
I          Tabular    None            Tabular
           Tabular    None            Spatial
II         Tabular    Tabular         Tabular
           Tabular    Spatial         Spatial
III        Spatial    None            Tabular
           Spatial    None            Spatial
IV         Spatial    Tabular         Tabular
           Spatial    Spatial         Spatial

ᵃ All groups performed under four levels of stimulus information
(see text).

The present study was also concerned with 
short-term retention in a complex display situ- 
ation. Its basic purpose was to determine the 
influence of stimulus format, response format, 
and response set upon the storage of spatial 
information presented in varying amounts. 
Both the stimulus and response formats used 
were of two kinds: spatial, wherein locations 
were represented directly in two-dimensional 
space (as in a map); and tabular, wherein 
the same locations were coded in alpha- 
numeric terms and listed in columns (as in 
a map index). The tabular-spatial distinction 
would appear to be a rather basic one from the 
standpoint of operations required on the same 
information; it also is representative of two 
very common coding arrangements encoun- 
tered in present-day information-processing 
systems. 

There were four specific questions to which 
answers were sought in the present experi- 
ment. First, does amount of retention change 
differentially for information displayed spa- 
tially and tabularly as the number of locations 
to be stored is increased (the lowest level 
was chosen so that both formats would pro- 
duce near-perfect retention)? Second, does 
the manner in which the subject must respond 
have any effect upon the measured amount of 
retention? Third, how does recoding from a 
tabular display to a spatial response (or vice 
versa) compare with the more compatible tabular-tabular
or spatial-spatial S-R arrangements? To date, research
on S-R compatibility has been limited primarily to reaction
time, dial reading, and the like; implications 
for retention have largely been ignored. Fi- 
nally, how does retention compare under con- 
ditions of spatial set, tabular set, and no set 
at all? The four variables (display format, 
response format, response set, and number of 
stimuli) were explored in factorial combina- 
tion using correct locations, errors of omis- 
sion, and errors of commission in immediate 
recall as indices of retention. 


METHOD 


Subjects and design. The 40 subjects were undergraduate student
volunteers paid at the rate of $1.25 per hour for participation.
They were assigned randomly to four experimental groups (10
subjects per group) each of which served under separate pairs of
experimental conditions as illustrated in Table 1. In addition,
all Ss served under all four levels of stimulus information. The
eight conditions performed by each S were replicated 15 times
and order was controlled in the following manner. Each day an
equal number of Ss was assigned to each group and performed
three replications of all eight conditions in the same
randomized order; a new random order for these 24 conditions was
introduced each successive day for 5 days. Different stimuli
were used for each of the 120 presentations.

FIG. 1. An illustration of the spatial display (A) and the
tabular display (B); the information is identical in both
displays: 12 relevant (triangles) and 8 irrelevant events.
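
The replication scheme described above (8 conditions per S, 3 replications per day, 5 days, 120 presentations in all) can be sketched in a few lines of Python. This is an illustrative reconstruction only: the names `daily_order` and `build_schedule` are hypothetical, the four stimulus levels are taken from the Apparatus section, and shuffling all 24 daily trials together is an assumption, since the text does not say exactly how the daily random order was constructed.

```python
import random
from collections import Counter

RESPONSE_FORMATS = ("tabular", "spatial")
STIMULUS_LEVELS = (4, 8, 12, 16)   # relevant stimuli per slide
REPLICATIONS_PER_DAY = 3
DAYS = 5


def daily_order(rng):
    """Return one day's 24 trials: 3 replications of the 8 conditions.

    Assumption: the 24 trials are shuffled as a single block.
    """
    conditions = [(fmt, n) for fmt in RESPONSE_FORMATS for n in STIMULUS_LEVELS]
    trials = conditions * REPLICATIONS_PER_DAY
    rng.shuffle(trials)
    return trials


def build_schedule(seed=0):
    """Return a list of 5 daily trial orders, each freshly randomized."""
    rng = random.Random(seed)
    return [daily_order(rng) for _ in range(DAYS)]


schedule = build_schedule()
counts = Counter(trial for day in schedule for trial in day)
print(len(counts), "conditions,", sum(counts.values()), "presentations")
```

Each of the 8 response-format by stimulus-level conditions then occurs 15 times per S, which matches the 15 replications stated in the text.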

Apparatus. The S was seated at a table in one of two adjacent
enclosed booths; in the other were located a 35-millimeter slide
projector, two interval timers, a library of stimulus slides,
and the experimenter’s (E’s) record sheets. Stimuli were
back-projected on a ground-glass screen which was mounted over
an opening in the partition between booths. Through another
opening it was possible to observe the S’s behavior from the E’s
booth without being seen. The S’s booth was illuminated
indirectly at a low-intensity level so as to achieve a
reasonable balance between good contrast and time required for
visual adaptation.

Each projected slide contained geometric stimuli displayed in
either a spatial or a tabular format. The spatial display was an
8 × 8 matrix with letters and numbers denoting the columns and
rows. Six classes of stimuli (circles, triangles, squares,
hexagons, ellipses, and boats) appeared in various of the cells
as illustrated in Figure 1a. The tabular display contained the
same stimuli, but arranged in orderly columns with adjacent
letter-number designations (Figure 1b). Designations in the
tabular display referred to cell locations in the spatial
display. Thus, in Figure 1 a hexagon occupies a cell location
(G-1) in the upper-right portion of the spatial display; its
position in the tabular display (upper-left) has no task
significance, and its location is still designated G-1.
Positions in the tabular display were assigned so that, insofar
as possible, spatial contiguity among stimuli was maintained;
thus, A locations were more likely to occur early in the table
than were B locations.

The number of stimuli in any slide varied ac- 
cording to experimental requirements. In all cases 
there were 10 stimuli for which no response was 
required (irrelevant stimuli), and these were drawn 
with equal frequency from the five irrelevant 
stimulus classes. In addition, there were 4, 8, 12, 
or 16 stimuli from the remaining (relevant) stimulus 
class. All stimuli were assigned to locations randomly, 
and each stimulus class was relevant for an equal 
number of slides. 
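
The composition of a single slide, as described above, can be illustrated with a short Python sketch. It is not the authors' procedure for preparing slides. The function names `make_slide` and `tabular_listing` are hypothetical, placing exactly two stimuli from each irrelevant class on every slide is one reading of "equal frequency," and the simple alphabetical sort only approximates the contiguity-preserving ordering of the tabular display.

```python
import random
from collections import Counter

COLUMNS = "ABCDEFGH"            # letters denote columns
ROWS = range(1, 9)              # numbers denote rows
STIMULUS_CLASSES = ("circle", "triangle", "square",
                    "hexagon", "ellipse", "boat")
N_IRRELEVANT = 10
RELEVANT_LEVELS = (4, 8, 12, 16)


def make_slide(relevant_class, n_relevant, rng):
    """Return {location: stimulus_class} for one randomly filled slide."""
    if relevant_class not in STIMULUS_CLASSES:
        raise ValueError("unknown stimulus class")
    if n_relevant not in RELEVANT_LEVELS:
        raise ValueError("n_relevant must be 4, 8, 12, or 16")
    irrelevant = [c for c in STIMULUS_CLASSES if c != relevant_class]
    per_class = N_IRRELEVANT // len(irrelevant)   # assumed: 2 of each
    stimuli = [relevant_class] * n_relevant
    for cls in irrelevant:
        stimuli += [cls] * per_class
    cells = [f"{col}-{row}" for col in COLUMNS for row in ROWS]
    locations = rng.sample(cells, len(stimuli))   # no cell used twice
    return dict(zip(locations, stimuli))


def tabular_listing(slide):
    """List (location, class) pairs with A locations before B, and so on."""
    return sorted(slide.items())


slide = make_slide("triangle", 12, random.Random(1))
print(Counter(slide.values()))
print(tabular_listing(slide)[:5])
```

With 12 relevant stimuli the slide holds 22 stimuli in the 64 cells, and the same dictionary serves both formats: plotted by cell for the spatial display, or listed by designation for the tabular display.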

To indicate responses, the S was provided with a 
pencil and two stacks of response forms, one cor- 
responding to the spatial and one to the tabular dis- 
play format. The spatial response form was identical 
to Figure 1a except, of course, for the absence of
stimuli; the tabular response form was similar to 
Figure 1b except that all locations from A-1 to H-8 
were listed in order with blanks placed adjacent to 
each. These formats, together with a warning light 
and an end-response button, were all mounted on 
the S’s table. 

Procedure. The task required the S to observe freely one of the
projected displays for a 16-second inspection period. Directly
thereafter, he attempted to indicate the location of all stimuli
in the relevant class by checking cells in the spatial response
form or blanks in the tabular response form. As illustrated in
Table 1, two groups (I, II) observed only tabular displays, and
two (III, IV) only spatial displays; all, however, were required
to use both response forms on various trials. For those groups
in which a specific response set was established (II, IV), the
S was informed of the relevant stimulus class and the required
response (tabular or spatial) prior to each trial. For the
no-set groups (I, III), only the relevant stimulus class was
specified prior to each trial, and the response designation was
given verbally with the removal of the display. Other
instructions directed the S to take cognizance of the warning
light (which preceded each trial by about 2 seconds) and to
press an end-response button when he had finished all locating
responses. Accuracy was encouraged with the request “try to
refrain from completely wild guesses, but don’t be afraid to
check spaces that you think contained relevant stimuli.”
Feedback was provided at the end of each 24 trials (daily
session) and included total frequencies for both correct
responses and errors.

FIG. 2. Correct response functions plotted separately for the
spatial response format (A) and the tabular response format (B).

FIG. 3. Misplacement data plotted separately for the spatial
response format (A) and the tabular response format (B).

The primary performance indices computed for 
each trial were number of correct locations reported, 
number of misplacements or errors of commission, 
and number of omissions. Actually, omissions were 
computed in two different ways: by subtracting the 
number of correct responses from the number of 
stimuli presented, and by subtracting the total num- 
ber of responses from number of stimuli presented. 
The latter is a conservative estimate since it reflects 
only those occasions upon which the S was fairly 
certain of his forgetting and hence failed to respond; 
the former is a liberal estimate since it includes 
occasions upon which the S was sufficiently confident
to respond, but in which some degree of
forgetting had occurred making that response wrong. 
In addition to these primary indices, one secondary 
measure, response latency, was also recorded (sec- 
ondary because specific instructions regarding speed 
were not included). 
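
The primary indices defined above can be written out as a small scoring function. The sketch below is illustrative: the name `score_trial` and the representation of locations as sets of designations such as "A-1" are assumptions made for the example, not details given in the article.

```python
def score_trial(presented, responded):
    """Score one trial from the presented and the reported locations.

    correct                 relevant locations that were reported
    misplacements           reported locations holding no relevant stimulus
    omissions_liberal       stimuli presented minus correct responses
    omissions_conservative  stimuli presented minus total responses
    """
    presented = set(presented)
    responded = set(responded)
    correct = len(presented & responded)
    return {
        "correct": correct,
        "misplacements": len(responded - presented),
        "omissions_liberal": len(presented) - correct,
        "omissions_conservative": len(presented) - len(responded),
    }


example = score_trial(
    presented={"A-1", "B-2", "C-3", "D-4"},
    responded={"A-1", "B-2", "E-5"},
)
print(example)
```

In this example two locations are correct and one is misplaced, so the liberal estimate of omissions is 2 while the conservative estimate is 1. The two estimates differ by exactly the number of misplacements, which is why the liberal index is always at least as large as the conservative one. The sketch does not clamp the conservative estimate at zero; how a trial with more responses than stimuli was handled is not stated in the text.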


RESULTS 


Correct response (location) frequencies are 
plotted as a function of the number of relevant
stimuli for all conditions in Figure 2.
It is apparent that the spatial display yields 
a greater increase in correct responses with 
number of stimuli presented than does the 
tabular display. At the 4-stimulus level, per- 
formance is comparable for both displays, but 
at the 16-stimulus level the difference in- 
creases to over 2 correct responses per display. 
The analysis of variance indicated that the 
display effect, F (1, 36) = 80.14, the stimulus 
level effect, F (3, 108) = 128.76, and their 
interaction, F (3, 108) = 37.90, were all 
highly significant (p < .01).²

Comparison of Figures 2a and 2b reveals 
a superiority for the spatial response format 
which also appears to increase over number 
of stimuli, but to a much lesser degree than 
for the spatial display. The response format 
effect is significant at p< .01, F (1, 36) 
= 18.36, but the interaction with information 
level is not (p > .05). A significant (p < .05) 
Response X Display interaction, F (1, 36) 
= 5.42, reflects the fact that the superiority 
of the spatial display is accentuated when the 
response is highly compatible. Response set 
appears to enhance performance under all 
conditions rather than merely under the spa- 
tial display as originally anticipated. A sig- 
nificant (p< .05) F obtained for the main 
set effect, F (1, 36) = 6.18, but not for any 
of its interactions, bears out this conclusion. 


² The transformation $X' = \sqrt{X} + \sqrt{X + 1}$ was applied
to all frequency scores in order to achieve
homogeneity of variance.

Turning to the error data, it will be recalled
that errors of commission (misplacements)
and of omission (conservative and
liberal indices) were considered separately.
This was necessary because the S was not
required to give a fixed number of responses 
for any stimulus. Figure 3 shows the average 
number of misplacements for all conditions. 
The only significant effects involve number of 
stimuli: the main effect, F (3, 108) = 58.39; 
the interaction with display format, F (3, 
108) = 3.94; and the interaction with display 
format and response format, F (3, 108) 
= 2.95. Thus, as would be expected, mis- 
placements become more frequent as the num- 
ber of stimuli increases. The trends, however, 
are generally different for the two displays: 
for the tabular display an asymptote of 
1.5 to 2.0 misplacements is approached after the
8-stimulus level, whereas for the spatial dis- 
play the increase continues up to, and prob- 
ably well beyond, the 16-stimulus level. This 
trend difference is most pronounced for the 
spatial response (Figure 3a). 

In Figure 4 are presented the data for the 
conservative estimates of omission frequency. 
The liberal estimates are not plotted because, 
except for higher overall frequencies, they 
were strictly comparable to the more realistic
conservative data: both the patterns of scores
and of statistical significance levels were es- 
sentially the same. As would be expected for 
a limited human storage capacity, omissions 
occur with increasing frequency as more 
stimuli are presented. In addition, there are 
consistently more omissions for the tabular 
display, and this difference increases with 
number of stimuli presented. Comparison of 
Figures 4a and 4b shows that the response 
format also exerts some influence: again, more 
omissions occur when the tabular format is 
used. All of these effects are supported by 
highly significant (p< .001) F ratios. In 
addition, there is a marginally significant 
(p < .05) Display X Response interaction, F 
(1, 36) = 3.95, which reflects a greater differ- 
ence between tabular and spatial display for 
the spatial response format. 

Comparing the data for the major indices, it appears that
tabular displays result in fewer correct responses and more
errors of omission than do spatial displays; furthermore the
difference increases with stimulus information and, in general,
with the spatial response format. Errors of misplacement,
however, increase more rapidly for the spatial display as the
number of stimuli is raised from 8 to 16, and again the effect
is greatest for the spatial response format.

FIG. 4. Omission data plotted separately for the spatial
response format (A) and the tabular response format (B).

FIG. 5. Latency data plotted separately for the spatial response
format (A) and the tabular response format (B).

The results for the average response latency 
measure appear in Figure 5. Generally, the 
times for the spatial response format (Figure 
5a) are somewhat shorter than for the tabular 
format (Figure 5b). This effect is significant 
at p< .01, F (1, 36)= 65.40. The spatial 
display also leads to superior overall per- 
formance, F (1, 36)= 9.78, but a curious 
exception occurs when the tabular response 
is involved: when no set is provided, the spatial
display is actually inferior to all other conditions
(see Figure 5b). No systematic change
in latency can be attributed to number of 
stimuli presented, although times are slightly 
higher for 8 stimuli than for the other levels. 
A significant interaction occurs between dis- 
play and response formats, F (1, 36) = 22.11, 
as shown clearly in Figure 5: the difference 
between spatial and tabular displays is far 
more pronounced for the spatial (Figure 5a) 
than for the tabular (Figure 5b) response. 


DISCUSSION 


Most noteworthy among the findings is the 
differential effect of input load upon storage 
for spatial and tabular displays. It is quite 
apparent that the spatial format produces 
greater retention and that the superiority increases
with number of stimuli presented. There appear to be
at least two plausible explanations
for this, and each deserves further
investigation. The first assumes that two dis- 
tinct processes are involved in short-term 
memory: one a rapidly fading perceptual 
“trace” with high initial capacity, and one 
a more permanent associative process with a 
lower capacity (see, for example, Sperling, 
1963). Given these characteristics, one would 
expect a spatial format to emphasize the for- 
mer (perceptual) mechanism and a tabular 
format, the latter (associative) mechanism. 
Owing to the hypothesized difference in ca- 
pacity of the two mechanisms, one would pre- 
dict an increasing superiority in recall for the 
spatial display as the number of items to be 
stored is increased. However, owing to the 
greater permanence of the associative mecha- 
nism, those tabular items which are stored 
should exhibit greater accuracy in recall than 
those stored spatially. The present functions 
for correct response frequency and misplace- 
ment frequency, of course, support these hy- 
potheses (i.e., more items are recalled cor- 
rectly, with fewer omissions, but also with 
more misplacements when the spatial format 
is used). 

The second explanation is in terms of the degree to which
“chunking” is encouraged by the two formats. It is probable that
a spatial display lends itself more readily to organization of
content for recoding than does a tabular display. Consider, for
example, the triangles which constitute the relevant stimuli in
Figure 1. The grouping apparent in the spatial display provides
a handy means for reducing the actual number of items which must
be stored; each group may be stored as a pattern and a location
reference. For instance, the five adjacent triangles in the
lower-left portion of the spatial display can be reduced to a
single pattern anchored at C-6.
Chunking in the tabular display is restricted 
to locations with the same letter or number 
designation, and even here the reduction in 
items to be stored is relatively small: D-1, 
D-3, D-4, D-6, and D-7, for example, can be 
reduced only to D(1, 3, 4, 6, 7). Furthermore, 
what grouping can be achieved in this man- 
ner is not directly obvious and requires a cer- 
tain amount of initial search. 
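
The tabular recoding in this example, in which D-1, D-3, D-4, D-6, and D-7 reduce to D(1, 3, 4, 6, 7), amounts to grouping designations by their shared letter. The Python sketch below illustrates that one reduction only. The function names are hypothetical, and the code is a bookkeeping illustration of the example in the text, not a model of how Ss actually store the items.

```python
from collections import defaultdict


def chunk_by_letter(designations):
    """Group letter-number designations such as "D-3" by their letter."""
    groups = defaultdict(list)
    for designation in designations:
        letter, number = designation.split("-")
        groups[letter].append(int(number))
    return {letter: sorted(numbers) for letter, numbers in sorted(groups.items())}


def format_chunks(chunks):
    """Render each chunk in the D(1, 3, 4, 6, 7) notation used in the text."""
    return [f"{letter}({', '.join(map(str, numbers))})"
            for letter, numbers in chunks.items()]


locations = ["D-1", "D-3", "D-4", "D-6", "D-7"]
print(format_chunks(chunk_by_letter(locations)))
```

The five designations collapse to a single letter plus five numbers, so the saving is four repetitions of the letter. This is the "relatively small" reduction noted above, in contrast to the spatial case where five adjacent cells can be held as one pattern and one anchor location.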

The difference in propensity for chunking 
between the two formats would again account 
for the present results. As more stimuli are 
presented, the spatial display should gain in 
superiority owing to the fact that greater den- 
sity should foster more grouping. Misplace- 
ment frequency would be expected to be 
greater for the spatial display because of pro- 
gressing distortion in the stored patterns with 
forgetting (such distortion, of course, is a
well-documented phenomenon; cf. Riley,
1963). It should perhaps be noted in passing
that the alternative explanations presented 
above are not mutually exclusive: if chunking 
of spatial stimuli is regarded as a perceptual 
process, storage of the “chunk” (pattern) 
can easily be envisioned as a persisting trace. 

A finding of some practical significance 
concerns the relative effect on retention of 
the response format. Although far less dra- 
matic than the display effect, spatial respond- 
ing is clearly superior to tabular responding, 
particularly as the number of stimuli pre- 
sented is increased. One explanation for this 
effect may be found in the response time data 
which show average latencies to be greater 
for the tabular response. Longer latencies, of 
course, are indicative of longer retention inter- 
vals which could only be expected to yield 
greater amounts of forgetting. 

It is surprising to note that recoding from 
spatial display to tabular response or vice
versa produces relatively little decrement in
retention. Certainly what effect does occur is 
so small that it is completely masked by the 
display and response variables. Were there a 
marked compatibility effect, one would cer- 
tainly not expect the superiority of tabular- 
spatial over tabular-tabular S-R pairings that 
appears in Figure 2; similarly, one would ex- 
pect a much greater decrement from spatial- 
spatial to spatial-tabular pairings than actu- 
ally occurs. The conclusion which must be 
drawn is that the manner in which informa- 
tion enters and is retrieved from storage is 
more important to retention than the amount 
of recoding involved per se. 

The influence of response set upon retention 
is also somewhat surprising. A spatial display 
should provide far greater opportunity than 
a tabular display for the utilization of set; 
this is because a spatial display permits the 
subject to emphasize either the spatial ar- 
rangement (patterning) of the stimuli or their 
alpha-numeric designation according to the 
set preparatory to storing them. A tabular 
display, on the other hand, restricts him to 
the use of alpha-numerics no matter what the 
set may be (any spatial patterning is com- 
pletely irrelevant). Apparently, however, he 
is able to gain some benefit from knowledge 
of the pending response format even in the 
tabular case for which spatial cues are ir- 
relevant. This finding raises a question as to 
exactly what process is enhanced by set. 
Clearly, it does not appear to be a matter of 
choosing one of two distinct ways of storing 
this information (such as an associative proc- 
ess versus a perceptual process), or else the 
obtained set effect would have favored the 
spatial display.

It is the present contention that set aids 
the initial storage process by selectively call- 
ing attention to potential chunking cues. Con- 
sider first the tabular-spatial condition: know- 
ing that a spatial response will be required, 
the S may seek out chunking combinations 
of stimuli which will occupy related posi- 
tions in the matrix. This could include stimuli 
with the same letter or the same number des- 
ignation. Next, consider the tabular-tabular 
condition: knowing that locations are listed 
in alphabetical order on the response form, 
he would probably restrict his chunking to 
stimuli with like alphabetical designations
(e.g., all As). He might, in fact, even attempt 
to chunk in strict alphabetical order (e.g., all 
As in order, followed by all Bs in order, etc.) 
which, under certain conditions, would even 
permit him to ignore some of the letters [e.g., 
A(1, 5, 7, 8) B(2, 4, 5) could be chunked as 
(1, 5, 7, 8; 2, 4, 5) with A and B implicit 
in the order]. A similar explanation appears 
reasonable for the spatial display results: here 
also it is proposed that the S uses set to direct 
his chunking behavior prior to storage. A spa- 
tial set, for example, may prompt him to 
chunk clusters or patterns such as the B-5, 
C-7 group in Figure 1, whereas a tabular set 
may lead him to chunk the five triangles in 
Column D. 

From a strictly applied standpoint, the 
present results recommend spatiality as an 
efficient means of coding information for 
short-term retention. More important, how- 
ever, they suggest possible factors underlying 
this efficiency which, if fully understood, 
could be of considerable significance for dis- 
play technology. Certainly, much remains to 
be learned concerning the relationships be- 
tween display variables, response require- 
ments, and specific modes of information 
chunking. In addition, the process of percep- 
tual storage—if such exists—warrants further 
direct attention. 


INFLUENCE OF DEPTH ON THE MANUAL
DEXTERITY OF FREE DIVERS:
A COMPARISON BETWEEN OPEN SEA AND PRESSURE
CHAMBER TESTING


A. D. BADDELEY 1 
Medical Research Council Applied Psychology Research Unit, Cambridge, England 


Using a compression chamber, Kiessling and Maag (1962) showed a decline 
in manual dexterity at a pressure simulating 100 ft. of water. Impairment was 
slight (7.9%) and was assumed to be of little practical importance. The present 
study examines this conclusion by testing divers in the water. The manual 
dexterity and tactile sensitivity of 12 free divers were tested above the surface, 
and at 10 and 100 ft. below the surface. The dexterity test took 28% longer 
at 10 ft. and 49% longer at 100 ft. than on the surface, the differences between 
all conditions being significant (p< .005). Tactile sensitivity did not change. 
Replication in a dry pressure chamber showed an impairment of less than 6%, 
which though reliable (p < .05) was significantly smaller than that shown in 
the open sea (p < .05). Conclusions are (a) the impairment of manual dexterity
at depth is considerable when tested under water, (b) it is unwise to generalize
from pressure chamber experiments to under water performance.


The growing military and commercial im- 
portance of the self-contained or free diver is 
focussing attention not only on the classical 
problems of survival underwater, but also on 
the less dramatic, but in the long run equally 
important question of the limits of perform- 
ance underwater. Most diving of commercial 
and military importance is carried out at rela- 
tively shallow depths where, given reasonable 
precautions, the problem of survival is not 
very great, but where the diver’s efficiency 
may well be impaired. In such circumstances 
it may be costly and even dangerous to expect 
too much of a diver. The present study at- 
tempts to take one simple but important ca- 
pacity, manual dexterity, and see how this 
is affected by depth. 

There is a considerable body of evidence 
that human performance is impaired under 
pressure. At depths of 100 feet and probably 
less, a diver begins to suffer from “nitrogen 
narcosis” or in Cousteau’s phrase, from “the 
rapture of the depths” (Barnard, Hemple- 
man, & Trotter, 1962; Cousteau, 1953; Kies- 
sling & Maag, 1962). However, in most of 
this work the prime interest was physiological,
and in much of the earlier work the psycho- 
logical studies were not adequately controlled. 
Probably the only reasonably detailed study 
of manual dexterity at pressure is that of 
Kiessling and Maag (1962), who found a 
significant impairment in speed of perform- 
ance on the Purdue Pegboard, a task which 
involves placing pegs in holes and mounting 
a metal collar and washer on each peg. While 
the subjects were consistently slower, the 
amount of impairment was small (7.90%) in 
comparison to the other two tasks they 
studied, namely, reaction time (20.85%) and 
a conceptual reasoning test (33.46%). On 
the basis of this they conclude that “If the 
individual merely has to perform a simple 
manual task, the pressure level may be quite 
high without severe impairment [p. 94].” 
However this, like most other experiments on 
nitrogen narcosis, was performed in a pres- 
sure chamber and not in the water. From the 
point of view of ease of administration and 
control of extraneous variables a pressure 
chamber has enormous advantages, but it does 
raise the question of how validly such results 
can be generalized to the actual diving situa- 
tion. 

When a diver enters the water, he is im- 
mediately faced with a number of additional 
limitations and stresses. His equipment may
hamper movement, his vision is likely to be 
restricted by the refractive properties of his 
face mask (Barnard, 1961), and in addition 
he must cope with his relative weightlessness. 
Combined with this, he is more likely to be 
influenced by such stresses as cold, isolation, 
and anxiety about his safety. It seems possi- 
ble that such stresses may interact with the 
effect of nitrogen narcosis to produce either 
increased or decreased impairment (Broad- 
bent, 1963). In view of this, it seems advis- 
able that results obtained in a pressure cham- 
ber should be validated under water before 
they are applied to practical diving. The fol- 
lowing experiment studies the manual dex- 
terity of divers above the surface, at 5-12 
feet below the surface, and at a depth of 100 
feet. It further attempts to observe tactile 
sensitivity in the three conditions so as to 
obtain some indication of whether any im- 
pairment is due to increased finger numbness, 
or is of more central origin. 


EXPERIMENT I 
Method 


Materials—manual dexterity. The basic apparatus 
was a screwplate comprising a 6 × 12-inch plate
of vs-inch brass. It had 322-inch holes regularly 
arranged in two 4-inch squares. One side of the 
plate was painted white, the two groups of holes 
being separated by a #-inch strip of unpainted 
metal. Each hole in the left-hand group contained 
a 4-inch cheese-head 2 BA brass screw backed by 
a hexagonal brass nut. The S was required to transfer 
the 16 nuts and bolts from one set of holes to the 
other as rapidly as possible using his fingers, and was 
scored in terms of time taken and number of loose 
nuts, that is, nuts which could be tightened a 
quarter turn or more. 

Tactile sensitivity. This was measured using a
modified version of the V test which was found by
Mackworth (1955, 1956) to be sensitive to finger
numbness induced by cold. The present test consisted
of two 12-inch perspex rulers bolted together
in the middle and at one end, and separated at the
other end by a 34-inch block of tufnol. The S was
required to run the index finger of his preferred
hand along the two edges till he reached the point
at which the gap between them was just discriminable.


TABLE 1
SPEED AND ACCURACY ON THE SCREWPLATE TEST AS A FUNCTION OF DEPTH

                          Depth in feet
                      0         10        100
M time                184 Siero tee 27 O10)
SD                    29.39     47.30     69.00
Total nuts loose      9         18        19
Range                 0-3       0-5       0-5


Design. Performance was studied above the sur- 
face, and underwater at depths of 5-12 feet and 
100 feet. There were 12 Ss, 11 army divers of the 
Royal Engineers and 1 amateur diver. Each S 
carried out all three conditions, 2 Ss being allocated 
to each of the six possible orders of presentation. 
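
The counterbalancing described in the Design paragraph can be sketched in a few lines. The sketch assumes only what is stated above (three conditions, six possible orders, equal numbers of Ss per order); the condition labels, the function name `allocate`, and the pairing of particular Ss with particular orders are hypothetical and are not taken from the study.

```python
from itertools import permutations

# Illustrative labels for the three test conditions (assumed names).
CONDITIONS = ("surface", "10 feet", "100 feet")


def allocate(n_subjects, conditions=CONDITIONS):
    """Assign equal numbers of subjects to every possible order of conditions."""
    orders = list(permutations(conditions))  # 3! = 6 possible orders
    if n_subjects % len(orders) != 0:
        raise ValueError("subjects cannot be split evenly across orders")
    per_order = n_subjects // len(orders)
    # Subject i (0-based) receives the order at index i // per_order.
    return [orders[i // per_order] for i in range(n_subjects)]


# Twelve Ss, as in Experiment I: two Ss per order.
schedule = allocate(12)
for subject, order in enumerate(schedule, start=1):
    print(subject, " -> ".join(order))
```

With 12 Ss the function gives 2 Ss per order; a sample size that is not a multiple of six is rejected rather than allocated unevenly.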

Procedure. In all conditions, S carried out both 
the screwplate and the V test. During the V test 
S averted his face and his finger was placed on the 
ruler by E. In each condition, six readings were
taken, three with S starting at a point before the 
two edges diverged and three starting well above 
the point of divergence, with S always moving 
towards the point of divergence. The order in which 
these two blocks of readings were taken was varied 
at random and the actual distance of the starting 
point from the point of convergence was also varied. 
Half of the Ss began with the screwplate test, while 
the remainder began with the V test. All Ss per- 
formed both tests seated on a low canvas chair. 
In the two underwater conditions the chair was on 
the seabed and a belt carrying approximately 30 
pounds of lead weights was laid across S’s lap to 
increase his stability. 

The Ss were timed with a stopwatch on the sur- 
face in all conditions. In the 10-feet condition S 
was observed by a surface swimmer (E¹) using a
face mask and snorkel, who signaled the beginning
and end of the screwplate test to an E on the
surface (E²). In the 100-feet condition, E¹ timed S
using the second hand of a pressurized diving watch,
and also used a pre-arranged code of pulls on his
lifeline to signal the beginning and end of a run
to E² who timed the run by stopwatch. E¹ noted
the results of both the screwplate and V tests on a
formica board using a soft pencil. All Ss tested at
100 feet were first allowed 5 minutes to acclimatize
to the narcotic effect of CO₂ which has been shown
by Rashbass (1955) to influence performance for 
only the first few minutes at pressure. In all con- 
ditions, Ss were given their score on the screwplate 
test immediately. 

All the tests were carried out in Famagusta Bay, 
Cyprus, the deep water tests from the deck of a 
Royal Engineers Z craft in calm, fine August 
weather. Underwater visibility was relatively good 
(80-90 feet), giving ample illumination in all con- 
ditions (approximately 300 foot-candles). 


Results 


Manual dexterity. (a) Speed—Table 1 
shows the mean time to complete the screwplate
test at each depth, and the total number
of loose nuts. In the 100-feet condition
there was good agreement between timing on 
the surface and timing at the bottom, except 
for a relatively constant lag of 2 to 3 seconds 
presumably due to the signaling system. The 
bottom times, which were shorter, were used 
for the analysis. 

All 12 Ss took longer at 10 feet than on 
the surface and 11 of the 12 took longer at 
100 feet than at 10 feet. When conditions 
were compared using a Wilcoxon test, both 
these differences proved highly significant (p 
< .005 two-tailed). (b) Accuracy—It can also
be seen from Table 1 that Ss left relatively 
few nuts loose in all conditions (8.1%), but 
that there is a tendency for the number loose 
to increase with depth. A more detailed analy- 
sis of the data using Jonckheere’s distribution- 
free concordance test against ordered alterna- 
tives (Jonckheere, 1954) supports this con- 
clusion (p < .01, two-tailed). 

None of the comparisons between indi- 
vidual conditions was significant, probably 
because so few loose nuts occurred, leading 
to a large number of zeros and ties. Of the 
12 Ss tested however, 5 were more accurate 
on the surface than at 10 feet and none was 
less accurate, and 7 were more accurate at 
10 feet than at 100 feet while 3 were less ac- 
curate. This implies a systematic decrease in 
accuracy with depth rather than a simple 
difference between performance above and 
below the surface as Table 1 might suggest. 

(c) Efficiency and Experience—The Ss 
differed widely in amount of diving experi- 
ence, ranging from 7 years to less than a week, 
with a median of about 18 months. While 
there was a significant positive correlation 
between speed at the screwplate test on the 
surface and length of diving experience (Tau 
= +.50, p < .025 two-tailed), there was no
relationship between experience and performance
either at 10 feet (Tau = +.165, p > .1)
or at 100 feet (Tau = +.21, p > .1).

(d) Practice Effects—Improvements in per- 
formance over the three successive tests were 
relatively small, and were not statistically 
significant. 

TABLE 2
MEDIAN AND RANGE OF JUST-DISCRIMINABLE GAP (MILLIMETERS) ON THE V TEST AS A FUNCTION OF DEPTH

                          Depth in feet
                      0           10          100
Ascending runs        1.1         1.8         1.85
                      (.7-3.5)    (.9-3.4)    (.7-3.5)
Descending runs       .9          .8          .8
                      (.7-1.3)    (.5-1.4)    (.5-1.5)
M                     1.00        1.30        1.325


Tactile sensitivity. The V test was scored
in terms of the size of the smallest just-discriminable
gap between the two edges of the
rulers. This was recorded in terms of the distance
in centimeters from the closed end of
the rulers and was later transformed into size 
of gap in millimeters. The median just dis- 
criminable gap for ascending and descending 
runs in each condition is shown in Table 2. 
The most striking feature of this table is the 
tendency for all 12 Ss to select a smaller gap 
when starting at the wide end of the V. There 
is, however, no significant change in tactile 
sensitivity with depth. While there appears to 
be a tendency for the ascending threshold to 
increase with depth, this is not significant on 
Jonckheere’s test (1954) and does not occur 
with the descending threshold. A comparison 
between tactile thresholds taken before the 
screwplate test and those obtained after 
showed no difference. Within the limits of 
this relatively crude test, it seems unlikely 
then that the impairment in manual dexterity 
at depth was due to finger numbness. 


Discussion 


These results suggest that manual dexterity 
is impaired by depth to a much greater ex- 
tent than the 7.9% shown by Kiessling and 
Maag. This implies that results obtained in a 
dry pressure chamber can not validly be 
generalized to performance under water. How- 
ever, the present experiment used a different 
test of manual dexterity and a different popu- 
lation of divers from the Kiessling and Maag 
study. Experiment I was therefore repeated 
in a dry pressure chamber. If the large effect 
of depth was indeed due to testing under 
water, then this second experiment should 
show only a small effect such as was shown 
by Kiessling and Maag. 

Fig. 1. Time to complete the screwplate test as a
function of depth for the pressure chamber experiment
and the underwater experiment.


EXPERIMENT II 


Procedure. The manual dexterity of 16 Army divers 
from the Royal Engineers and 2 civilian divers was 
studied using the previously described screwplate 
test. They were tested in a pressure chamber at 
pressures equivalent to 0, 10, and 100 feet of sea- 
water. Three Ss were tested in each of the six pos- 
sible orders. They were timed with a stopwatch by 
two Es, one inside the pressure chamber and one 
outside. Agreement between the two time scores was 
in all cases very close. 


Results. (a) Speed—Figure 1 shows the 
mean time to complete the screwplate test at 
each pressure. The results of Experiment I 
are included for comparison. The influence of 
pressure is clearly much less than in Experi- 
ment I, with Ss taking only 5.5% longer at 
the equivalent of 100 feet than they do on the 
surface. A Wilcoxon test showed this differ- 
ence to be significant however (p < .05, two- 
tailed), though the difference between the 10- 
feet and 100-feet conditions was not signifi- 
cant (.05 > p> .01, one-tailed). 

(b) Accuracy—This was consistently
higher than in Experiment I; only eight loose
nuts occurred, three each at 0 and 100 feet
and two at 10 feet.

(c) Efficiency and Experience—Of the 18 
Ss, 5 were inexperienced trainee divers. The 
performance of these Ss did not differ substantially
from the mean in any condition,
and was in fact marginally less affected by 
pressure (3.3% as against 5.5%). 

(d) Practice Effects—A marked practice 
effect occurs between the first trial (mean 
206.7 seconds) and the two subsequent trials 
(mean 194.6 and 195.1 seconds, respectively). 
This effect is shown by all but 3 of the 18 Ss 
and is thus highly significant (p < .01 two-tailed,
Sign Test). The practice effect is of
approximately the same magnitude as the
effect of pressure, about 10 seconds. In Experiment
I it was presumably masked by the
much larger effects of depth.

Comparison with Experiment I. (a) Speed
—In both experiments Ss worked at approxi- 
mately the same rate on the surface, suggest- 
ing that the groups were comparable. The cru- 
cial question is whether the effect of increasing 
pressure is greater when Ss perform under 
water. This was tested by calculating a percentage
impairment score for each S using
the formula (H − T)/S, where H = performance
time in the 100-feet condition, T = time
at 10 feet and S = time taken on the surface.
The mean impairment score for Ss tested
in the water was 19.8% and was significantly
greater than the impairment shown by Ss
tested in the pressure chamber (t = 2.38, p
< .05, two-tailed), who showed a mean decrement
of only 4.6%.
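
The impairment score defined above, (H − T)/S expressed as a percentage, can be illustrated with a minimal sketch. The times used below are hypothetical values chosen for easy arithmetic, not data from either experiment, and the function name `impairment_score` is an assumption introduced for the example.

```python
from statistics import mean


def impairment_score(h, t, s):
    """Percentage impairment: 100 * (H - T) / S.

    h: time at 100 feet, t: time at 10 feet, s: time on the surface
    (all in the same unit, e.g., seconds).
    """
    if s <= 0:
        raise ValueError("surface time must be positive")
    return 100.0 * (h - t) / s


# Hypothetical (surface, 10-feet, 100-feet) times in seconds for three Ss.
times = [(100.0, 120.0, 150.0), (200.0, 250.0, 280.0), (180.0, 200.0, 236.0)]

# One score per S, then the group mean that would enter the comparison.
scores = [impairment_score(h, t, s) for (s, t, h) in times]
mean_score = mean(scores)
print(scores, round(mean_score, 1))
```

Because each S's difference is divided by his own surface time, the score expresses the added effect of going from 10 to 100 feet relative to that S's baseline speed, which is what allows groups tested in the water and in the chamber to be compared.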

(b) Accuracy—Experiment II Ss tended to 
be more accurate in all conditions, but data 
were too sparse to allow any very meaning- 
ful comparison. 


Discussion 


Experiment II shows that the influence of 
pressure on manual dexterity is much smaller 
when the experiment is performed in a dry 
pressure chamber than when the diver is 
tested under water. In Experiment II, the 
only difference between conditions was that 
due to pressure. In Experiment I, however, 
several other factors were probably operative. 

Immediately a diver enters the water, he 
is faced with several handicaps. His equip- 
ment is likely to prove slightly cumbersome, 
the tunneling of vision accompanying the 
visual magnification induced by his face mask 
may prove a handicap (Barnard, 1961), but 
the greatest difficulty is probably due to his 
relative weightlessness. Thus, even though
well weighed down with lead, Ss tended to 
be unstable and their gross movements to be 
slow and clumsy. The difficulties raised by 
weightlessness may be amplified in shallow 
water by turbulence on any but calm days. 
This almost certainly reduced stability and 
impaired the performance of some Ss when 
tested in the 10-feet condition. All these fac- 
tors may contribute to the impairment of 
performance at 10 feet. 

At 100 feet, although the diver is unlikely 
to be affected by turbulence, he has the addi- 
tional problem of nitrogen narcosis. If this 
were a simple effect, Experiment II would 
suggest a further impairment of about 5%. 
The fact that it is much greater implies an 
interaction between the general stress of per- 
forming a task under water and the effect of 
pressure. Whether the actual degree of ni- 
trogen narcosis is increased by the presence of 
anxiety and other stresses under water or 
whether the test just becomes more difficult 
and thus more sensitive when performed 
under water, is not at present clear. If, how- 
ever, the interaction is due to the increased 
sensitivity of the task under water, it seems 
possible that tasks may be differentially af- 
fected by the under water physical handicaps 
imposed on the diver. If so, it may be ex- 
tremely misleading to generalize from the 
relative difficulty of different tasks in a pres- 
sure chamber to their difficulty at an equiva- 
lent pressure under water. 

The practical implications of these results
are first, that the manual dexterity of a diver
is considerably impaired whenever he must 
work under water, and further deteriorates 
at a depth of 100 feet. The implications of 
this for the design of diving equipment and 
the assignment of jobs to the diver are obvi- 
ous. Secondly, they suggest that it is unwise 
to generalise from experiments performed in 
a pressure chamber to the actual performance 
of a diver under water. 


A LITERACY INDEX FOR THE MAILBAG 1


STANLEY C. PLOG 2


Department of Psychiatry, School of Medicine, University of California, Los Angeles


A literacy index for predicting the educational level of the writers of public 
mail was developed on 162 letters sent to the editor of the Boston Herald. 
Accuracy of predicted educational level with actual educational level is 66.7% 
for grammar school educated, 69.2% for high school educated, and 79.8% for 
college educated. Coding instructions and directions for scoring are included
in the text.


Americans are inveterate letter writers. In 
addition to the usual correspondence between 
relatives and friends, millions of letters are 
sent annually to business and industry to com- 
plain about quality of service or compliment 
a particular company or person for its high 
standards. When fan mail to professional en- 
tertainers is also counted, unofficial estimates 
place this total at several billion pieces of mail 
a year (Newsweek, October 5, 1959, p. 70). 
Though many organizations have reported 
that they have changed major segments of 
their services or products because of customer 
reaction expressed in letters, most of this 
mail remains relatively unanalyzed, except 
for the usual “pro-con” tabulations. The reasons
for this center around the fact that only
a few techniques of content analysis are
available which are both systematic in their
application and easy to use, and most business
firms cannot afford or cannot find consultants
to provide more elaborate and detailed
analyses. If the wealth of information
contained in these letters is to be made avail- 
able to more persons, it is clear that simpler 
techniques must be developed which can be 
applied by nonprofessional and clerical per- 
sons. 

1 Appreciation is due to the staff of the Boston
Herald who cooperated in this study, especially
Alden Hoag, Chief Editorial Writer, and his secretary,
Peggy Brown, who went to considerable effort to
make certain no letters were lost. The project was
supported by a small grant from the Department
of Social Relations, Harvard University.

2 Now at the Urban Observatory, Institute of Government
and Public Affairs, University of California,
Los Angeles.

One of the central problems in analyzing
most public mail revolves around the fact that
it is usually impossible to obtain socioeconomic
or demographic data about the letter
writers. Unless the person happens to mention 
his educational background, occupation, or 
income level, there has been no acceptable 
way of deriving useful estimates of such socio- 
economic facts. Several indices for rating 
educational level of letter writers were devel- 
oped more than 25 years ago (Sayre, 1939; 
Wyant, 1941; Wyant & Herzog, 1941), but 
they involved complicated coding systems 
which would prohibit their use with large 
volumes of mail and none made use of con- 
temporary statistical techniques which could 
increase their predictive power. 

During the course of research on political 
mail by this author, it became necessary to 
develop an index which could predict to the 
literacy or educational level of letter writers 
(Plog, 1961). The index would have to be 
simple to use and relatively easy to achieve 
satisfactory coding reliabilities since it would 
be applied to large volumes of mail and vari- 
ous persons would be involved in coding. If it 
proved effective, it could also be used in research
on “public mail” sent to a variety of
business organizations, professional enter- 
tainers, or local and national politicians. 

In determining what would constitute an 
adequate sample for such a literacy index on 
public mail, two characteristics appear to be 
important. First, “public” letter writing rep- 
resents an act of spontaneity by the letter 
writer, and second, the letter is addressed to 
someone whom the letter writer considers to 
be a person of importance. The qualification 
of spontaneity suggests that the individual is 
highly motivated about an event of the day 
and he composes his letter at the height of 
his motivation. Thus, not a great deal of time 
and energy has gone into making certain that 
his thoughts are expressed clearly and cleverly.
However, sending his communication to
someone of importance does suggest he will be 
on the best letter writing behavior that his 
current motivational state will allow and he 
will tend to use the finest quality of paper at 
his immediate disposal, employ his best writ- 
ing instrument (typewriter, if available), and 
be somewhat restrained in his choice of words 
and expression of ideas. Letters to intimate 
friends or relatives would not meet these 
requirements for one often uses cheap sta- 
tionery and little care is taken about syntax 
or grammatical style. If these two assump- 
tions are accepted as important in describing 
most public mail, then it is possible to utilize 
many kinds of research samples in develop- 
ing the index. 

The Boston Herald cooperated on this 
project and questionnaires were mailed to all 
persons who wrote to the editor during two 
time samples in 1959. A 10-day period in 
mid-August produced 100 letters and another 
62 were received during a 1-week period in 
the late fall. The questionnaire requested 
information about newspaper reading habits, 
socioeconomic status, and educational back- 
ground. Excluding those letters from final 
tabulations which were returned by the Post 
Office because of an insufficient return ad- 
dress, the first sample produced a 92% return 
and the second did even better with nearly 
98% completions. This extremely high de- 
gree of cooperation by the participants insures 
the representativeness of the sample (Plog, 
1963).3

3 Additional samples for cross-validation purposes
were collected in other settings, but a lesser percentage
of return by the respondents does not
insure a representative sample of the populations
studied. A request for letters from 120 outpatients
at the Neurology Clinic, UCLA Medical Center and
the Department of Psychiatry, Harbor General
Hospital, Los Angeles resulted in only 37 returns.
Percentage correct predictions are 75.0% for college
educated (12 letters), 76.5% for high school educated
(17 letters), and 62.5% for grammar school educated
(8 letters). An even smaller return of mail solicited
from an adult education class in Cambridge, Massachusetts
(11 returns out of 110 requests) was not
analyzed.

RESULTS


Of the 151 questionnaires returned by The
Herald readers, 133 were used in the final
analysis (18 letters were excluded because the
respondents had omitted the question on educational
background or there were gross discrepancies
in the way items had been completed).
Of this total, 79 indicated they had
attended at least some college, 39 said they 
had a high school education, and 15 stated 
they had not gone beyond the eighth grade 
(grammar school). Originally it had been 
hoped that distinctions could be made be- 
tween five educational categories (grade 
school, junior high school, high school, col- 
lege, and graduate training), but it proved 
difficult to predict to junior high school edu- 
cation and very few persons showed improve- 
ment on the literacy index beyond college 
training. The negative skew in this sample 
(high percentage of college educated) is ex- 
pected in most mass communications studies 
since those individuals who lack verbal or 
graphic fluency do not often trouble them- 
selves to communicate their thoughts on im- 
portant issues (Sussman, 1957). 

All of the letters in the final sample were 
scored on the six variables listed below, uti- 
lizing a scale of 1 to 3 or 4 for each of the 
variables. Intercoder scoring reliabilities were 
85% agreement on Graphic Maturity, 92% 
agreement on Grammar, 96% agreement on 
Spacing, and average agreement was 93%. 
And, even more important, at no time did the 
coders differ by more than one point on any 
of the six variables and in only 5% of the 
cases did coding discrepancies affect the final 
classification of the letter. 

Step-wise multiple-regression analysis pro- 
vided regression weights for each of the six 
variables, and each letter then received a total
score on all of the variables. Cutting points
between educational groupings were deter- 
mined visually, that is, by visually selecting 
“natural” dividing points and not by formula 
methods.4 The best predictors were Graphic
Maturity, Quality of Paper, Grammar and
Word Usage, and Neatness—in that order
(see “Method of Scoring” for description).
The type of Writing Instrument used and the
completeness of Addresser-Addressee Characteristics
did not contribute sufficiently to
the total score and were excluded from the
final analysis.

4 W. J. Dixon of the Biomedical Statistics Department
at the University of California Medical Center,
Los Angeles, argues that visually determined cutting
points to discriminate between groups are more
suitable than machine formula methods because:
(a) the number of correct predictions is maximized
and the number of errors minimized for that group
of data, and it would be hoped that new samples
would be quite similar in distribution, and (b) any
formula based on the probability of a given item
being classified in X group does not take account of
the probabilities operating in new and differing sets
of data.


TABLE 1
ACTUAL AND PREDICTED EDUCATIONAL LEVEL OF BOSTON HERALD LETTER-WRITERS

                                             Actual education
Predicted education                 Grammar school   High school   College
Grammar school (Grades 1-8)               10              4            0
High school (Grades 9-12)                  4             27           17
College (1-4 years)                        1              8           62
N                                         15             39           79
Percentage of correct predictions        66.7%          69.2%        79.8%

Table 1 presents the predictive success for 
each of the educational groups. It is obvious 
both from the table and the spread of raw 
scores that there is more variation in the 
quality of letters from a grammar school edu- 
cated population of adults than from persons 
with greater amounts of schooling. The cor- 
relation between actual and predicted educa- 
tional level for all educational groups com- 
bined is .71. Considering the gross differences 
in intellectual potential of the population, the 
wide discrepancies in the quality of education 
offered in the nation’s schools, and the varia- 


bility of types of degree programs taken by 
students, it is surprising the index works as 
well as it does. 

Table 2 gives the intercorrelations between 
each of the variables used in scoring and edu- 
cational level. As would be expected, the in- 
tercorrelations between the variables are less 
than their individual correlations with the 
predictive criterion (educational level). 


METHOD OF SCORING 


Four variables are used in the final scoring 
system and a brief description of the coding 
criteria and method of utilizing the index are 
presented here. All of the variables are scored 
for each letter and the sum total of the four 
variables determines the classification of each 
letter.


Quality of Paper (Score 1 through 4) 


1. The lowest point total is given if the individual 
uses what might be considered inappropriate kinds 
of stationery for his correspondence. These include 
scratch tablets, sales slips (which seem to be sur- 
prisingly frequent), backs of envelopes, torn slips 
of paper, etc. 

2. A step above this is when the person utilizes 
lined writing tablets, pages from school notebooks, 
or other more standard sized papers which provide 
some form of structure to aid the writer (as with 
the printed lines), but which are still in the 
category of “cheap” papers. 

3. The “smooth feel” papers are scored 3 if they 
do not contain printed lines or other aids to neat- 
ness. These papers usually have a smooth feel when 
touched, and most frequently include the mimeograph 
papers or second-quality typing paper. Also coded 
here is the inexpensive stationery in boxed or 
tablet form which often has many fine ridges 
running vertically and horizontally, but which pos- 
sesses no pattern beyond this. Not to be included 


TABLE 2 


INTERCORRELATIONS BETWEEN WEIGHTED LITERACY VARIABLES
USED IN SCORING AND EDUCATIONAL LEVEL

                               Graphic                             Educational
                     Paper     maturity    Grammar     Spacing     level

Paper                            .39         .02         .42         .56
Graphic maturity                             .42      [illegible]    .54
Grammar                                                  .43         .55
Spacing                                                              .47
Educational level








Note.—Multiple R = .70. 
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are the quality papers, which are increasing in fre- 
quency in recent years, which also have a smooth 
feel but can be easily identified by the presence of 
a watermark or bond signature. 

4. The highest point total is allowed for ex- 
pensive stationery, almost always having a linen or 
rag content and most often a watermark of some 
kind. As distinguished from cheaper grades of paper, 
this has more “body” to it and gives the feeling 
that it would wear considerably longer. In the 
past, quality paper could be distinguished most fre- 
quently by the presence of a cockled or uneven sur- 
face, but recently the trend is toward making 
quality papers with a smooth texture. The paper 
should always be held to the light to examine for 
any identifying bond marks. 


Neatness and Spacing (Score 1 through 4) 


1. A one point total is allowed for those letters 
that are either extremely messy (noticeable ink 
spots, many typing strikeovers, poor erasures, etc.), 
or which show no conception of proper spacing (no 
margins, writing is diagonal across paper, extra 
lines added vertically in margins, etc.). 

2. An extra point is allowed if the person simply 
shows poor spacing judgment in not allowing a 
margin on one side of the paper, begins his letter 
without allowing sufficient area for a letterhead or 
salutation, or crowds in additional sentences at the
bottom of the last page (squeezes his handwriting or 
changes from double to single space on the type- 
writer), etc. 

3. In many cases, it is obvious that the writer 
did not take pains to make his letter neat nor did 
he worry about spacing characteristics. Yet, the 
result could not really be called “sloppy.” Three 
points credit are given for these letters. 

4. Maximum credit is allowed for those letters
that are presentable under any standards, and can 
best be illustrated by the correspondence typed by 
secretaries of businessmen. Handwritten letters may 
also score well, but there must be evidence of con- 
cern over neatness. 


Grammar and Word Usage (Score 1 through 4)


Of all the variables included in the literacy index, 
Grammatical Style and Word Usage are perhaps the 
most difficult to explain. Yet, it is imperative that
the coders give evidence of high reliability, for
previous studies have shown that the ability to use
complicated sentences and difficult words is con-
sistently the best predictor of amount of educa-
tional achievement. The attempt here has been to use
an overall rating that includes complexity of gram- 
matical structure and appropriate use of uncommon 
words. Brief examples from political mail on the 
McCarthy-Flanders censure fight in the Senate in 
1954 will help to illustrate each category.

1. A single point is given in those cases where 
grammatical usage is very poor and words are mis- 
spelled or are used incorrectly. Sentences often run 


together without the usual periods to separate 
thoughts, capital letters are frequently missing from 
the first words of new sentences or are placed on 
common nouns in the middle of sentences, and the 
use of violent language and swearing is more fre- 
quent. As examples, a note written with a ballpoint 
pen includes, 


. . . Yes I beg you stop rigt now drop the
Censor. I see Cival War ahed. The way to hurt 
McCarthy is not to Censor him. Walk off ignore 
him let every Senator turn his back on him as 
being unfit to Asociate with. Start a whisperin 
Campain. One in which the whisper secret wont 
reach asea or Europe. where we have enemies . . . 


Or another note in ballpoint which reads in part, 


You Senitor Flanders your Must Be Slipping as 
to the Recent Remarks you have Made about the 
irish and the irish americans did you hear there 
was 600 and 1 Men in Gorge Washingtons army 
of irish Berth and you are insulting them since 
the day you took office well i Dont tolerate on 
theater 


2. An additional point is allowed for those letters 
which are noticeably improved from the examples 
above, but still give evidence of several obvious 
mistakes on each page. Many of the same kinds of 
errors can be present, but usually not as frequently. 
The choice of words and expression of thoughts 
often seem to show a striking parallel to the dialogue 
on TV Westerns and crime movies. For example, 


... I served my country Eleven one half years as
a soldier. If I done What he has done, I would 
be in Jail. Get after him, and make him make 
an account of his earnings. that is one Way to 
get him. As Far as dignity or culture is concerned 
he doesnt posess that, no man can insult my 
uniform and stay on my side because he is not 
an American . 


Or, a note in ballpoint pen beginning with, 


Of all the boobs that have ever been members of 
the U.S. Senate, I think you are the Chief boob,
sucker, toad, etc. .. . You was never heard of in 
the senate till the Reds got you to pull their 
Chestnuts out of the fire and attack Senator 
McCarthy, one of the real americans with Guts 
and one who loves his country, his freedom and 
his heritage, but you, Flanders, didn’t have ability 
enough to even become known till they picked 
you up and Made a tool out of you for their 
rotten purpases . . 


3. A three point total is indicated for those letters 
which contain only minor errors in grammar or 
spelling. In general, the writer is able to express his 
thoughts clearly, but his letter lacks the “dressing” 
of uncommon words or inclusion of interesting 
stylistic phrases. A housewife from New York, New 
York used a fountain pen to begin her letter with, 
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Ever since I first read of your charges against 
Senator McCarthy I’ve wanted to write to con- 
gratulate you. You serve your country well in so 
doing. I have just returned from a year around 
the world spent in the orient. The harm that 
has been done to the prestige of the U.S. and the 
damage to the cause of democracy in that im- 
portant area of the world can hardly be over- 
stated ... 


A penned letter from a Vermont woman states, 
It is men like you who are diverting the attention 
of the country from genuine perils facing it. You 
are just falling into the trap set by the Dem. and 
left wingers, to divide the Rep. party. There is no 
common sense to all this furor .. . 


4. The best letters in any sample are given the 
maximum credit of four points, and reflect the 
work of highly articulate or well-educated persons. 
Thoughts are expressed clearly, usually quite briefly, 
and unusual words are included in an appropriate 
and meaningful manner. These persons possess 
facility with the written word and they probably 
carry on a large correspondence with friends and 
business associates. A typed letter from a Rochester, 
New York, man reads in full, 


The vacillation of the United States Senate in 
dealing with Senator McCarthy is discouraging 
to say the least. It would be a tradgedy if one 
demagogue were to be successful in intimidating 
the Senate. 


On the McCarthy side, an MD typed a brief note 
which includes, 


It is about time for the Senate to return to 
reason and sanity. 

Any attempt to pillory a patriot like McCarthy 
will only result in greater contempt for those who 
have instigated this travesty on justice... . 


Graphic Maturity (Score 1 through 3) 


This variable was added at the suggestion of 
Elizabeth McCarthy (handwriting consultant fre- 
quently employed by the Boston Police Department) 
and, surprisingly, proved to be one of the best 
predictors to educational level. 


TABLE 3 


REGRESSION WEIGHTS FOR LITERACY
INDEX VARIABLES

                          Variables
Point                                         Graphic
total       Paper     Spacing     Grammar     maturity

1            .27        .15         .25         .32
2            .54        .30         .50         .64
3            .81        .45         .75         .96
4           1.08        .60        1.00          --


Note.—For purposes of linear regression analysis, a grammar 
school education was given an assigned value of 1, a high school 
education a value of 2, and a college education a value of 3. 


TABLE 4 


CUTTING POINT SCORES BETWEEN
EDUCATION GROUPINGS 











Grammar school educated—2.24 or below 
High school educated—2.25-2.95 
College educated—2.96 or above 


1. According to the rationale offered by Elizabeth 
McCarthy, the task of the primary schools is to 
teach the person how to write, often by first showing 
him how to print and later how to write. In these 
years, the individual forms his letters clumsily and with 
much difficulty, being concerned with making only 
one letter at a time and not with writing a line of 
smooth-flowing words. Therefore, if the letter is 
all printed, or gives evidence that the writer experi- 
ences difficulty in manipulating a writing instru- 
ment (poorly-formed alphabet characters or lack 
of a flowing style), a minimum score of 1 is allowed. 

2. By the time of high school, the person has 
developed greater facility in writing from several 
years of corrective practice. The pressure from 
teachers to turn in term papers or answers to ex- 
aminations that are legible forces most students to 
correct many of the previous faults in handwriting 
techniques to facilitate the readability of his many 
classroom exercises. A 2-point total is allowed, then, 
in those cases where the handwriting is very legible 
and often quite similar to textbook copy. Individual 
letters are nearly perfectly formed and the script 
is easy to read. 

3. If the person continues his education and 
enters college, great pressures develop for increasing 
the speed of writing to keep up with the quick pace 
set by various lecturers. The usual result is that the 
writing shows many signs of hurriedness—it be- 
comes elongated, the “o’s” and “a’s” are not quite 
connected, the “t’s” tend to be crossed mainly on 
the right of the vertical line, and though it may 
still be described as “flowing,” it is often legible only 
to the writer. Many college students become con- 
cerned about the poor quality of their writing and 
the fact that even they have great difficulty in 
reading their own notes, and they will resort to a 
rapid form of printing again. Therefore, if the letter 
shows the above characteristics of writing or a mix- 
ture of printing and writing, it receives a full credit 
of 3. (All printing is scored 1.) 

The regression weights for each variable are 
presented in Table 3 and the cutting points separat- 
ing the educational groups in Table 4. The procedure 
is to score a letter on each of the scales from 1 to 4 
(Graphic Maturity is scored only 1 to 3), multiply 
the score on each variable times its appropriate 
regression weight, and sum the total across all of 
the variables to determine the educational classifi- 
cation of the letter. For example, if a letter had been 
scored: Paper-3, Spacing-2, Grammar-3, Graphic 
Maturity-2, it would be classified as “high school 
educated” (.81 + .30 + .75 + .64 = 2.50 = high
school). 
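
The scoring procedure just described can be sketched in a few lines. This is an illustrative sketch only, not part of the original index: the function and variable names are hypothetical, the per-point weights are the values for a score of 1 as read from Table 3, and the total is rounded to two decimals before it is compared with the cutting points of Table 4.

```python
# Per-point regression weights, read from Table 3 (value for a score of 1).
WEIGHTS = {"paper": 0.27, "spacing": 0.15, "grammar": 0.25, "graphic_maturity": 0.32}

# Highest score allowed on each scale (Graphic Maturity stops at 3).
MAX_SCORE = {"paper": 4, "spacing": 4, "grammar": 4, "graphic_maturity": 3}


def literacy_index(scores):
    """Sum score x weight over the four variables, rounded to two decimals."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        score = scores[name]
        if not 1 <= score <= MAX_SCORE[name]:
            raise ValueError(f"{name} score {score} is outside 1..{MAX_SCORE[name]}")
        total += score * weight
    return round(total, 2)


def classify(total):
    """Apply the cutting points of Table 4 to a weighted total."""
    if total <= 2.24:
        return "grammar school"
    if total <= 2.95:
        return "high school"
    return "college"


example = {"paper": 3, "spacing": 2, "grammar": 3, "graphic_maturity": 2}
print(literacy_index(example), classify(literacy_index(example)))
```

With the example letter from the text (Paper 3, Spacing 2, Grammar 3, Graphic Maturity 2) the sketch reproduces the total of 2.50 and the "high school" classification.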
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After one has worked with the index for a brief 
period of time, the scoring and tabulating process 
becomes very rapid, especially since it is usually not
necessary to read more than a single page of any 
letter to be able to score each of the variables. In 
addition, the system has proved to be relatively 
easy to teach to new coders. Descriptions of the 
scoring criteria, practice coding sessions, and the 
achievement of satisfactory coding reliabilities can 
usually be accomplished within a 2-hour practice 
session. 


REFERENCES 


Newsweek, October 5, 1959, p. 70.

Plog, S. C. Flanders vs. McCarthy: A study in the
technique and theory of analyzing congressional
mail. Unpublished doctoral thesis, Harvard
University, 1961.

Plog, S. C. Explanations for a high return rate on
a mail questionnaire. Public Opinion Quarterly,
1963, 27(2), 297-298.

Sayre, J. Progress in radio fan mail analysis. Public
Opinion Quarterly, 1939, 3, 272-278.

Sussman, Leila. Voices of the people. Unpublished
doctoral thesis, Columbia University, 1957.

Wyant, Rowena. Voting via the Senate mailbag.
Public Opinion Quarterly, 1941, 5, 359-382.

Wyant, Rowena, & Herzog, Herta. Voting via the
Senate mailbag: Part II. Public Opinion Quarterly,
1941, 5, 590-624.


(Received November 24, 1964) 


Journal of Applied Psychology
1966, Vol. 50, No. 1, 92- 
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ANCHORED VERSUS JOB-TASK ANCHORED 
RATING SCALES * 
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This study compared the effectiveness with which job-task anchored equal- 
appearing interval scales could be used in contrast with scales anchored only 
by simple numerical benchmarks. 2 groups of judges rated identical lists of 
job-task statements in terms of both types of scales. Ratings were made on 
5 sensory/physical dimensions of job activities. The reliabilities of ratings for 
all scales were computed by an analysis of variance approach. In a test of 
statistical significance across all 5 scale dimensions, it was found that job- 
task anchored scales could generally be used with significantly greater reliability
than simple numerically anchored scales.


In the design of psychological rating scales, 
position levels along a scale can be character- 
ized in various ways. Two conventional ap- 
proaches to the matter of identifying different 
levels have been: (a) assigning numerals to 
respective levels, and (b) placing words or
statements at various points along the scale. 
In each case the major function of the bench- 
mark or scale-level indicator has been to 
provide raters with an appropriate frame of 
reference for judging characteristics of given 
stimuli. 

This particular study was concerned with 
two kinds of benchmark arrangements. Data 
for the study were obtained in the context of 
having judgments made about job-task state- 
ments. Ratings on the task statements were 
made in terms of selected physical and sen- 
sory dimensions, for example, finger manipu- 
lation, auditory discrimination, etc. In par- 
ticular, the study involved a comparison of 
scales that made use of verbal job-task bench- 
marks and scales using only simple numerical 
benchmarks.


1 Prepared under auspices of Contract Nonr- 
1100(19) between the Office of Naval Research, 
Department of the Navy, Washington, D. C. and 
the Purdue Research Foundation, Lafayette, Indiana. 
Reproduction in whole or in part is permitted for 
any purpose of the United States Government. 
Appreciation is expressed to M. J. Driver, Purdue 
University, for helpful suggestions in connection with 
the preparation of the present manuscript. 

2 Now with the Personnel Research Section, E. I. 
du Pont de Nemours and Company, Wilmington, 
Delaware. 
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Other studies have been conducted in this 
general area; however, they have differed from 
the current effort either in purpose or con- 
tent dimensions investigated. In connection 
with rating-scale formats generally, Barrett, 
Taylor, Parker, and Martens (1958) studied 
four rating-scale formats which varied from 
unstructured to highly structured. Their re- 
sults indicated that the format incorporating 
trait titles and behavioral descriptions of 
scale steps was superior to “both more- and 
less-structured formats [p. 333].” The cur- 
rent study resembled the one by Barrett et 
al. in that scale format was in both cases an 
experimental variable. However, it differed in 
that the present study focused on ratings of 
job-task statements, whereas in the former 
case, rating scale structure was varied in the 
context of obtaining supervisory ratings of 
four groups of clerical workers. Mosel, Fine, 
and Boling (1960) studied the extent to 
which estimated trait requirements were
scalable in the Guttman sense. They found 
almost all of the 10 interest traits investi- 
gated had “acceptable scalabilities.” Over 
half of 13 personality requirements were 
scalable, while three of 10 aptitude require- 
ments also proved scalable. Recent work by 
Madden and Bourdon (1964) has considered 
the effects of several variations in rating scale 
format on rater judgment. While their study
was not designed to answer the question of 
which rating scale format was “best,” results, 
among other things, did serve to show that 
differences exist with regard to judgments 
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which were “a function of the format of the 
rating scale [p. 151].” 

The present study was formulated to pro- 
vide information regarding the relative ef- 
fectiveness of equal-appearing interval scales 
which had been constructed with job-task 
statements used for scale-level benchmarks in 
contrast to scales using simple numerical 
benchmarks. Specifically, it was hypothesized 
that scales which embody job-task bench- 
marks could be used with greater reliability 
than those consisting of numerical values. 
Finally, it should be noted that the present 
research effort was part of a larger investiga- 
tion involving the development and utiliza- 
tion of job analysis formats such as The 
Worker Activity Profile (McCormick, Gor- 
don, Cunningham, & Peters, 1962) which in- 
corporate various types of scales (Peters & 
McCormick, 1962) and checklist-type items. 


METHOD 


Five dimensions of worker activity were selected 
for scaling investigation, these being finger manip- 
ulation, near visual discrimination, far visual dis- 
crimination, auditory discrimination, and general 
physical coordination. For each of the five dimen- 
sions a 7-point equal-appearing interval scale was 
developed. 


Initial Scale Development 


The development of each of these scales was ac- 
complished in the following manner: 

1. Lists of 45 or 50 job-task statements for each 
of 5 dimensions were obtained from sources such as 
the Dictionary of Occupational Titles, Volumes I 
and II; Estimates of Worker Requirements for 
4,000 Jobs, and similar publications. For example, a 
task statement for the finger manipulation dimension 
might read: “make connections in operating tele- 
phone switchboard.” An attempt was made to select 
statements which would cover a wide range of jobs 
and job activities for a given dimension. 

2. Each of the job-task statements was then
rated on a 7-point numerical scale, for each of
the 5 dimensions, in terms of the degree of the par-
ticular dimension involved. In the rating procedure,
a rating of “1” indicated the lowest degree and 2, 
3, 4, 5, 6, represented varying degrees up to the 
highest rating permitted, a “7.” Judges used for 
rating task statements were upper-level engineering 
undergraduates and first- and second-year graduate 
students enrolled in dual-level courses in industrial 
psychology at a midwestern university. The num- 
ber of judges used in these rating procedures is 
indicated in the second column of Table 1. 

3. A median value and an index of variability were 
computed for each of the job-task statements. The 


index of variability (“V”) was a refined type of 
truncated range which closely approximated the 
interquartile range or Q statistic (Edwards, 1957) 
which is often associated with measuring variability 
in this context. A brief description of the computa- 
tion of “V” measures will illustrate the nature of 
the “V” approximation to Q. Variability indices were 
computed by first determining the number of raters 
in approximately the extreme 25% tails of each item 
distribution of rater judgments. In the case of 14 
raters, 4 ratings or judgments at each end of the 
distribution account for 57% of the total sample.
In effect, this leaves 6 judgments in the middle 43%
of the distribution. With 12 raters, 3 judgments on 
each extreme tail of the distribution account for 
50% of the ratings, etc. Thus, when 14 raters were 
involved, a “V” measure would be computed by 
noting the frequency distribution of a given item, 
and subtracting the scale category (1 to 7) which 
contained the fourth judgment from the bottom 
from the scale category which contained the fourth 
judgment from the top. For example, if 11 judges 
rated a job task “5” and 3 judges rated it “6,” the 
variability index would be “0” but if 10 judges 
rated a job task “5” and 4 judges rated it “6,” the 
“V” index would be “1,” etc. The product-moment
correlation between Q values and “V” indices was .73 
for job tasks on the finger-manipulation dimension. 
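
The “V” computation described in this step can be sketched as follows. The function name is hypothetical, and the tail size is passed in explicitly because the text gives it only for 14 raters (4 per tail) and 12 raters (3 per tail); no rule for other group sizes is assumed here.

```python
def v_index(ratings, tail):
    """Truncated-range variability index for one job-task statement.

    ratings: the scale categories (1 to 7) assigned by the raters.
    tail: number of judgments counted in from each end of the distribution.
    Returns the category holding the tail-th judgment from the top minus
    the category holding the tail-th judgment from the bottom.
    """
    ordered = sorted(ratings)
    if not 0 < tail <= len(ordered):
        raise ValueError("tail must be between 1 and the number of raters")
    from_bottom = ordered[tail - 1]
    from_top = ordered[-tail]
    return from_top - from_bottom


# The two worked cases from the text, with 14 raters and a tail of 4.
print(v_index([5] * 11 + [6] * 3, tail=4))
print(v_index([5] * 10 + [6] * 4, tail=4))
```

The two calls correspond to the examples in the text, which give indices of 0 and 1 respectively.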

4. For each dimension seven statements were
selected which had low indices of variability and 
median values closest to the numerals 1 through 7. 
Thus, for each of the five dimensions a scale was 
developed which had seven levels represented by 
seven task statements. The scale for the finger- 
manipulation dimension which was developed by 
the above procedure is illustrated in the display 
below. 


HIGHEST DEGREE  7 | performs surgical operations on human beings
                6 | plays weekly commercial piano concerts
                5 | mounts skins of animals in life-like form
                4 | installs plumbing in homes
                3 | places mail in mailboxes
                2 | picks oranges from trees
LOWEST DEGREE   1 | carries pieces of furniture
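
One way the anchor selection of step 4 might be expressed is shown below. The text states only that the chosen statements had low variability and medians closest to the numerals 1 through 7, so the variability ceiling (`max_v`), the tie-breaking order, and the function name are all assumptions made for illustration.

```python
def select_anchors(statements, levels=range(1, 8), max_v=1):
    """Pick one benchmark statement per scale level.

    statements: list of (text, median, v_index) tuples.
    max_v: hypothetical ceiling on the variability index for eligibility.
    For each level, the eligible unused statement whose median is closest
    to the level is chosen; ties go to the lower variability index.
    """
    anchors = {}
    used = set()
    for level in levels:
        eligible = [s for s in statements if s[2] <= max_v and s[0] not in used]
        if not eligible:
            break
        best = min(eligible, key=lambda s: (abs(s[1] - level), s[2]))
        anchors[level] = best[0]
        used.add(best[0])
    return anchors


# Made-up medians and indices, for illustration only.
pool = [("task a", 1.2, 0), ("task b", 1.0, 3), ("task c", 2.1, 1), ("task d", 6.9, 0)]
print(select_anchors(pool, levels=[1, 2, 7]))
```

In this made-up pool, "task b" has the median nearest to 1 but is passed over because its variability index exceeds the ceiling.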


Reliability Comparisons 


This phase of the study was planned for the 
purpose of comparing the reliability with which 
raters could rate job activities on the five dimen- 
sions when using scales with job-task benchmarks, 
and when using scales with simple numerical bench- 
marks. It was hypothesized that scales with job- 
task benchmarks could be used more reliably. 

To test this hypothesis the same job tasks were 
rated by both scales. The job tasks rated in this 
phase did not include the seven tasks for each scale 
which had previously been selected to represent 
scale-level anchors. Thus the original lists of 50 or 
45 tasks in the present investigation consisted of 


94 DAVID L. PETERS AND ERNEST J. McCORMICK


43 or 38 job tasks, depending on the dimension in 
question. For further comparative purposes it was 
planned to describe the results of scale utilization 
reliability in terms of “most” variable versus “least” 
variable items. For this purpose after judges had 
rated items on a particular dimension using either 
a job task anchored or a numerically anchored 
rating scale, reliability analyses were made in 
terms of three groups of items: Group A, Group B, 
and Total N. Group A consisted of the 25 task
statements from each of the original lists which had 
the least variability as resulting from earlier scaling 
operations. Group B consisted of the residual items, 
this group then including items which, in the 
earlier scaling process, had had the greater rater 
variability. This would make it possible to obtain 
insight as to whether the use of benchmark items 
would improve the reliability with which judges 
rated “most” variable versus “least” variable items. 
Obviously for purposes of overall comparisons, both 
groups could be combined in order to analyze reli- 
ability indices for total groups of items. 

Initial ratings of job-task statements had been made 
in terms of 7-point numerically anchored scales. 
Therefore ratings based on numerically anchored 
scales were already available. It should be noted that 
reliability computations for these initial ratings 
were made independent of the seven job-task anchor- 
ing statements which were taken from each original 
pool of items. For example, total sample reliability 
on numerically anchored scales was based on 43 
items rather than 50. Thus, in the present phase 
of the study, comparable groups of judges (i.e., 
upper-level engineering undergraduates and first- and 
second-year industrial psychology graduate students) 
rated job tasks in terms of the job-task bench- 
mark scales which had been developed. In this way, 
each of the five lists of job-task statements was 
rated by two independent groups of raters. One 
group of raters used scales employing numerical 
benchmarks and the other group used scales employ- 
ing verbal benchmarks. In all cases the group of 
judges that used numerically anchored scales to 
rate tasks for a given dimension was the same 
size as the group of judges which used verbally 
anchored scales to rate tasks for the same dimension. 

The data obtained were subjected to an analysis 
of variance as described by Guilford (1954, p. 395) 
for the purpose of determining the reliability of 
the ratings. Such an approach may provide two 
types of reliability estimates. One value, computed
in the form of an r, gives essentially an average of
all the correlations between all pairs of raters on
a given scale. It might be looked upon as the
“reliability of a single rater.” The other value,
computed in the form of an r_nn, gives an estimate
of the reliability of mean ratings from n raters.
Operationally, this might be considered the cor-
relation between n raters and additional hypo-
thetical raters of equal ability. It might be viewed
as a type of “stepped up” reliability coefficient. In
the context of the present study it was decided that
the r estimate of reliability would be the more
meaningful as a base for comparative purposes.
However, r_nn estimates would be computed for
general informational reasons.
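
A minimal sketch of how the two estimates can be obtained from an items-by-raters table is given below. The text only cites Guilford (1954) for the procedure, so the specific formulas here are an assumption: a two-way analysis of variance without replication, with the rater effect removed from the error term. The function name is hypothetical.

```python
def rating_reliability(ratings):
    """Analysis-of-variance reliability of ratings.

    ratings: list of rows, one row per job task, one rating per rater.
    Returns (r_single, r_mean): the reliability of a single rater and the
    reliability of the mean of all raters. Assumes the item mean square
    is nonzero and there are at least two items and two raters.
    """
    n_items = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_items * k)
    item_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n_items for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_items = k * sum((m - grand) ** 2 for m in item_means)
    ss_raters = n_items * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_items - ss_raters

    ms_items = ss_items / (n_items - 1)
    ms_error = ss_error / ((n_items - 1) * (k - 1))

    r_single = (ms_items - ms_error) / (ms_items + (k - 1) * ms_error)
    r_mean = (ms_items - ms_error) / ms_items
    return r_single, r_mean


# Made-up ratings of four tasks by three raters.
print(rating_reliability([[1, 2, 1], [3, 3, 4], [5, 4, 5], [2, 2, 3]]))
```

Under these formulas the second value is the Spearman-Brown step-up of the first, k·r / (1 + (k − 1)·r). The reported figures are consistent with that relation: a single-rater reliability of .61 with 9 raters steps up to about .93.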


RESULTS AND DISCUSSION 


Since the same items that had been rated 
in terms of numerically anchored scales were 
also rated in terms of job-task anchored 
scales, a statistical comparison of the relia- 
bilities produced by the two methods was a 
relatively straightforward matter. 

Statistically significant differences between 
r reliability coefficients were estimated by a 
modification 3 of a conventional z' transforma-
tion statistic used in testing the significance
of the difference between two correlation co- 
efficients (Edwards, 1960, p. 83). The results 
of the reliability and statistical comparisons 
are presented in Table 1. 
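
The comparison can be sketched as an ordinary Fisher z' test in which the sample size is replaced by an effective n. As read from the accompanying footnote, n is taken to be half the number of judges (one fewer than the number of judges, halved, when that number is odd) times the number of scale items; that reading, and the function names, are assumptions of this sketch.

```python
import math


def effective_n(num_judges, num_items):
    """Half the judges (rounded down for an odd count) times the items."""
    return (num_judges // 2) * num_items


def z_between(r_task, r_numeric, num_judges, num_items):
    """Z for the difference between two single-rater reliabilities.

    Both groups of judges are assumed to be the same size and to have
    rated the same items, so the same effective n is used for each.
    """
    n = effective_n(num_judges, num_items)
    standard_error = math.sqrt(1.0 / (n - 3) + 1.0 / (n - 3))
    return (math.atanh(r_task) - math.atanh(r_numeric)) / standard_error


# Finger manipulation, Group B: r = .47 (task) vs .34 (numerical),
# 14 judges and 18 items.
print(round(z_between(0.47, 0.34, 14, 18), 2))
```

For the finger manipulation Group B entries this gives a Z of about 1.22, close to the tabled 1.25; the small gap is what would be expected from working with reliabilities rounded to two decimals.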

Considering the total groups of items 
(Groups A and B combined) of the five di-
mensions, four of the r coefficients based on
the job-task benchmark scales were greater
than those based on the numerical scales; of
these four, two differed significantly at the
.01 level. The only dimension offering oppo-
sition to the general trend was the near visual
discrimination one. In that observed differ- 
ences on this dimension were in the opposite 
direction to those hypothesized, it was consid- 
ered inappropriate to utilize a one-tail test of 
significance for statistical comparisons. 
Therefore, in order to gain some indication of 
the significance of the difference observed, a 
two-tail test was applied to reliabilities ob- 
served on the near visual discrimination di- 
mension. Observed results indicated that for 
this particular dimension numerically an- 
chored scales could be used with significantly 
greater reliability than task-anchored ones. 

Even though the reliabilities on only one 


3 The modification was suggested in a personal com-
munication with B. J. Winer, Purdue University. The
statistic used in the present study was the following:

$$Z = \frac{z'_1 - z'_2}{\sqrt{\dfrac{1}{n - 3} + \dfrac{1}{n - 3}}}$$

where

$$n = \frac{\text{Number of Judges}}{2} \times \text{Number of Scale Items}.$$

When there was an uneven number of judges (e.g., 9),

$$n = \frac{\text{Number of Judges} - 1}{2} \times \text{Number of Scale Items}.$$
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TABLE 1 


RELIABILITY OF RATINGS OF JOB TASKS ON VARIOUS WORKER ACTIVITY DIMENSIONS WHEN
USING NUMERICALLY-ANCHORED SCALES AND JOB-TASK ANCHORED SCALES

                                          Numerical anchors           Job-task anchors
                                       Reliability  Reliability    Reliability  Reliability              Observed Z
Dimension and group         Number     for a single for n          for a single for n          Number    between task
of job-tasks                of raters  rater (r)    raters (r_nn)  rater (r)    raters (r_nn)  of items  and numerical r

Finger manipulation
  Group A                      14      [illegible]     .96         [illegible]     .98            25        2.97
  Group B                      14         .34          .88            .47          .93            18        1.25
  Total                        14      [illegible]     .94         [illegible]     .97            43        2.81
Near visual discrimination
  Group A                      12         .87+         .99            .78+         .98            25       -2.40
  Group B                      12      [illegible]     .97            .62          .95            13       -1.22
  Total                        12      [illegible]     .98         [illegible]     .97            38       -2.97
Far visual discrimination
  Group A                       9         .61          .93            .59          .93            25        -.21
  Group B                       9         .12          .54         [illegible]  [illegible]       18         .94
  Total                         9         .45          .88            .48          .89            43         .37
Auditory discrimination
  Group A                       9         .84          .98            .84          .98            25     [illegible]
  Group B                       9      [illegible]     .38         [illegible]     .91            13        2.67
  Total                         9         .70          .95         [illegible]     .97            38        1.29
General physical coordination
  Group A                       9         .48**        .89            .78**        .97            25        3.69
  Group B                       9      [illegible]     .84            .46          .89            18         .65
  Total                         9         .46**        .88            .68**        .95            43        3.03

+ p < .05 from corresponding entry in same row (based on two-tailed critical value of Z = 1.96).
++ p < .01 from corresponding entry in same row (based on two-tailed critical value of Z = 2.58).
* p < .05 from corresponding entry in same row (based on one-tailed critical value of Z = 1.64).
** p < .01 from corresponding entry in same row (based on one-tailed critical value of Z = 2.33).


of the dimensions opposed the general trend, 
it was considered appropriate to subject the 
“total” reliability estimates (r) for all five
dimensions to a final test of overall statistical 
significance, in order to obtain some evidence 
of the relative effectiveness of the two types 
of scales. For this purpose it was decided to 
use a chi-square transformation test as de- 
scribed by Jones and Fiske (1953, p. 376). 
Tabled values facilitating the application of 
this test have been published by Gordon, 
Loveland, and Cureton (1952, pp. 312-314). 
The observed chi-square of 32.21 (10 df) 
was significant at the .001 level. It is appar- 
ent that the one observed difference (i.e., near 
visual discrimination dimension) in opposi- 


tion to the remaining four was not of suffi- 
cient magnitude to offset the general trend of 
the data. Conclusions borne out by 7 analy- 
ses were not confirmed by analyses of fnn 
estimates. This result suggests that future ex- 
periments might find that increasing the sam- 
ple size would eliminate the difference in re- 
liability between the two scaling methods. 
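
The chi-square transformation test itself is not spelled out here. The 10 df for five dimensions is consistent with Fisher's method of combining independent probabilities (2 df per probability), so the following Python sketch assumes that method. The function names are hypothetical, and the closed-form tail probability applies only to even df.

```python
import math


def fisher_combined_chi_square(p_values):
    """Return (chi-square, df) for k independent p values,
    using chi-square = -2 * sum(ln p) with df = 2k."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    chi_square = -2.0 * sum(math.log(p) for p in p_values)
    return chi_square, 2 * len(p_values)


def chi_square_upper_tail(chi_square, df):
    """Upper-tail probability of a chi-square value for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = chi_square / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (chi-square / 2) ** i / i!
        total += term
    return math.exp(-half) * total


# The value reported in the text: chi-square = 32.21 with 10 df.
print(chi_square_upper_tail(32.21, 10) < 0.001)
```

Under these assumptions the reported chi-square falls beyond the .001 level, in agreement with the text.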
The results of the overall r11 analysis seem
to offer evidence to support the hypothesis
that, in general, scales constructed of job-task
benchmarks could normally be used with
greater reliability than scales constructed of
numerical benchmarks for rating job activities
in the form of job-task statements. While the
results indicate that from the point of view
of reliability verbal benchmarks are generally
superior, the differences between methods
with respect to scale type (e.g., ordinal versus
interval) have not been explored. It may
be that the gain in reliability is offset by a
loss of interval properties for scales anchored
by verbal benchmarks. These considerations
suggest several avenues of future research.

CONVERGENT JOB EXPECTATIONS AND RATINGS
OF INDUSTRIAL FOREMEN 1


JOHN W. LAWRIE 2 
Wabash College 


This study hypothesized a significant positive correlation between evaluations 
of foremen made by superiors and subordinates and the degree to which 
foremen share and accurately predict superior-subordinate expectations
regarding the foreman’s job behavior. The Ss (8 superiors, 32 foremen, and 377
subordinates) responded to a questionnaire composed of “consideration” and 
“structure” items yielding “expectation-convergence scores” which were cor- 
related with evaluation measures. In a department providing close face-to-face 
interaction, a significant proportion of the variance in superior evaluations 
was accounted for by the hypothesis of expectation convergence. Various 
implications of these data for common personnel practices are discussed. 


A number of researchers in the field of
industrial leadership have postulated the
importance of “shared expectations” in
contributing to leader effectiveness. For example,
Likert (1958) says: “A leader to be effective
must always adapt his behavior to fit the
expectations, values, and interpersonal skills
of those with whom he is interacting [p. 327].”
Levinson, Charlton, Munden, Mandl, and Solley
(1962) make the notion of shared expectations
central to their formulation of the
reciprocal psychological contract. They note,
“The frequency with which these expectations
seemed to have an almost obligatory quality
was impressive to us. As people expressed
their expectations . . . it was as if the company
. . . were duty bound to fulfill them [p. 20].”
To the extent that the first line foreman is


1 Portions of this paper were presented at the
1964 Annual Convention of the American Psychological
Association, 6 September in Los Angeles
under the title “Superior and Subordinate Evaluation
of Foremen as a Function of Convergent Job
Expectations.”

2 This research was carried out while the author
was a Staff Specialist with the General Motors
Institute, Flint, Michigan. Results presented here are
a part of a larger research program designed to
increase organizational effectiveness in a large
engineering and manufacturing facility.

construed to be a representative of the company,
shared expectations can be expected to
be of importance to him in carrying out his
function. These expectations, then, being
shared and reciprocal between leaders and
followers, become the stuff of a psychological
contract which “. . . governs their relationship
to each other [Levinson et al., 1962, p. 21].”
Davis (1962) has utilized the concept of
leadership role expectations more explicitly as a
determinant of leader success. He states, “The
foreman’s role particularly requires that he
be adaptive in working with the extremes of
subordinate and superior behavior. . . . In
order to be adaptive, a manager needs to . . .
see his role as seen by others [p. 41].” Similar
presentations of this and related points
of view are found in Sargent (1951), Homans
(1950), Knickerbocker (1951), and Tannenbaum,
Wechsler, and Masarik (1961).
Paraphrasing these previous presentations, 
the following statement can be made: Leaders 
who understand what is expected of them by 
their superiors and subordinates will tend to 
perform more effectively than leaders who 
have less understanding of these expectations. 
The theoretical link between expectations
of supervisors and subordinates as a situational
variable with which the leader must
cope has been made, but research on the problem
has yielded disappointing and mixed results.
For example, Patton (1954), studying
the relative ability of supervisors to understand
what is expected of them by both
workers and management, found that such
understanding was not significantly related
to management rank-order ratings. Meyer
(1959) hypothesized that “High rated foremen
would agree better with their general
foremen as to the degree of job responsibility
for functions than would low rated foremen
[p. 451].” Although there were individual
differences in their degree of concurrence with
general foremen, Meyer concludes that the
hypothesis was not demonstrable. Foa
(1957) has demonstrated that, in a military
sample, subordinate satisfaction with leadership
is related to the extent to which subordinates
feel superiors act in terms of their
expectations on the matter of discipline.

As will be seen, the relationship hypothesized
by these and other studies between
leader effectiveness and expectation convergence
may be demonstrable only when
other situational variables are taken into
account.


METHOD AND SAMPLE 


The present study was aimed at testing the expec- 
tation—effectiveness notion in an applied industrial 
setting. The subjects (Ss) were all personnel in two 
departments from three organizational levels. In 
Department A, Ss were 5 General Foremen, 19 
Foremen, and 248 subordinates. In Department B, 
3 General Foremen, 13 Foremen, and 129 subordi- 
nates constituted the sample. 

The work environment differed markedly between 
the two departments. In Department A, frequent 
rotation of foremen caused interruptions of relations 
between foremen and general foremen; in Depart- 
ment B, however, more stable relations between 
general foremen and foremen were possible due to 
less rotation. In both departments, however, fore- 
men changed subordinates on a rotating basis every 
3 months. 

A questionnaire was administered to all Ss which
contained the “initiating” and “consideration” items
summarized by Stagner (1956) as having high factor
loadings in a study by Fleishman, Harris, and Burtt
(1955). The Ss were instructed to respond in terms
of these behavioral items as follows: 


General Foremen
1. How should your foremen act in terms of these behaviors? (Expectation: E1)
2. How does each foreman act? (Perception: P1)

Foremen
1. How should you act in terms of these behaviors? (Self-Expectation: E2)
2. How does your superior expect you to act in terms of these behaviors? (Expectation Prediction: EP1)
3. How do your subordinates expect you to behave in terms of these behaviors? (Expectation Prediction: EP3)

Subordinates
1. How should your foreman act in terms of these behaviors? (Expectation: E3)
2. How does your foreman act? (Perception: P3)


Following Rosen and Weaver (1960) scores on 
an item were composed of two parts: (a) an esti- 
mate from each respondent (general foreman, fore- 
man, or subordinate) as to how often a foreman 
should act as described by the item; and (b) an 
estimate of how important it was to the respondent 
that the foreman act as described by the item. 
These responses were obtained on two 7-point scales
ranging from “always” to “never” on the “how
often” dimension, and from “essential” to “intolerable”
on the “how important” dimension. In each
case scale points were behaviorally defined in terms 
of the respondent’s expectations summarized by his 
responses to the “how often” and “how important” 
dimensions. 

Evaluations of foremen were provided by general
foremen and subordinates as follows: 

1. General Foremen: (a) Company Merit Rating = GF1,
(b) Global Performance Rating = GF2,
(c) Global Potential Rating = GF3,
(d) Last Merit Increase = GF4, and
(e) Expectation minus Perception = GF5.

2. Subordinates: (a) Expectation minus Perception = S1.
(Other measures, e.g., tardiness, absenteeism,
and grievances, which could have been considered
as subordinate evaluations, were unfortunately
not reliably available for the present sample.) 

Since the foreman was the focus of the study,
expectation and perception forces converge on him
from himself (E2), his subordinates (E3), and his
superior (E1). In addition, foremen made predictions
of what superiors and subordinates expected of them
(EP1 and EP3). Finally, general foremen’s and
subordinates’ perceptions of the foreman’s behavior
were determined (P1 and P3). Within this framework
two of the study’s convergency hypotheses were as
follows: 

1. Foremen whose self-expectations (E2) are more
convergent with others’ expectations (E1 and
E3) will be more highly evaluated than foremen
with more divergent expectations. 

2. Foremen whose predictions of what is expected
of them (EP1 and EP3) are more convergent
with actual expectations (E1 and E3) will be
more highly evaluated than foremen with less
convergent predictions of expectations. 
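
The logic of the two hypotheses can be sketched in Python. The text does not say how item responses were combined into a convergence score, so the mean absolute item difference used below is an assumption, as are the function names and the made-up ratings. Only the use of Pearson correlations comes from the study.

```python
import math


def divergence(expected_by_other, expected_by_self):
    """Mean absolute item-by-item difference between two sets of
    expectation ratings (an assumed scoring rule)."""
    if len(expected_by_other) != len(expected_by_self) or not expected_by_self:
        raise ValueError("rating lists must be non-empty and equal in length")
    pairs = zip(expected_by_other, expected_by_self)
    return sum(abs(a - b) for a, b in pairs) / len(expected_by_self)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, n >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one general foreman's expectations (E1) on three
# 7-point items, four foremen's self-expectations (E2), and the
# general foreman's performance rating of each foreman.
e1 = [4, 4, 4]
e2_by_foreman = [[4, 4, 4], [5, 5, 5], [6, 6, 6], [7, 7, 7]]
ratings = [9, 7, 5, 3]

# Convergence is taken as the negative of divergence, so a positive
# correlation means more convergent foremen were rated higher.
convergence = [-divergence(e1, e2) for e2 in e2_by_foreman]
print(pearson_r(convergence, ratings))
```

In this contrived example the ratings fall linearly as divergence grows, so the correlation is at its positive limit. Real data would of course be noisier.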


RESULTS 


Results relevant to these hypotheses are
shown in Table 1. 

TABLE 1

PEARSON PRODUCT-MOMENT CORRELATIONS: JOB EXPECTATION CONVERGENCY VERSUS SUPERIOR
AND SUBORDINATE EVALUATIONS OF FOREMEN

Expectation convergence
(E1 − E2)a (EP1 − E1)b
Evaluation Dept A Dept B Dept A Dept B
General Foreman 
Company Merit Rating 23 2d 04 47 
Performance Rating 03 Sidi 24 O35" 
Potential Rating eZ. One 32 OD 
Last Merit Increase nS 34 BLS 38 
Expectation minus Perception .09 2s 05 42 
(E3 − E2)c (EP3 − E3)d
Dept A Dept B Dept A Dept B 
Subordinates 
Expectation minus Perception .09 14 .05 AUS) 
* p < .05.
** p < .01.

a E1 − E2 = General foreman’s expectations minus foreman’s expectations.
b EP1 − E1 = Foreman’s prediction of general foreman’s expectations minus general foreman’s expectations.
c E3 − E2 = Subordinates’ expectations minus foreman’s expectations.
d EP3 − E3 = Foreman’s prediction of subordinates’ expectations minus subordinates’ expectations.


On the basis of the evidence presented in 
Table 1, several statements are feasible: 

1. The convergency hypotheses were not 
demonstrable in the foreman-subordinate re- 
lationship in either department in this sample. 

2. The convergency hypotheses were not 
demonstrable in Department A in the fore- 
man-general foreman relationship. 

3. In neither department were company 
merit ratings or last merit increase related 
to expectation convergencies. Both these 
measures were restricted in range. 

4. In Department B, general foreman 
expectations minus foreman expectations con- 
cerning the “initiating” and “consideration” 
dimensions of the foreman’s role were sig- 
nificantly related to three evaluations of fore- 
men made by general foremen: Performance 
Rating, Potential Rating, and Expectation 
minus Perception. 

5. In Department B, the foreman’s ability 
to predict what was expected of him by the 
general foreman was related to his Perform- 
ance and Potential ratings. 


DISCUSSION 


To check the possibility that foremen in
Department B had greater skill in “predicting”
(EP1 − E1) and “sharing” (E1 − E2)
general foreman and subordinate expectations,
mean divergency scores for the two groups 
were computed. No significant differences 
were found. Thus, foremen in the two depart- 
ments were equally adept on the average in 
terms of their ability to “share” and “predict” 
what was expected of them by general fore- 
men and subordinates. 

It would appear, therefore, that situational 
variables in addition to expectation con- 
vergencies had a differential impact on evalu- 
ation of foremen in the two departments. It 
will be recalled that foremen in Department B 
had more frequent and stable interactions 
with their general foremen than those in 
Department A. Foremen interactions with 
subordinates were roughly equivalent in that 
foremen were rotated every 3 months in both 
departments. Thus, in a situation providing 
for relatively consistent face-to-face inter- 
action, it is possible to account for a signifi- 
cant proportion of the variance of certain 
superior evaluations by the hypotheses of job 
expectation convergence. In situations where
interaction is disjointed or random, the
expectation convergence model was not
demonstrable.

These results have theoretical implications.
What appears to be required in order that
the expectation convergency model be useful
is the sequential development of at least the
following additional situational variables.
First, a foreman must learn what is expected
of him on the job, and have convergent
self-expectations regarding his role. Then, he
must “act out” expectations in a sufficiently
stable situation so that his carrying out of
what is expected of him can be perceived
by superiors and subordinates and reinforced
by them. Thus, although researchers in
industrial leadership have postulated the
importance of convergent expectations, it appears
that expectation convergency, though perhaps
necessary, is not sufficient in itself to account
for leader effectiveness. 

On the other hand, to the extent that the 
situational variables in addition to expecta- 
tion convergence are operating, the application 
of this method to merit rating may be useful. 
In this way, instead of the typically fuzzy
meaning that may attach to merit ratings,
specific deviations from expected behavior
may be isolated, thus providing the means to
more effective appraising, rewarding, and
developing of supervisory personnel. Secondly, in
companies utilizing “post-appraisal interviews”
designed to let appraisees “know where they
stand,” reference to specific behavioral deviations
from expectations should be more useful
in counseling than the typical
“good-points-bad-points-good-points” sandwich. 

Finally, these results raise some questions 
about certain personnel practices which have 
been designed to improve supervisory per- 
formance: 

1. Job Rotation—What is the effect on a 
subordinate in terms of learning what is ex- 
pected of him when he is regularly, or ir- 
regularly, rotated among superiors? These 
data suggest that such rotation may vitiate 
job expectation convergence. If this is true, 
one of the typical objectives (and selling 
points) of such programs, namely, “learning 
the ropes,” may not be realistic. Thus, an 
important variable in Job Rotation programs 
would seem to be length of tenure in each of 
the position assignments of the rotation sequence.
These data suggest that short-term
rotational assignments may be impractical,
particularly if they are sporadic and are
perceived by the trainee as unplanned. 

It may be, however, that as one proceeds 
upward in the organization, the set of expec- 
tations that, in combination, define job re- 
quirements becomes more general. Thus rota- 
tion at higher levels may yield greater returns 
than at lower levels in the organization. What 
may be called for then, to be most useful, is 
a system of increasing rotation as a function 
of generality of job expectations. 

2. Human Relations Training—Are gen- 
eralized supervisory development programs 
which purport to “cut across all management 
situations” job oriented enough to be of any 
use to the foreman in learning the specific 
expectations surrounding his job? These data 
suggest that success in the organization may 
be at least partially a function of specific job 
expectations, in addition to “general principles 
of management.” 


SUMMARY 


This study hypothesized a significant posi- 
tive correlation between evaluations of fore- 
men made by superiors and subordinates and 
the degree to which foremen share and 
accurately predict superior and subordinate 
expectations regarding the foremen’s job 
behavior. The Ss (8 superiors, 32 foremen, 
and 377 subordinates) responded to a 13-item 
questionnaire composed of “consideration” 
and “structure” items yielding “expectation- 
convergence scores” which were correlated 
with evaluation measures. In a department 
providing relatively close face-to-face inter- 
action, a significant proportion of the variance 
of some superior evaluations was accounted 
for by the hypothesis of expectation con- 
vergence. Various implications of these data 
for Job Rotation and Human Relations Training
programs suggest the possibility that they
may operate to limit expectation convergence
and therefore individual effectiveness of
trainees.

TRANSFER OF TEAM SKILLS AS A FUNCTION
OF TYPE OF TRAINING 1

WILLIAM A. JOHNSTON 
Ohio State University 


5 groups varying in training context (team versus individual) and skill acqui- 
sition (individual, coordination, and communication skills) were compared at 
transfer on team (coordination of interceptions) and individual (number of 
interceptions) performance of a simulated radar-controlled aerial intercept 
task. Individual performance was unaffected by the training variables, but 
team performance was a positive function of the emphasis on coordination 
skills during training. When acquisition of coordination skills was held constant, 
context had no effect on transfer performance. Intrateam communications 
retarded performance but prohibiting these communications during training 
did not lessen their disruptive effect at transfer. This inhibitory influence of
team communications reflected the verbal transmittal of information irrelevant
to the task or more readily obtainable from the radar scopes. 


It is generally assumed that training should 
permit the acquisition of all the skills to be 
called upon at transfer and that the training 
time devoted to a particular skill should cor- 
respond to its relative involvement at trans- 
fer. Of course, one way to assure the optimal 
balancing of transfer skills during training is 
to make the training task as similar to the 
transfer task as possible; hence, the well-known
“identical elements” theory of transfer
(Deese, 1958, p. 218). Accordingly, when
transfer is characterized by team activity, it 
would appear desirable to train potential team 
members in a team context rather than indi- 
vidually in order that the necessary team 
skills may be developed. However, four fac- 
tors limit the generality of this thesis, par- 
ticularly when the time available for training 
is limited. First, individual skills may be con- 
siderably more essential to efficient team per- 
formance at transfer than are team skills. This 
would be especially likely if team members 
have largely independent functions to perform 
at transfer (Glanzer, 1962). Second, the individual
skills may be more difficult to learn
than the team skills, regardless of their relative
importance; in fact, the required team
skills may already be in the trainee’s repertoire.
Third, well-developed team skills may
not be the exclusive product of training in a
team context (team training), and some may
even be more readily acquired in an individual
context (individual training). Finally,
team training may lead to the development
of habits which actually retard performance
in a team context at transfer. For example,
previous data indicate that training in a team
context fosters the acquisition of interperson
communication habits which inhibit team
performance at transfer (Briggs & Naylor, 1965;
Naylor & Briggs, 1965). 

1 This research was carried out in the Laboratory
of Aviation Psychology and was supported by the
United States Navy under Contract No. N61339-1327,
sponsored by the United States Naval Training
Device Center, Port Washington, New York.
Reproduction of this publication in whole or part is
permitted for any purpose of the United States
Government. The writer is grateful to James C. Naylor
for his help in planning the study, and to George
E. Briggs for his consultation throughout the full
course of the study. 

In the light of these considerations it is 
not surprising that individual training has 
been found to surpass team training in terms 
of team performance at transfer, especially 
since the transfer tasks used demanded little 
or no teamwork (Briggs & Naylor, 1965; 
Horrocks, Krug, & Heermann, 1960). Pursu- 
ing the speculations of Briggs and Naylor 
(1965), the present study was designed to re- 
examine the team versus individual training 
issue using a transfer task that requires much 
more teamwork (i.e., coordination between 
teammates) than has characterized the tasks 
of previous studies. The underlying thesis was
that the transfer effects of training context
(team versus individual training) are mediated
by the specific skills acquired, and that
if context and skill acquisition are independently
manipulated, only the latter will affect
transfer performance. 

A team coordination transfer task was used 
in which two radar controllers (RCs), oper- 
ating in a simulated radar controlled air 
intercept environment, were permitted to in- 
tercommunicate as they attempted to effect 
simultaneous (coordinated) interceptions of 
target aircraft. The ability of teammates to 
coordinate at transfer was examined as a 
function of five training conditions which 
varied in terms of team versus individual 
context and team versus individual skill 
requirements. A team context existed when 
the two RCs (receiving training at the same 
time and in the same room) were mutually 
dependent in their performance of the task, 
while an individual context existed when the 
two RCs were functionally independent. Co- 
ordination and communication were consid- 
ered to be team skills since they are normally 
(though not exclusively) involved in team 
activity, while all other skills required to 
direct a single interceptor to a single target 
were considered to be individual skills. The 
partial nondependency of context and skill 
requirements is exemplified by the individual- 
coordination condition in which each RC, 
working independently, must coordinate two 
interceptors under his control by directing 
them to corresponding targets simultaneously. 

The five training conditions may be designated
noncoordination-noncommunication
(Group 1), noncoordination-communication
(Group 2), individual coordination-noncommunication
(Group 3), team coordination-noncommunication
(Group 4), and team
coordination-communication (Group 5).
Communication versus noncommunication refers
to whether or not inter-RC (team) communication
was permitted; coordination versus
noncoordination refers to whether or not
simultaneous air interceptions were required;
and team versus individual coordination refers
to whether the coordination was between
two interceptors controlled by different RCs
or by the same RC. The training conditions
were expected to differ primarily in the extent
to which critical team skills can be acquired;
the necessary individual skills required to
direct a single interceptor to a single target
could be developed in all five groups. The
question of team versus individual context
was approached by comparing Groups 3 and
4. Since skill acquisition was held relatively
constant during training (both groups learned 
coordination skills, neither learned communi- 
cation skills), these groups were not expected 
to differ in terms of team performance at 
transfer. The importance of training on team 
communications, holding context and coordi- 
nation training constant, was explored prima- 
rily by comparing Groups 4 and 5 (team 
context, coordination training) on the team 
coordination-communication transfer task. 
Group 4 was predicted to surpass Group 5 
due to the transfer of inhibitory communica- 
tion habits in the latter condition. Groups 
1 and 2 (individual context, noncoordination 
training) might also have afforded an exami- 
nation of training on communication habits, 
but it was anticipated that teammates would 
not intercommunicate in Group 2. Any such 
dependency of team communication on con- 
text could be uncovered by comparing Groups 
2 and 5 in terms of team communication fre- 
quency during training. Finally, an analysis 
of training on coordination skills was made 
available by the data of Groups 1 and 3 
(individual context, noncommunication). 
Team coordination at transfer was expected 
to be better in Group 3 than in Group 1 
due to the positive transfer of coordination 
skills in the former condition. 
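
The planned comparisons can be checked by coding each training condition on the three factors named in the text. The Python sketch below is only an illustration: the class and field names are hypothetical, and Group 2 is coded as permitting communication even though the text anticipates that teammates would not actually use the channel.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainingCondition:
    team_context: bool    # RCs mutually dependent during training
    coordination: bool    # simultaneous interceptions required
    communication: bool   # inter-RC communication permitted


GROUPS = {
    1: TrainingCondition(False, False, False),
    2: TrainingCondition(False, False, True),
    3: TrainingCondition(False, True, False),
    4: TrainingCondition(True, True, False),
    5: TrainingCondition(True, True, True),
}


def differing_factors(a, b):
    """Names of the factors on which Groups a and b differ."""
    return [f.name for f in fields(TrainingCondition)
            if getattr(GROUPS[a], f.name) != getattr(GROUPS[b], f.name)]


for pair in [(3, 4), (4, 5), (1, 2), (1, 3)]:
    print(pair, differing_factors(*pair))
```

Each of these four pairs differs on exactly one factor, which is what allows context, communication, and coordination training to be examined separately.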


METHOD 


Subjects and design. Each of the five experimental 
groups contained seven two-man teams. Each team 
served for eight 35-minute sessions over a 2-week 
period; the first four sessions being training, the 
last four being transfer. All five conditions were run, 
in different orders, on each of seven overlapping 
2-week periods. At the conclusion of his service, each 
undergraduate male subject was paid $10.00. 

The experimental design is summarized in Table 1. 
It may be recalled that all conditions permitted the 
acquisition of the individual skills required to direct 
interceptors to targets. In addition, a simulated radio 
link between teammates permitted the development 
of (presumably disruptive) verbal communication 
habits in Group 5 and possibly in Group 2. Finally, 
coordination skills could be learned in Groups 3, 4,
and 5. Of course, the team coordination-communication
transfer task called upon all three classes of
skill. It is important to note that though fidelity of
simulation (training-transfer similarity) was highest
in Group 5, greatest positive transfer was expected
in Groups 3 and 4. 

TABLE 1
EXPERIMENTAL DESIGN

Group  Training conditions                        Transfer conditions
1      Noncoordination-noncommunication a         Team coordination-communication
2      Noncoordination-communication              Team coordination-communication
3      Individual coordination-noncommunication   Team coordination-communication
4      Team coordination-noncommunication         Team coordination-communication
5      Team coordination-communication            Team coordination-communication

a See the text for a description of the conditions. 

Apparatus and procedure. The apparatus was a 
modified version of the OSU Electronic Air Traffic 
Control Simulator which is detailed in a previous 
report (Hixson, Harter, Warren, & Cowan, 1954). 
Very briefly, the adapted version of the equipment 
consists of a special-purpose analog computer capable 
of providing up to 28 simulated radar returns on 
several 14-inch (large scopes) and 5-inch (small 
scopes) CRT displays. The radar returns move and 
turn at realistic rates and are represented by a 
clock-coding scheme described previously (Briggs & 
Naylor, 1964). In the present study, the small scopes 
were used during training, and the large scopes 
during transfer; both large and small scopes simu- 
lated 200 miles of airspace. 

FIG. 1. A schematic representation of one of the
initial conditions used during transfer. (The Is and
Ts represent interceptor and target positions, respectively.
In actual practice clock codes appeared at
these indicated positions and the dashed lines were
not present; they indicate here the future positions
of the targets.) 

On the first two sessions, taped instructions and
supervised practice under relatively simple task
conditions were administered. Thereafter (last two
training sessions and all four transfer sessions), the
basic task was used in which the two RCs of a team
worked with separate scopes, each scope displaying 
(in identical fashion) the same set of eight radar 
returns; four returns representing target aircraft and 
four representing corresponding interceptors. Figure 1 
is a schematic illustration of a scope face as it 
appeared at the beginning of a session. A target (T) 
entered at the periphery of a scope quadrant on a 
straight-line course and continued until it was 
intercepted (“hit”) or out of the scope quadrant 
(“missed”), whichever happened first. Fifty seconds 
after a hit or a miss the target was reset at the 
periphery of the same quadrant with a different one 
of 64 entry characteristics made up of four points 
of entry, four headings per point, and four speeds 
per heading. The same preprogramed order of the 64 
entry characteristics per target was used for all 
teams; this order being random with the restrictions 
(a) that no characteristic be used more than twice,
and (b) that T1 and T2, as well as T3 and T4,
possess symmetrical entry characteristics as shown
in Figure 1. Finally, the target entries were such
that it was always possible for an interceptor, start- 
ing from any quadrant location, to hit its target. 
Each interceptor was assigned only to the target in 
its quadrant, and RCs directed interceptors by issuing 
heading and speed commands over a verbal channel 
to “pilots” who immediately entered these commands 
on target generator consoles located in another area. 
Interceptors were always left intact, that is, never 
reset, after hits. 

Under team coordination conditions, RC1 attempted
to intercept T1 with I1 and T3 with I3 while
RC2 pursued T2 and T4 by directing I2 and I4. Under
individual coordination conditions the Is and Ts
were distributed differently between the two RCs:
RC1 pursued T1 and T2 with I1 and I2, and RC2
directed I3 and I4 to T3 and T4. The coordination
aspect of these conditions was that symmetrical
targets (T1 and T2, T3 and T4) were to be intercepted
simultaneously. Thus, team coordination involved
coordination between Ss and individual
coordination within Ss. The scope provided visual access
to information needed to coordinate, and in the
team coordination-communication conditions (all
groups at transfer, Group 5 at training), the team
communication channel served as an alternate means
of gathering relevant information. Under noncoordination
conditions, of course, simultaneous hits were
not required and the above information sources were
not needed. Interceptor-target pairs were rotated
from one quadrant to the next at regular intervals
to assure that RCs in all conditions would have equal
practice attempting interceptions in all quadrants. 

A hit was defined as an interceptor-target separa- 
tion of 2 miles of airspace during training and 1 mile 
of airspace during transfer (scope distance about .8 
millimeters in both cases). In the coordination con- 
ditions, all four coordinating aircraft (two inter- 
ceptors and two targets) were “frozen” on the scope 
when a hit occurred. Then, during the 50-second 
target resetting interval, degree of coordination was 
measured as the distance (in miles) separating the 
nonintercepted target and its interceptor. Thus, co- 
ordination scores could range from 1 mile to a 
theoretical maximum of 100 miles. Of course, the 
obtained range was considerably less (1-38 miles) 
with only 1% of the coordination scores exceeding 
20 miles. In the noncoordination conditions, only the 
specific interceptor and target involved were frozen 
when a hit occurred. The independent freezing and 
resetting of targets in the noncoordination conditions 
produced temporal asymmetry of otherwise sym- 
metrical targets. Freezing, measuring, and target re- 
setting were effected by an E who monitored the 
task from his own scope. 


RESULTS AND DISCUSSION 


Two measures were available for each team 
on each transfer session: number of hits 
(degree of coordination disregarded) and coordination
(degree of coordination averaged
across hits). Number of hits may be considered
an index of individual performance, and
coordination a measure of team performance.
Both measures were also taken on each training
session, except that procedural and logical
restrictions prevented coordination scores from
being obtained in the noncoordination conditions,
namely, Groups 1 and 2. Groups × Sessions
analyses were performed on both
measures for both training and transfer, and
the results of these analyses are summarized
in Table 2. It is noteworthy that in no case
was a significant Groups × Sessions interaction
revealed (p > .05 in each analysis).
Thus, differences between groups in terms 
of individual and team performance were 
roughly constant throughout both training 
and transfer. It may also be noted that except 
for team performance in Groups 3-5 during 
training, a significant sessions effect oc- 
curred reflecting performance improvement 
with practice. 
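
Each F ratio summarized in Table 2 is the ratio of an effect mean square to its error mean square. As a minimal sketch, assuming only the mean squares printed in the table (the function name `f_ratio` is introduced here for illustration and is not part of the original report), the legible entries can be checked as follows:

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """Return the F ratio of an effect mean square to its error mean square."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error


# Mean squares as printed in Table 2.
training_hits_groups = f_ratio(678.05, 64.32)   # Groups, number of hits, training
transfer_coord_groups = f_ratio(30.56, 10.54)   # Groups, coordination, transfer
transfer_hits_sessions = f_ratio(135.45, 6.16)  # Sessions, number of hits, transfer

for label, value in [
    ("Groups, hits, training", training_hits_groups),
    ("Groups, coordination, transfer", transfer_coord_groups),
    ("Sessions, hits, transfer", transfer_hits_sessions),
]:
    print(f"{label}: F = {value:.2f}")
```

Small discrepancies in the last decimal place against the printed F values are to be expected, since the printed mean squares are themselves rounded.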

TABLE 2

ANALYSES OF VARIANCE OF INDIVIDUAL AND TEAM PERFORMANCE DURING TRAINING AND TRANSFER

                                   Number of hits            Coordination
Phase     Source                df      MS        F        df     MS       F
Training  Between Subjects      34                         20a
            Groups (G)           4    678.05   10.54***     2    3.66    7.63**
            Error               30     64.32               18     .48
          Within Subjects       35                         21
            Sessions (S)         1    452.63   37.56***     1     .02   <1.00
            S × G                4     15.95    1.32        2     .29   <1.00
            Error               30     12.05               18     .39
Transfer  Between Subjects      34                         34
            Groups (G)           4      3.56   <1.00        4   30.56    2.90*
            Error               30     42.80               30   10.54
          Within Subjects      105                        105
            Sessions (S)         3    135.45   22.00***     3    9.46    2.58*
            S × G               12      4.68   <1.00       12    3.58    1.08
            Error               90      6.16               90    3.32

a Coordination scores were available only for Groups 3-5 during training (21 teams yielding df = 20).
* p < .05.
** p < .01.
*** p < .001.

More critical to the original hypotheses
was the main effect of groups. Confining our
attention first to the training data, a significant
effect of groups occurred in terms of
both individual (p < .001) and team (p < .01)
performance. Duncan’s test (Edwards,
1963) revealed that, in the case of individual
performance, Group 1 was superior to all 
other groups except Group 2, and Group 5 
was inferior to all other groups (p< .05). 
Thus, individual performance was inversely 
related to the number of team skills demanded 
by the training task, a finding consistent with 
previous data collected in a similar frame- 
work (Kidd, 1961). Duncan’s test further 
revealed that Groups 3 and 4 exceeded Group 
5 in terms of team performance during 
training, thereby supporting the expectation 
that team communications would retard team 
performance. Consequently, team communica- 
tions had a deleterious effect on both indi- 
vidual and team performance. Groups 1 and 2 
did not differ in terms of individual perform- 
ance, probably because these disruptive team 
communications were extremely rare in Group 
2; a total of only 21 team communications 
occurred during the training of Group 2 com- 
pared to 700 during the training of Group 5. 
As anticipated, then, team communications 
were limited to a team context. 
Collectively, the training data bear on im- 
portant system design matters. Some implica- 
tions are that all team components of the 
system should be minimized when produc- 
tivity (e.g., number of hits in the present 
experiment) is of major concern, and that 
team communications should be prohibited or 
at least minimized when coordination is of 
primary concern. Of course, the latter sug- 
gestion would not generalize to systems in 
which required information can be obtained 
only by team communication. When more 
efficient information channels (e.g., the radar 
scopes of the present study) are available, 
however, the present data indicate that team 
communications can be successfully mini- 
mized by confining system functions to indi- 
vidual contexts; since Group 3 performed at 
least as well during training as Group 4, this 
restriction should not hinder system output. 
The crucial question now becomes: did the 
training conditions differentially affect per- 
formance on the team coordination-communi- 
cation transfer task? Table 2 shows that, as 
expected, the groups differed only in terms 
of team performance at transfer (p < .05). 
The overall mean coordination scores (in 
miles) were 5.26 for Group 1, 5.27 for Group
2, 2.83 for Group 3, 3.80 for Group 4, and
3.94 for Group 5. Duncan’s test (p = .05) 
was used to make the comparisons appropri- 
ate to original hypotheses. A significant dif- 
ference was not obtained between Groups 1 
and 2, 3 and 4, or 4 and 5, but one was found 
between Groups 1 and 3. As expected, there- 
fore, training on coordination skills was bene- 
ficial to team performance at transfer (Group 
1 versus 3), but with skill acquisition held 
constant, training context failed to signifi- 
cantly affect transfer performance (Group 3 
versus 4). The lack of a significant difference 
between Groups 1 and 2 is understandable in 
view of the fact that team communications 
rarely occurred during the training of Group 
2 teams, thereby rendering the two conditions 
ostensibly equivalent. However, despite the 
fact that disruptive communications occurred 
quite frequently during the training of Group 
5 teams, they did not produce negative trans- 
fer (Group 4 versus Group 5). In short, then, 
the only training variable affecting transfer 
performance in the present study was skill 
acquisition, that is, whether or not RCs could
acquire the coordination skills called upon at 
transfer. 

Thus, the transfer data support the basic 
thesis that team versus individual training is 
not an important issue so long as both con- 
texts permit the most critical transfer skills 
to be acquired. Coordination skills were ap- 
parently most important in the present task 
(Group 3 surpassed Group 1) and it made no 
significant difference whether these skills were 
developed in an individual (Group 3) or team 
(Group 4) context. However, it is entirely 
conceivable that certain skills are more read- 
ily learned in one context than in the other. 
For example, the restriction of team com- 
munications to a team context in the present 
study implies that it would be inconvenient, 
if not impossible, to shape these communications
using an individual context. On the
other hand, an individual context may be most 
appropriate in regard to other skills. For ex- 
ample, Duncan’s test (p < .05) showed that 
while Group 3 exceeded Group 1 in terms of 
team performance on every transfer session, 
Group 4 surpassed Group 1 on only the third 
session of transfer (despite the nonsignificant 
Groups X Sessions interaction in Table 2). 


TRANSFER OF SKILLS AND TYPE OF TRAINING 


Thus, the superiority of Group 4 over Group 
1 was more transitory than the superiority of 
Group 3 over Group 1. Though indeed quite 
tenuous, these data are consistent with the 
possibility that coordination skills are better 
acquired in an individual context than in a 
team context. A potentially inhibiting factor 
in Group 4 was that each RC had to adapt to 
a flexible and often unpredictable teammate 
during training, a requirement not present in 
Group 3. The time spent learning adaptive 
skills in Group 4 may have been better spent 
learning “pure” coordination skills as was 
done in Group 3. At any rate, the important 
consideration appears to be skill acquisition; 
context becomes important only when it de- 
termines what skills are acquired. The pre- 
viously found superiority of an individual 
context may be interpreted accordingly 
(Briggs & Naylor, 1965; Horrocks et al., 
1960). 

A final consideration concerns the locus of
the disruptive effect of team communications.
Inhibition of team performance by team communications
was revealed not only by the
fact that Groups 3 and 4 surpassed Group 5
during training, but also by the fact that
these differences were not significant at transfer;
that is, when team communications were
permitted, team performance in Groups 3 and
4 instantly dropped to the same low level
characterizing Group 5. An effort was made
to localize the inhibitory influence of team
communications by analyzing the taped records
of these communications, which were
available for five teams per group on Session
1 of transfer. Each communication recorded
on these tapes was classified according to the
system presented in Table 3. The absolute
frequency of each type of communication was
obtained for each of the 25 teams and these
communication scores were correlated with
team performance on Session 1 of transfer.
Only three product-moment correlations were
significant at the p < .05 level or better by a
two-tailed test (df = 23), namely, those involving
request-information (r = .59), provide-information
(r = .63), and task-irrelevant
(r = .54) communications. It is to be
noted that since team performance is reciprocally
related to the magnitude of the
coordination score, each of the correlations denotes
a negative relationship with team performance.
The disruptive effect of task-irrelevant
communications is understandable; but
why did the verbal transmittal of information
requests and provisions retard performance?
The answer may be related to the fact mentioned
earlier that each RC had access to two
information channels at transfer: a verbal
channel supplied by the team communication
link and a visual channel provided by the
scopes themselves. Thus, RC1 could obtain
the speed and heading of one of RC2’s interceptors,
for example, I3, by requesting and
receiving this information over the verbal
channel, or simply by visual inspection of the
radar return representing I3. Casual analysis
suggested that RCs were extremely adept at
estimating aircraft speed and heading merely
by examining the radar returns. On a logical
basis, the visual channel appears to be considerably
more efficient than the verbal channel
because the former is less time consuming
and does not divert the RC’s attention from
the fundamental information source, namely,
the simulated radar scope. Quite possibly,
then, the more frequent the time-consuming
and distracting attempts to gain or send pertinent
information over the verbal channel, the
poorer the coordination achieved.

TABLE 3

TEAM COMMUNICATION CATEGORIES

Category              Description
Error                 Attempts to call pilots over inter-RC channel.
Task irrelevant       Communications not pertaining to the task, e.g.,
                      “Stop talking so loud.”
Task declarative      Communications of information already provided on
                      the scopes, e.g., “Our targets are on the scopes now.”
General strategy      Communications referring to more than one interceptor
                      or target, e.g., “I have both of my interceptors at
                      600 knots.”
Request information   Communications in which one RC asks his teammate for
                      information, e.g., “What is the speed of your I,?”
Provide information   Communications in which requested information is
                      provided.
Volunteer information Communications in which nonrequested information is
                      provided.
Request opinion       Communications in which nonevaluative opinions are
                      requested, e.g., “Do you think you can intercept T;
                      before he leaves the quadrant?”
Provide opinion       Communications in which requested opinions are
                      provided.
Volunteer opinion     Communications in which nonrequested opinions are
                      provided.
Command               Direct commands, e.g., “Speed your I; up to 600 knots.”
Suggestion            Indirect commands, e.g., “You might speed your I; up
                      to 600 knots.”
Evaluation            Evaluative statements, e.g., “We’re (you’re) not doing
                      well today.”
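
The significance of the three reported correlations can be checked by converting each r to a t statistic with df = 23. The sketch below is illustrative only: the helper name `t_from_r` is hypothetical, and the critical value 2.069 is the standard tabled two-tailed .05 value of t for 23 degrees of freedom rather than a figure taken from the report.

```python
import math


def t_from_r(r: float, df: int) -> float:
    """Convert a product-moment correlation to a t statistic with df = N - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


DF = 23                # 25 teams minus 2, as reported
T_CRITICAL_05 = 2.069  # tabled two-tailed .05 critical value of t for 23 df

reported = {
    "request-information": 0.59,
    "provide-information": 0.63,
    "task-irrelevant": 0.54,
}

for label, r in reported.items():
    t = t_from_r(r, DF)
    print(f"{label}: r = {r:.2f}, t({DF}) = {t:.2f}, exceeds critical t: {abs(t) > T_CRITICAL_05}")
```

Because larger coordination scores mean poorer coordination, a positive r here corresponds to a negative relationship with team performance, as the text notes.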

Since Group 5 RCs relied quite heavily on 
the verbal channel during training, and Group 
4 RCs on the more efficient visual channel, a 
Groups X Communication category (request 
information and task irrelevant) analysis of 
variance was conducted to ascertain whether 
Group 5 fostered the transfer of disruptive 
communications more than the other groups, 
particularly Group 4.² However, neither the
groups main effect, F (4, 40) = 1.72, p > .05, 
nor the interaction, F (4, 40) < 1.00, was 
significant. It is especially surprising that 
disruptive communications at transfer oc- 
curred as often in Group 4 as in Group 5. 
Group 4 RCs, after all, were trained to obtain 
information pertinent to team coordination 
using the visual channel; it can only be con- 
jectured that these RCs were attracted to the 
verbal channel at transfer because of its novelty.
If negative transfer of disruptive communications
had been promoted in Group 5,
then Group 4 might have surpassed Group 5
in terms of team performance at transfer as
originally predicted.

2 Provide-information communications were omitted
from this analysis since the frequency of these
communications was highly correlated with the
frequency of request-information communications,
r = .86.

In brief, then, the present data indicate 
that (a) individual performance is reciprocally 
related to the demands on teamwork, (b)
team performance is retarded by team com- 
munications, (c) team communication is lim- 
ited to a team context, (d) prohibiting team 
communication during training does not les- 
sen its disruptive effects when it is permitted 
at transfer, (e) training on coordination skills
is most beneficial to team coordination at 
transfer, and (f) coordination skills can be 
about equally well acquired in individual and 
team contexts.


DECISION QUALITY AS A MEASURE OF VISUAL
DISPLAY EFFECTIVENESS ¹


CARL A. SILVER, JAMES M. JONES, AND DANIEL LANDIS ²


The Franklin Institute Research Laboratories, Philadelphia 


A new gaming technique was employed in an attempt to evaluate more 
accurately the effectiveness of visual displays. 18 male university students acted 
as traffic managers for a hypothetical trucking concern. Trucking information 
was presented in map-plus-overlay displays and Ss manipulated trucks, drivers, 
and loads within the framework of the economic rules governing the trucking 
operation. A computer program was written which determined the profit in
dollars of each S’s performance. 3 independent variables, (a) use of color,
(b) fact density, and (c) compression (ratio of symbols to facts), were used in
this repeated measures design. The analysis of variance indicated that profit
was a positive function of increasing fact density (p < .001), and that there
were significant interactions between fact density and color (p < .001) and
between fact density and compression (p < .05). The usefulness of this technique in
differentiating among structurally different visual displays was discussed.


Determining the optimum formats and symbolic coding
dimensions for use in the construction of visual
displays has been an important applied
research question for the past several years.
Investigators have systematically varied dis- 
play parameters such as format (Hitt, Schutz, 
Christner, Ray, & Coffey, 1961; Klemmer & 
Frick, 1953), the use of color (Green & An- 
derson, 1956; Jones, 1962), information den- 
sity (Ringel & Hammer, 1964), and other 
parameters in an attempt to discover which 
types of displays are most effective in the 
transmission of information to observers. Each 
of these investigations used some criterion of 
effectiveness to evaluate the display param- 
eters. These measures have included, among 
others, search time, reading speed, accuracy, 
response time, and certitude judgments. Al- 
though these traditional measures of display 
effectiveness have been useful as visibility in- 
dices for various parameters, the relationship 
of these measures to the quality of decisions 
made by observers has thus far not been spe- 
cified. 

1 This study was carried out at The Franklin
Institute Research Laboratories as part of a research
program sponsored by the United States Air Force
Rome Air Development Center under contract number
AF30(602)-3302.

2 The authors wish to express their thanks to Ezra
S. Krendel for his helpful criticism and suggestions.

In a recent research effort Feallock and
Briggs (1963) simulated an operational
military-reconnaissance situation in an attempt to
obtain a qualitative measure of performance.
They were able to obtain very useful measures 
of performance; however, the simulation 
equipment included on-line digital computers, 
printer/plotters, analogue computers, and 
complicated intercommunication systems. 
Their system would not, because of its com- 
plexity, have practical application in the field 
evaluation of display problems. 

The research reported in this paper is part 
of a larger research program conducted by 
the authors (Silver, Landis, & Jones, 1965). 
The purpose of this paper is twofold: (a) to 
describe a technique for measuring the qual- 
ity of decisions based on visual displays, and 
(b) to demonstrate the applicability of this
technique in differentiating, in terms of deci- 
sion value, among structurally different visual 
displays. 


METHOD AND PROCEDURE 


Subjects. Eighteen upperclass and graduate men 
were selected, on a volunteer basis, from the popu- 
lation of such students at Drexel Institute, Univer- 
sity of Pennsylvania, and Temple University. The 
age range of subjects (Ss) was 19 through 23 (median 
age 21). The Ss were paid for their time on an 
hourly basis plus 1% of their “profit” in the “game” 
situation to be described. 

Stimuli. The stimuli consisted of 81 40- × 30-in.
map-plus-overlay displays. Each map showed the 15 
hypothetical cities controlled by the trucking corpora- 
tion, and the intercity mileages. Variation in three 
independent display parameters was incorporated
into the construction of the stimuli: utilization of
color, ratio of symbols to facts (compression), and 
the amount of information per display (fact density). 
The overlays were made from t-inch rectangular and 
pennant-shaped plastic tags, and {-inch round map 
tacks. These overlays were the symbolic representa- 
tion of information required to complete the task. 
Figure 1 shows a sample display.

Criterion measures. Each display graphically presented
the current status of a hypothetical trucking
corporation. The rules governing the operation of the
trucking concern were distributed to Ss in the form
of an economic fact sheet. These rules determined
how the company made money and what costs were
associated with the various operations. Knowledge of
these rules enabled subjects to manipulate the trucks,
drivers, and loads for a given display with some
awareness of the value of their decisions as they
were making them. A computer program was written
to determine the value, “profit,” in dollars and cents,
of S’s decisions. The profit was the measure of S’s
performance on each display.³

Independent variables. Each map-plus-overlay presented
information about the trucking operation;
rectangles represented loads, pennants represented
trucks, and map tacks represented drivers. The
information for a given display was quantified in terms
of the number of facts it represented. There were
four “facts” associated with each load: the location,
identification, size, and class. The facts associated
with each driver were location, identification, and
hours already driven. Location and identification
were the facts associated with each truck. The total
amount of information for a given display was equal
to the number of loads times the number of facts
per load (4), plus the number of drivers times the
number of facts per driver (3), plus the number of
trucks times the number of facts per truck (2). The
lowest fact level (F1 = 60 facts) was obtained by
using 5 loads, 10 drivers, and 5 trucks for a given
display. The other two fact levels were obtained as
follows: F2 = 96 facts (8 loads, 16 drivers, 8 trucks);
F3 = 132 facts (11 loads, 22 drivers, 11 trucks). These
values were chosen to permit the various combinations
of color and compression discussed below.
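
The fact-density arithmetic described above can be written out directly. The following is a minimal sketch, assuming only the per-symbol fact counts and the load, driver, and truck counts stated in the text; the constant and function names are introduced here for illustration.

```python
FACTS_PER_LOAD = 4    # location, identification, size, class
FACTS_PER_DRIVER = 3  # location, identification, hours already driven
FACTS_PER_TRUCK = 2   # location, identification


def total_facts(loads: int, drivers: int, trucks: int) -> int:
    """Total number of facts represented on one map-plus-overlay display."""
    return (loads * FACTS_PER_LOAD
            + drivers * FACTS_PER_DRIVER
            + trucks * FACTS_PER_TRUCK)


# (loads, drivers, trucks) for each fact level as given in the text.
fact_levels = {"F1": (5, 10, 5), "F2": (8, 16, 8), "F3": (11, 22, 11)}

for name, counts in fact_levels.items():
    print(name, total_facts(*counts))
```

Run as written, the three levels come out to the 60, 96, and 132 facts reported for F1, F2, and F3.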

The use of color as a coding dimension was the
second independent variable. Black and white symbols
were used on one third of the displays, color as a
redundant coding dimension was used on one third of the
displays, and color as a relevant coding dimension
was used on the remaining third of the displays.
When color was used as a redundant coding dimension,
the color coded a fact that was also coded in
another way. When color was used as a relevant
coding dimension, a particular color was the sole
coding dimension for a particular fact. In addition
to black and white, pink, orange, and yellow colors
were used.

3 The computer program used has been deposited
with the American Documentation Institute. Order
Document No. 8732 from ADI Auxiliary Publications
Project, Photoduplication Service, Library of
Congress, Washington, D. C. 20540. Remit in advance
$2.00 for microfilm or $3.75 for photocopies
and make checks payable to: Chief, Photoduplication
Service, Library of Congress.

The third independent display variable was the 
ratio of symbols to facts, compression. In this study, 
a symbol was defined as a tag, map tack, or one 
alpha-numeric group. For example, the letter “B” 
standing alone was a symbol as was the number 123. 

Minimum compression occurred when there was 
one symbol for each fact. Three levels of compres- 
sion were used. These are referred to as 6/6, 5/6, 
4/6, indicating that 6, 5, or 4 symbols, respectively, 
were used to code six facts. It should be noted that 
the method of obtaining the required amount of 
compression varied depending upon the color condi- 
tion. When the color was relevant, the compression 
was obtained through color coding; otherwise, shape 
or position coding was employed.

Procedure. Problems were constructed by arbitrarily
assigning locations, sizes, classes, identifications,
and hours to the truck, load, and driver symbols.
Twenty-seven unique problems were constructed
in this manner. Each of these problems was
displayed in each of the compression categories.
Problems 1 through 9 were presented in F1; 10 through 18 in
F2; and 19 through 27 in F3. Within a fact level
three problems (1, 2, 3 for F1; 10, 11, 12 for F2; and
19, 20, 21 for F3) were shown in black and white,
three in color redundant, and three in color relevant
coding conditions. Each S observed 27 displays, 1
problem in each of the experimental conditions.

The 27 problems were assigned to the 18 Ss in 6 
different orders, 3 Ss per order. 
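
The structure of the design can be enumerated to confirm the counts given above: 27 experimental conditions (3 fact levels × 3 color codings × 3 compression levels) and 81 displays (27 problems × 3 compression levels). This is a sketch under stated assumptions: the text fixes only which problems were black and white within each fact level, so the assignment of the next two triples to color redundant and color relevant coding in `problem_cell` is an assumption made for illustration, as are all names used.

```python
from itertools import product

FACT_LEVELS = ("F1", "F2", "F3")
COLOR_CODINGS = ("black and white", "color redundant", "color relevant")
COMPRESSIONS = ("6/6", "5/6", "4/6")

# Every combination of the three independent variables.
conditions = list(product(FACT_LEVELS, COLOR_CODINGS, COMPRESSIONS))


def problem_cell(n: int) -> tuple:
    """Map problem number 1..27 to its (fact level, color coding) cell.

    Problems 1-9 are F1, 10-18 are F2, 19-27 are F3 (from the text).
    Within a level the first triple is black and white (from the text);
    the order of the remaining two triples is assumed.
    """
    if not 1 <= n <= 27:
        raise ValueError("problem number must be between 1 and 27")
    fact = FACT_LEVELS[(n - 1) // 9]
    color = COLOR_CODINGS[((n - 1) % 9) // 3]
    return fact, color


# Each problem is displayed once at each compression level.
displays = [(n, c) for n in range(1, 28) for c in COMPRESSIONS]

print(len(conditions), "conditions;", len(displays), "displays")
```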

In a preliminary session Ss were given a set of
instructions which indicated the purpose of the ex- 
periment and the method of recording their observa- 
tions. A sample display was presented at this time 
so that the task and procedures were thoroughly 
understood before the experimental sessions began. 

The Ss were given an average of 5 minutes per 
load to complete each map; preliminary trials revealed
that this time was adequate for the completion
of the tasks. Thus Ss were allowed 25 minutes
for F1 displays, 40 minutes for F2 displays, and 55
minutes for F3 displays. The time required to complete
the 27 display tasks was 18 hours divided into
6 experimental sessions lasting 3 hours each. The
experimenter (E) began the first session with the following
instructions:


You are the acting traffic manager for the Acme 
Carrier Corporation, You will be given a map dis- 
play that will show you the current status of the 
loads, driver, and trucks, It is your duty to see 
that all loads are delivered, and that costs are kept 
to a minimum. Your ultimate goal is to create as 
large a profit as possible for your company and 
in turn a larger salary for yourself. 


At this time E presented Ss with their first map display
accompanied by a set of shipping orders, which
indicated the destinations of the loads for that
display, and the answer form upon which they recorded
their transactions. The Ss were allowed to
refer to the instructions and to the economic fact
sheet during the sessions. At the end of the time
allotted for a display, the maps, shipping orders, and
answer forms were collected and the next set of
materials was distributed. The Ss were asked not
to discuss their solutions at all during the course of
the experiment.

TABLE 1

SUMMARY OF ANALYSIS OF VARIANCE FOR THREE DISPLAY PARAMETERS

Source               df        MS          F
Facts (A)             2      7,310      13.07**
Color (B)             2        663       1.19
Compression (C)       2         37      <1
A × C                 4   [illegible]    3.18*
A × B                 4     15,529      27.78**
B × C                 4        231      <1
A × B × C             8         70      <1
Error               459        559

* p < .05.
** p < .001.

The Ss were not given knowledge of results until 
all problems were completed and the data were 
computer-analyzed. 
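
The time allotments given in the Procedure are internally consistent, which can be verified with a few lines of arithmetic. This sketch assumes, as the problem numbering implies, that the 27 displays seen by each S were spread evenly over the three fact levels (9 per level); the variable names are illustrative.

```python
MINUTES_PER_LOAD = 5
LOADS_PER_LEVEL = {"F1": 5, "F2": 8, "F3": 11}
DISPLAYS_PER_LEVEL = 9  # 27 displays spread evenly over three fact levels

# Time allowed for one display at each fact level.
minutes_per_display = {
    level: MINUTES_PER_LOAD * loads for level, loads in LOADS_PER_LEVEL.items()
}

# Total working time for one S across all 27 displays.
total_minutes = sum(DISPLAYS_PER_LEVEL * m for m in minutes_per_display.values())
total_hours = total_minutes / 60

print(minutes_per_display)
print(total_hours, "hours =", total_hours / 3, "sessions of 3 hours")
```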


RESULTS AND DISCUSSION 


The final adjusted profit, total profit minus
penalties due to violation of game rules, was
the raw score for Ss on each display. Since the
total profit and the time allotted for each display
were functions of the number of loads
on that display, the raw score was normalized
by dividing it by the number of loads on the
display. Thus, performance on this task was
measured by profit per load.

... fact densities. (The higher the profit per load scores,
the better the performance.)

Since preliminary analysis indicated that 
scores did not vary significantly with Ss nor 
with orders (F <1), the normalized data 
were cast into a three-factor design and an 
analysis of variance was performed. Table 1 
presents a summary of this analysis. 

Table 1 shows that only those variances as- 
sociated with fact density attained statistical 
significance. Figure 2 indicates that decision 
quality is linearly increasing as a function of 
increasing fact density. However, this rela- 
tionship is confounded by the significant in- 
teraction between fact density and color. 

Figure 3 suggests that decision quality is
not a linearly increasing function of fact
density when the use of color is considered.
The New Duncan Multiple-Range Test (Duncan,
1955) revealed that for the highest fact
density relevant color coding produced the
highest profit per load. However, for the
intermediate fact density color redundant
produced the highest scores, and for the lowest
fact density black and white produced the
best scores. All of these differences were
significant at the 1% protection level. Table 2
summarizes the results of the Multiple-Range
Test.

FIG. 3. Mean profit per load scores for the interaction
of fact density and the use of color. (Ordinate: mean
profit per load, 90 to 140; abscissa: use of color, with
black & white, color redundant, and color relevant
conditions; separate bars for F1, F2, and F3.)

TABLE 2

SUMMARY OF RESULTS OF DUNCAN MULTIPLE-RANGE TEST
(COLOR BY FACT DENSITY INTERACTION)

Shortest significant ranges
k      (2)     (3)     (4)     (5)     (6)          (7)     (8)     (9)
Rp    11.83   12.87   13.75   14.56   [illegible]  16.02   16.67   17.29

Results
Interactions b   F1C3   F2C3   F2C1   F3C1   F1C2   F3C2   F1C1   F2C2   F3C3
M a               88     98    101    102    103    113    117    119    133

a Any two means not underscored by the same line are significantly different.
Any two means underscored by the same line are not significantly different.

b F1-3 refers to fact levels in increasing density. C1, C2, and C3 refer to black
and white, color redundant, and color relevant coding conditions, respectively.
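
The logic of the multiple-range comparison can be illustrated with the values in Table 2. The sketch below is illustrative only and rests on several assumptions: the cell labels are read from a damaged table, the shortest significant range for k = 6 is illegible and is therefore omitted, and the full Duncan procedure also applies a step-down rule (a difference inside a nonsignificant subset is not declared significant) that this simplified check ignores. All names are hypothetical.

```python
# Shortest significant ranges, keyed by k (number of ordered means spanned).
# The k = 6 entry is illegible in the source and is deliberately left out.
RANGES = {2: 11.83, 3: 12.87, 4: 13.75, 5: 14.56, 7: 16.02, 8: 16.67, 9: 17.29}

# Mean profit per load for each fact level (F) by color coding (C) cell.
MEANS = {
    "F1C3": 88, "F2C3": 98, "F2C1": 101, "F3C1": 102, "F1C2": 103,
    "F3C2": 113, "F1C1": 117, "F2C2": 119, "F3C3": 133,
}


def exceeds_range(cell_a: str, cell_b: str) -> bool:
    """True if two cell means differ by more than the range for their span."""
    ordered = sorted(MEANS, key=MEANS.get)
    low, high = sorted((ordered.index(cell_a), ordered.index(cell_b)))
    k = high - low + 1  # number of ordered means from one cell to the other
    if k not in RANGES:
        raise KeyError(f"no legible shortest significant range for k = {k}")
    return MEANS[ordered[high]] - MEANS[ordered[low]] > RANGES[k]


print(exceeds_range("F3C3", "F2C2"))  # highest cell against the next highest
print(exceeds_range("F1C1", "F1C2"))  # black and white against redundant at F1
print(exceeds_range("F3C1", "F2C1"))  # two adjacent black and white cells
```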

For all compression levels, the highest fact 
density produced better scores than either of 
the other two densities. It might be noted, 
therefore, that the interaction of compression
and fact density is the primary contributor 
to the overall superiority of the high-density 
displays. 

The data indicate that the profit scores in 
this trucking game differ significantly as a 
function of structural differences in the visual 
displays. Furthermore, since these profit scores 
reflect the value of the decisions made by Ss 
on this task, we have isolated a decision 
quality metric that is a differential indicator 
of the effectiveness of visual displays. 

Because of the wide range of possible gam- 
ing situations, the application of this tech- 
nique can be very useful in many areas of 
human-factor study. One drawback to this 
technique is the difficulty of applying it to 
field situations—time and money are often 
prohibitive factors. However, if decision quality
measures can be correlated with traditional
measures of display effectiveness, a
multiple-correlation model can be constructed. 
This model may then enable one to predict 
display effectiveness by the more easily meas- 
urable traditional parameters. The authors are 
currently investigating these relationships.


STIMULUS AND RESPONSE FIDELITY IN TEAM TRAINING ¹


GEORGE E. BRIGGS AND WILLIAM A. JOHNSTON
Ohio State University 


Transfer performance of 2-man teams was observed in a simulated radar- 
controlled aerial intercept task following either high or low stimulus (S-) 
fidelity and either high or low response (R-) fidelity training treatments. Both 
high S- and high R-fidelity training treatments resulted in superior transfer 
task performance; however, the effects of high R-fidelity training were rela- 
tively brief. It was concluded that whereas both are desirable, it is less im- 
portant to provide high R-fidelity training at least for tasks where the major 
output requires verbal communication skills. 


In an earlier experiment on team training, 
Briggs and Naylor (1965) manipulated fidel- 
ity of the response (R) mode in a training 
task and found during transfer that superior 
performance occurred following the higher 
R-fidelity condition than under a lower R- 
fidelity training situation. However, this 
superiority was short-lived and transfer per- 
formance was comparable after the first of 
four transfer sessions. The present study 
represents an extension of the earlier work to 
include systematic manipulations of both 
stimulus (S-) and R-fidelity during training. 
Further, the basic task utilized here in both 
training and transfer required a higher level 
of interaction among team members than did 
the task employed in the earlier research; 
thus, we can test the generality of the pre- 
vious results on R-fidelity in a more demand- 
ing task. 


METHOD


Transfer apparatus and procedure. The basic 
equipment used for transfer was the same as that 
described by Naylor and Briggs (1965): a special- 
purpose analog computer capable of generating and 
displaying simulated radar returns from target and 
interceptor aircraft on 14-inch CRTs. Experimenter 
assistants portrayed the interceptor pilots and exe- 
cuted heading and/or speed changes on target gen- 
erators as directed to do so over simulated radio 


1 This research was carried out in the Laboratory 
of Aviation Psychology and was supported by the 
United States Navy under Contract No. N61339- 
1327, sponsored by the United States Naval Train- 
ing Device Center, Port Washington, New York. 
Permission is granted for reproduction, translation, 
publication, use, and disposal in whole or in part for 
any purpose of the United States Government. 
James C. Naylor participated in the planning of this 
research. 
channels by the team members. A team consisted
of two radar controllers (RCs) each of whom 
viewed his own display and was responsible for two 
interceptor aircraft. There was a constant load of 
two targets per RC and the targets followed non- 
evasive straight-line courses at airspeeds of from 
350 to 600 knots. The interceptors were able to use 
speeds of from 200 to 1200 knots and could turn
at 3 degrees/second.

The diameter of each display simulated 200 miles 
of airspace, and each display was marked into four 
quadrants as shown in Figure 1 which illustrates
one of the initial conditions. Interceptors I1 and I3
were controlled by RC1 while RC2 directed I2 and I4.
The goal was to have I1 intercept (come within 1
mile of) target T1 at the same time that I2 intercepted
T2; thus, RC1 and RC2 were required to
coordinate interceptions. Likewise, I3 and I4 were to
be coordinated one with the other (but not necessarily
with I1 and I2). In the earlier research (Briggs
& Naylor, 1965) no coordination was required; thus, 
the present task was more demanding. Each target 
entered the airspace with one of 12 entry character- 
istics made up of four headings and three speeds. 
Pairs of targets entered symmetrically, as shown in 
Figure 1, and targets were reset following intercep- 
tion or after passing over a quadrant boundary. 

The instructions to RC stressed the need to achieve 
coordinated interceptions and the teams received 
summary feedback on this aspect of performance 
following each 35-minute session on the transfer task. 
There were four such transfer sessions. Verbal communications
were permitted between RCs over a
channel other than those used to communicate with 
the pilots. An experimenter assistant observed the 
operation on a third simulated radar display and 
measured separations of pairs of targets and interceptors
following a successful intercept by one of
the two interceptors. This provided a performance
index to be called degree of coordination which was 
used in all analyses. Perfect performance (maximum
coordination) would yield a separation of 1 mile for
T1 and I1 and 1 mile for T2 and I2 of Figure 1;
thus, the average degree of coordination would be
1 mile for perfect performance. However, if I2 was
not perfectly coordinated with I1, then when I1 and
T1 were within 1 mile, I2 and T2 might be separated
by, say, 5 miles; this would yield an average
degree of coordination score of 3 miles.

FIG. 1. A schematic representation of one of the
initial conditions used during transfer. (The Is and
Ts represent interceptor and target positions, respectively.
In actual practice clock codes appeared at
these indicated positions and the dashed lines were
not present; they indicate here the future positions
of the targets.)
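
The degree of coordination index described above reduces to an average of target-interceptor separations taken at the moment one interceptor of a pair completes its intercept. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the two input lists are simply the illustrative separations mentioned in the text (1 and 1 mile, then 1 and 5 miles).

```python
def degree_of_coordination(separations_miles):
    """Average target-interceptor separation (in miles) for a coordinated pair."""
    if not separations_miles:
        raise ValueError("at least one separation is required")
    return sum(separations_miles) / len(separations_miles)


# Perfect coordination: both interceptors are within 1 mile of their targets.
perfect = degree_of_coordination([1, 1])

# Imperfect coordination: one interceptor is at 1 mile, the other still at 5 miles.
imperfect = degree_of_coordination([1, 5])

print(perfect, imperfect)
```

Lower scores therefore indicate better coordination, which is how the transfer results are read later in the article.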

Training conditions. There were four groups of
teams in the experimental design representing the
four possible combinations of two training variables.
Each training variable existed at two levels, high
and low S-fidelity and high and low R-fidelity,
defined as follows:

1. High S-fidelity: In this condition the RCs each
viewed 5-inch CRT displays which were essentially
identical in all respects, except size, to the transfer
task displays and which were functionally related
in the same way as the transfer displays to target
generator consoles manipulated by pilots under direction
from the RCs. Coordinated interceptions were
required during training Sessions 2 through 4.

2. Low S-fidelity: A circular game board, identical
to that employed by Briggs and Naylor (1965),
with 3,690 squares was used to represent the radar
surveillance area. An experimenter moved four target
checkers one square every 20 seconds while an
experimenter assistant (a pilot) moved the four
interceptors either one or two squares at the same
time interval as directed by the RCs. During training
Sessions 2-4 the goal, of course, was to obtain
coordinated interceptions by landing on the same
square as that occupied by a target aircraft at the
same time the other RC obtained a similar blockage
of the comparable target. The RCs observed the
board from an elevated platform.

3. High R-fidelity: The RCs issued commands
to the pilots over a simulated radio channel identical
to those used in the transfer task. Thus, they received
practice using verbal codes and phrases which
would be appropriate in the transfer task.
4. Low R-fidelity: Manual communication devices,
which were connected to pilot display panels, were 
employed in this condition. The RCs issued com- 
mands by positioning heading and speed switches 
to the desired values, and thus while the pilots 
received the necessary information, the RCs had 
no opportunity to acquire the efficient verbal com- 
munication procedures which would be required in 
the transfer task. Factors such as time to give and
execute commands were held constant for the two 
levels of R-fidelity. Figure 2 indicates the training 
conditions for the four groups of teams. 

Subjects and training procedures. There were seven 
two-man teams per group, and pairings of RCs in 
a team were made nonsystematically. The RCs were 
students enrolled at the University who had not 
served in the previous research in this program and 
who answered a newspaper ad to earn $10 for eight 
evening sessions. 

The first training session involved a general expla- 
nation of the task requirements, practice in using 
the aircraft identification code to be experienced in 
the training and transfer tasks, and a 5-minute intro- 
ductory period on the training task. The latter 
involved one interceptor-target pair per RC and did 
not require coordinated hits. Training Sessions 2-4 
each lasted 35 minutes, and the RCs were required 
to obtain coordinated hits. Immediate feedback from 
the experimenter was provided following each inter- 
ception. No verbal communication was permitted 
between RCs during training. 


RESULTS 


Training. Within each level of S-fidelity the 
two levels of R-fidelity were compared in 
terms of the degree of coordination measure: 
none of these comparisons indicated statisti- 
cally significant differences in performance. 
Thus, the effects (see below) of R-fidelity 
on transfer performance cannot be explained 
in terms of difficulty differences between high
and low R-fidelity training task conditions. 
It was not possible to obtain comparable 
measurement units for the two S-fidelity con- 
ditions and so no statistical tests were made 
of these treatments. However, the low S- 
fidelity task clearly was easier than the high 
S-fidelity condition. 

Transfer. In the previous research (Briggs
& Naylor, 1965; Naylor & Briggs, 1965), 
an efficiency score (fuel consumed per inter- 
cept) served as the major dependent vari- 
able. In the present study these data were 
recorded also but there were no statistically 
significant differences found except a signifi- 
cant practice effect, F (3, 72) = 14.64, p< 
.001. 

An analysis of variance was made of the 
degree of coordination data and this indicated 
three statistically significant sources: S-fidelity,
F (1, 24) = 5.77, p < .01; a practice
effect, F (3, 72) = 10.61, p < .001; and an
interaction of Transfer Sessions X R-Fidelity,
F (3, 72) = 4.21, p < .01. These results can
be seen in Figure 2: on the average, Groups 
1 and 2 (high S-fidelity) achieved lower 
scores (better coordination) than did Groups 
3 and 4 (low S-fidelity); there is a general 
downward trend (improvement) for all groups 
which is particularly pronounced for Groups 
2 and 4; and the average of Groups 1 and 3 
(high R-fidelity) on the first transfer session 
is superior to that of Groups 2 and 4 (low 
R-fidelity) but no differences are apparent 
for the remaining three sessions, thus the 
interaction of R-fidelity with sessions. 
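
The degrees of freedom in these F ratios can be checked against the design. The sketch below assumes a conventional mixed (split-plot) analysis of variance, with S-fidelity and R-fidelity as between-team factors and transfer sessions as the repeated factor; the article does not spell out the model, so the layout and the variable names are assumptions that happen to reproduce the reported 24 and 72 error degrees of freedom.

```python
# Design constants taken from the Method section.
groups = 4            # 2 S-fidelity levels x 2 R-fidelity levels
teams_per_group = 7
sessions = 4          # transfer sessions
s_levels = 2
r_levels = 2

teams = groups * teams_per_group

# Between-team sources (assumed split-plot layout).
between = {
    "S-fidelity": s_levels - 1,
    "R-fidelity": r_levels - 1,
    "S x R": (s_levels - 1) * (r_levels - 1),
    "teams within groups (error)": teams - groups,
}

# Within-team sources: sessions and their interactions with the group factors.
df_sessions = sessions - 1
within = {
    "sessions": df_sessions,
    "sessions x S": df_sessions * (s_levels - 1),
    "sessions x R": df_sessions * (r_levels - 1),
    "sessions x S x R": df_sessions * (s_levels - 1) * (r_levels - 1),
    "sessions x teams within groups (error)": df_sessions * (teams - groups),
}

total_df = sum(between.values()) + sum(within.values())

for source, df in {**between, **within}.items():
    print(f"{source:40s} {df:3d}")
print(f"{'total':40s} {total_df:3d}")
```

Under this layout the S-fidelity effect is tested on 1 and 24 degrees of freedom, and the practice effect and the Sessions X R-Fidelity interaction on 3 and 72, matching the values quoted above.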


DISCUSSION 


It is clear that both high S- and high R- 
fidelity training conditions produced superior 
transfer task performance. However, the 
superiority of performance following high R- 
fidelity training was relatively short-lived, 
there being no statistically significant differ- 
ence between high and low R-fidelity treat- 
ments after the first transfer session. On the 
other hand, the superior effects of high S-fi- 
delity training were more extensive and longer 
lasting; therefore, one may conclude that 
while high fidelity training conditions are 
desirable for both input (S-fidelity) and out- 
put (R-fidelity) aspects of an operational 
(transfer) task, it is relatively more important
that high input fidelity characteristics
be utilized, the detrimental effects of low 
fidelity output conditions being easily over- 
come upon transfer to the operational task. 

It must be cautioned that this conclusion 
should be restricted, at least tentatively, to 
those tasks in which verbal communication 
represents the primary output mode. Briggs 
and Wiener (1959), for example, show that 
in a complex tracking task, high fidelity of 
the response device during training was an 
important determinant of transfer perform- 
ance. It is not surprising that there may be 
more justification for high R-fidelity in a 
motor skill task than in the present task: 
we spend far more time in verbal communica- 
tion than in motor control operations; fur- 
ther, when necessary, the former can be 
utilized very effectively with little trouble 
adapting to environmental demands, while 
output utilizing the control device of a vehicle 
or other continuous processes requires rather 
precise human adjustment to the machine 
and other environmental dynamics. Thus, the 
relatively more extensive practice and the less
precise environmental demands would permit
more ready transfer of verbal-communication
skills than one might expect for motor-control
skills.

Table 1 indicates that the effect of R- 
fidelity on verbal communications between 
RC and pilot was of the same pattern as that 
noted above for the major dependent vari- 
able: degree of coordination. Apparently RCs 
trained on the manual communication device 
(low R-fidelity) found it more difficult than 
did the RCs trained in verbal communications 
to generate commands in the first transfer 
session, and it is logical, given the need for 
numerous rapid adjustments of interceptor 
speeds and headings, that this contributed to 
the significantly poorer performance of those 
teams at transfer in terms of coordination. 
An analysis of variance supports this observation
by indicating a significant interaction
of Transfer Sessions X R-Fidelity for the
communications data, F (3, 72) = 3.23, p < .05.
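
The parallel between the two measures can be made concrete by differencing the Table 1 means session by session. The following sketch only re-expresses the published means; the dictionary layout and the helper name are illustrative choices, not part of the original analysis.

```python
# Mean scores from Table 1, by R-fidelity training condition and transfer session.
table_1 = {
    "degree of coordination": {
        "high": [2.98, 2.69, 2.44, 2.31],
        "low": [4.91, 2.89, 2.45, 2.09],
    },
    "number of commands": {
        "high": [127.36, 137.22, 147.79, 162.72],
        "low": [105.72, 141.15, 152.79, 157.93],
    },
}


def low_minus_high(measure):
    """Low R-fidelity mean minus high R-fidelity mean for each transfer session."""
    rows = table_1[measure]
    return [round(low - high, 2) for high, low in zip(rows["high"], rows["low"])]


coordination_gap = low_minus_high("degree of coordination")
commands_gap = low_minus_high("number of commands")

print(coordination_gap)
print(commands_gap)
```

For both measures the large gap appears only in Session 1 (about 1.9 miles poorer coordination and about 22 fewer commands for the low R-fidelity teams); in later sessions the differences are small and change sign, which is the pattern the interaction describes.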

The results on the significance of R-fidelity
confirm the earlier findings by Briggs and
Naylor (1965): high R-fidelity training does
produce superior transfer performance but the
superiority is of rather short duration. Since a
more demanding task was employed in the
present research, the earlier results are both
confirmed and extended.

TABLE 1

THE EFFECT OF R-FIDELITY ON DEGREE OF COORDINATION AND NUMBER OF COMMANDS TO PILOTS
AS A FUNCTION OF TRANSFER SESSIONS

                                             Transfer session
Measure                   R-fidelity      1        2        3        4

Degree of coordination    High          2.98     2.69     2.44     2.31
                          Low           4.91     2.89     2.45     2.09
Number of commands        High        127.36   137.22   147.79   162.72
                          Low         105.72   141.15   152.79   157.93

Finally, it is important to acknowledge
that the efficiency index of performance was
insensitive here to the two training variables
even though in the earlier research (Briggs
& Naylor, 1965; Naylor & Briggs, 1965) this
same dependent variable was sensitive to differential
treatments. It is felt that the following
observation can account for this inconsistency:
the efficiency score reflects the
individual proficiency of team members more
than it does the level of their “teamwork”;
in the earlier research only a relatively low
level of such teamwork was required, the task
demanding primarily individual proficiency;
thus, the measure was responsive to experimental
treatments that affected individual
proficiency. However, the present task requires
a rather high level of teamwork, and
while individual proficiency is necessary, it
obviously is not a sufficient condition for
successful coordination. Therefore, it is not
too surprising that a score which is sensitive
to individual proficiency would not necessarily
be sensitive to team proficiency in a task
which emphasizes teamwork so greatly.
AN ASYMMETRICAL TRANSFER EFFECT IN RESEARCH
ON KNOWLEDGE OF PERFORMANCE


I. D. BROWN 1 
Medical Research Council, Applied Psychology Research Unit, Cambridge, England 


Gibbs and Brown (1955) reported that the motivational aspect of knowledge 
of results had a significant effect upon performance of a repetitive monotonous 
task, aside from its informative and rewarding aspects. In an experiment with 
12 Ss, output on document copying was 25% higher when it was displayed on 
a digital counter than when the counter was covered. Chapanis (1964) 
duplicated the main features of the experiment by testing 16 Ss on the task 
of punching teletype tape and found there was no significant advantage in 
displaying output. The present note demonstrates that the discrepancy between 
these findings results from a difference between the experimental designs used. 
The 2-way asymmetrical transfer effects produced by Gibbs and Brown’s 
design, in which Group I had condition K then NK, Group II had NK then K, 
show that knowledge of results may have a significant effect only when the 
task has previously been performed without it. The importance of other
variables for future investigations of this topic is also briefly discussed.


In an attempt to isolate and measure the 
motivational aspects of knowledge of perform- 
ance Gibbs and Brown (1955) tested 12 
subjects who served at the repetitive and 
monotonous task of document copying under 
two different conditions: (a) in which
a digital counter displayed their output and
(b) in which the counter was covered.
Although the experimenters set no daily quota,
did not supervise the work, ignored the re- 
corded output in the presence of the Ss and 
provided no monetary incentive, the mean 
number of documents copied during a 4-day 
period of 4 hours per day was 25% larger 
when the counter was visible. 

The practical importance of this finding 
persuaded Chapanis (1964) to duplicate the 
main features of the experiment. He tested 
16 Ss for 1 hour per day over a period of 
24 days on the task of punching tapes for a 
digital computer and was disappointed to 
find that the presence or absence of a coun- 
ter which tallied the S’s output had no sig- 
nificant effect upon the amount of work done. 

Chapanis offers a number of possible rea- 
sons for the discrepancy between his results 
and those of Gibbs and Brown, but appears 
to have missed the most probable explanation, 


1 Many thanks are due to E. C. Poulton and P. 
Freeman for allowing me to read their paper on 
asymmetrical transfer effects, and to P. Freeman for 
other statistical advice. 


which was suggested to the author by Poulton
and Freeman’s paper 2 on transfer effects and
experimental design. It is, therefore, the ob- 
ject of the present note to re-examine Gibbs 
and Brown’s original data in order to demon- 
strate that the discrepancy arises largely from 
the difference between the experimental de- 
signs which were adopted. 


TRANSFER EFFECTS AND EXPERIMENTAL 
DESIGN 


Gibbs and Brown used an A-B, B-A
design in which half the Ss worked under the 
condition without knowledge of results (NK) 
first and then had the condition with knowl- 
edge of results (K) second, whereas the other 
Ss had these two conditions in the reverse 
order. Chapanis used an A-B design in
which separate groups of Ss were tested in 
each of the two main conditions. (He does in 
fact mention this difference in design, but 
merely states that the use of each S as his 
own control “is generally a more sensitive
type of experimental design than assigning 
Ss to independent groups.”) 

Poulton and Freeman draw on a wide vari- 
ety of experimental results to emphasize the 
extreme care which is necessary in analyzing 
data obtained by the use of the A—B, 

2 E. C. Poulton and P. Freeman. On asymmetrical
transfer effects with balanced experimental designs.
Personal communication, in preparation.

TABLE 1

EFFECT OF ASYMMETRICAL TRANSFER UPON MEAN
WEEKLY OUTPUT AND INTER-INDIVIDUAL
DIFFERENCES

Order of          Knowledge of      No knowledge of
presentation      results (K)       results (NK)

First
  M               5,904*            5,000*
  SD              1,261             1,349
Second
  M               6,749**           4,521**
  SD              806               1,325

Note. N = six in each group.
* p = 0.242.
** p = 0.021.

B—A design. They point out that data from 
the condition which is performed second may 
be confounded by transfer effects. For exam- 
ple: a ‘two-way’ effect could lead to the situ- 
ation in which after A, B is better, whereas 
after B, A is worse, which would exaggerate 
the difference between the two conditions if 
A were more difficult than B, or reduce the 
true difference if A were easier. Poulton and 
Freeman therefore suggest that, although this 
simple balanced design allows a_ between- 
group and a within-group analysis of the data, 
an experimenter should resort to the latter 
with due caution only if the individual differ- 
ences are too large. 
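
The way a two-way transfer effect inflates the second-position difference can be illustrated with the mean weekly outputs of Table 1. The sketch below is only arithmetic on those four published means; the variable names are illustrative, and no claim is made that Gibbs and Brown or Poulton and Freeman computed the effects in this form.

```python
# Mean weekly output from Table 1, keyed by (condition, order of presentation).
mean_output = {
    ("K", "first"): 5904,
    ("NK", "first"): 5000,
    ("K", "second"): 6749,
    ("NK", "second"): 4521,
}

# Between-group comparisons: each uses two different groups of six Ss.
diff_first = mean_output[("K", "first")] - mean_output[("NK", "first")]
diff_second = mean_output[("K", "second")] - mean_output[("NK", "second")]

# Apparent transfer effects: the same condition given second versus given first.
transfer_on_k = mean_output[("K", "second")] - mean_output[("K", "first")]
transfer_on_nk = mean_output[("NK", "second")] - mean_output[("NK", "first")]

# Difference obtained when first and second presentations are pooled.
pooled_k = (mean_output[("K", "first")] + mean_output[("K", "second")]) / 2
pooled_nk = (mean_output[("NK", "first")] + mean_output[("NK", "second")]) / 2
diff_pooled = pooled_k - pooled_nk

print(diff_first, diff_second, transfer_on_k, transfer_on_nk, diff_pooled)
```

The second-position difference equals the first-position difference plus the gain in K after NK minus the (negative) change in NK after K, so opposite-signed transfer effects widen the gap, and pooling the two positions gives a value midway between the uncontaminated and the inflated difference.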


RE-EXAMINATION OF GIBBS AND
BROWN’S DATA


Without the benefit of this advice Gibbs 
and Brown performed an analysis of variance 
and, finding the ‘Conditions’ X ‘Order’ inter- 
action insignificant, pooled the data obtained 
from the first and second presentation of each 
condition. This concealed a marked asymmet- 
rical transfer effect which is shown clearly in 
the figure. After NK, K produced an insig- 
nificantly higher output than if given first, 
whereas after K, NK produced an insignifi- 
cantly lower output than if given first. Since 
NK tended to produce the lower output ini- 
tially, the difference between the conditions 
was exaggerated when they were given sec- 
ond. All Ss had a higher output with K than 
with NK, thus the overall effect of K was 
found to be statistically significant. However,
the present analysis shows that mean daily 
outputs in K and NK given first do not differ 
significantly (p>0.120, 1-tailed Mann-Whitney
U tests; see Siegel, 1956, p. 116),
nor do the mean weekly outputs shown in the 
table (p = 0.242, Mann-Whitney test). There 
is therefore no discrepancy between these re- 
sults and those obtained by Chapanis. Only 
when the two conditions were given second do 
the mean daily differences achieve signifi- 
cance (p < 0.021, except for the third day 
where p= 0.066), as do the mean weekly 
differences (p = 0.004). 
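
With six Ss per group, the one-tailed Mann-Whitney probabilities can be computed exactly by counting orderings, which is how small-sample tables such as Siegel's are constructed. The sketch below uses the standard counting recursion for the null distribution of U with no ties; the function names are hypothetical, and the U values in the loop are not reported in the note but are simply the ones whose exact probabilities coincide with the figures quoted above.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_orderings(u, n1, n2):
    """Number of orderings of n1 + n2 untied scores that give a statistic of exactly u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # The largest score comes either from sample 1 (adding n2 to U) or from sample 2.
    return count_orderings(u - n2, n1 - 1, n2) + count_orderings(u, n1, n2 - 1)


def one_tailed_p(u, n1, n2):
    """Exact probability of a U this small or smaller under the null hypothesis."""
    favourable = sum(count_orderings(k, n1, n2) for k in range(u + 1))
    return favourable / comb(n1 + n2, n1)


for u in (2, 5, 8, 10, 13):
    print(u, round(one_tailed_p(u, 6, 6), 3))
```

For n1 = n2 = 6 this yields 0.004, 0.021, 0.066, 0.120, and 0.242, so every probability cited in this section is an attainable exact value for two groups of six.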

It would perhaps be unwise to conclude 
from this support for Chapanis’s negative re- 
sults that K in isolation has no motivating 
effect upon performance, because the insignifi- 
cant differences now found between K and 
NK presented first undoubtedly result in part 
from the general, but insignificant, increase in 
inter-individual variation shown in the table 
for condition NK. There is clearly a need for 
more research on this topic, using a much 
larger group of Ss. 

One important practical contribution made 
by the present analysis is that K apparently 
has a substantial effect upon output when 
given with a task which has previously been 
performed without it (see Figure 1). This 
would be the case in many industrial situ- 
ations, but prolonged testing would be neces- 
sary to determine whether any improvement 
in performance was permanent. It is possible 
that K has little or no effect upon the per- 
formance of a task which is begun in that 
condition, but affects performance only when
K had previously been withheld. The experimental
design of any future investigation
should take this into account.

FIG. 1. Interaction of K and NK with order in
which these conditions were given. (Axes: hundreds of
items copied, by days in 1st week and days in 2nd week;
N = 6 in each group.)


REMAINING DISCREPANCIES BETWEEN 
THE TWO EXPERIMENTS


Chapanis reports a significant increase in 
output over the 24 days of his experimental 
period, but Gibbs and Brown report no such 
effects of learning. This difference is perhaps 
not surprising because of the dissimilarities 
between the tasks performed by the Ss in the 
two experiments; a multichoice, key-pressing 
task in the former case and a single pedal- 
pushing task in the latter. Clearly any task 
used should show no effects of learning if K 
is to be studied in isolation over a prolonged 
period. 

Chapanis also reports that his Ss did more 
work in the first 15 minutes of their daily 
work periods than in either of the two following
15-minute spells and that this difference
increased during the 24 days of the test. A 
similar analysis of Gibbs and Brown’s data 
shows that when their Ss were given K, out- 
put was about the same in each hourly spell 
of the 4-hour daily work period. But in condi- 
tion NK, output tended to be higher in the 
last 2 hours of the day, during the last 2 days 
of the 4-day working week. The effect was 
particularly pronounced, but not significantly 
so, when condition NK was presented first. 
This discrepancy between the two results may 
be related to the difference between lengths of 
daily work period which were used; 1 hour
by Chapanis and 4 hours by Gibbs and 
Brown. This variable also needs to be taken 
into account by future experimenters. 

It is clear that the two investigations dis- 
cussed here have merely laid the foundations 
for further work on the motivational effects 
of K in isolation. In any such work the proce- 
dure adopted by Chapanis is to be preferred: 
that is, to test each S individually on a task 
which he believes to be a real job of work and 
to prolong the work period for at least 3 
weeks. However, it would be preferable to 
adopt the method used by Gibbs and Brown 
in which Ss are then transferred to the alter- 
native condition and re-tested, in order to 
investigate any possible decrement in trans- 
fer effects over the second work period. 

Knowledge of performance may well have 
its main effect in decreasing inter- and intra- 
individual differences in output and this alone 
is of sufficient importance in the industrial 
application of the ‘information incentive’ to 
be worthy of further study. 
COLLINEARITY IN THE EDWARDS PERSONAL
PREFERENCE SCHEDULE 1

The EPPS and other constant-sum tests cannot be used as prediction batteries
for any multivariate statistical technique such as multiple-regression or
canonical analysis unless the amount of collinearity in the set is reduced.
Our analysis showed that the heterosexuality scale contributed more collinearity 
than the other 14 EPPS scales, with the abasement and nurturance scales 
2nd and 3rd. With these 3 scales deleted, the collinearity is reduced to ac- 
ceptable levels. The effects of deleting particular variables on collinearity in a 
given study can be determined experimentally by calculating the inverse of 
the correlation matrix and applying a simple formula utilizing the value of 
the diagonal element of the inverse matrix. 


In order to economize on space and time, a 
number of psychological instruments currently 
used in applied work are structured in such a 
way that scores on subscales of the instrument 
add up to a constant, or nearly constant, sum. 
The Edwards Personal Preference Schedule 
(EPPS) and the Allport-Vernon Scale of Val- 
ues are examples of such constant-sum instru- 
ments; all ranking procedures and most 
forced choice or forced distribution instru- 
ments also have this characteristic. This in- 
troduces a high-order linear dependency (or 
multicollinearity) among the scores that is not 
fully reflected in their simple intercorrelations 
(for instance, the highest intercorrelation 
among the 15 EPPS scales is about .46), and 
which therefore might escape the unwary re- 
searcher. In fact, the multiple correlation be- 
tween any one subscale and all the others in 
such a set is 1.00. When such a set of scores 
is used in its entirety as a prediction battery 
for multiple-regression or canonical analysis, 
the diagonal elements of the inverse of the 
correlation matrix (necessary for estimating 
regression coefficients) approach infinity and 


1 This investigation took place as a part of a study 
of the relationship between personality and con- 
sumer behavior which was financed by funds from 
the Ford Foundation administered by the Graduate 
School of Business, Stanford University. Computer 
time was financed in part by the Stanford Compu- 
tation Center under National Science Foundation 
Grant NSF-GP948. 


unreliable results may be obtained because the 
nonindependence among the predictors makes 
the estimates of coefficients unstable. This 
characteristic also creates problems (discussed 
below) for any program that attempts to esti- 
mate communalities for factor-analysis. 
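
The point that modest pairwise correlations can hide a perfect linear dependency is easy to demonstrate with simulated data. The sketch below is an illustration under stated assumptions, not a reanalysis of the EPPS: it invents a five-scale instrument whose scores always sum to 60 (both numbers are hypothetical), then compares the pairwise correlations with the correlation between one scale and the sum of the others.

```python
import random


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(0)
N_SCALES, TOTAL, N_PEOPLE = 5, 60, 500   # hypothetical instrument and sample


def constant_sum_profile():
    """Distribute TOTAL points over the scales, as forced-choice scoring does."""
    scores = [0] * N_SCALES
    for _ in range(TOTAL):
        scores[random.randrange(N_SCALES)] += 1
    return scores


people = [constant_sum_profile() for _ in range(N_PEOPLE)]
scales = [list(column) for column in zip(*people)]

# Simple intercorrelations among the scales: all modest in size.
pairwise = [pearson(scales[i], scales[j])
            for i in range(N_SCALES) for j in range(i + 1, N_SCALES)]

# One scale against the sum of all the others: an exact linear dependency.
sum_of_others = [sum(profile[1:]) for profile in people]
r_first_vs_rest = pearson(scales[0], sum_of_others)

print(max(abs(r) for r in pairwise), r_first_vs_rest)
```

Because the first scale equals 60 minus the sum of the rest, its correlation with that sum is -1, so its multiple correlation with the other scales is 1.00 even though no single pairwise correlation is large.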

Since the EPPS is among the more com- 
monly used personality instruments and is 
reasonably well validated for most scales (see 
Lodahl, 1964), it seemed useful to investigate 
methods of handling this constant-sum prob- 
lem and evaluate their usefulness for the 
EPPS. The results of this analysis are sum- 
marized in this paper. 


SAMPLE 


Data from Koponen’s (1957) sample 
(which were used by Edwards (1959) as part 
of his norming group) were made available to 
us by the Advertising Research Foundation.2
The EPPS is constructed so that the sum of 
all scales should be 210, which provides a 
check on total scoring accuracy. For the anal- 
ysis reported here, all sets of scores not meet- 
ing this criterion were removed from the data. 
After this screening procedure, data remained 
on 2,248 male and 2,140 female adult Ameri- 
can subjects. For more information on this 
sample and the methods by which it was ob- 
tained, see Koponen (1957). 


2 This help, especially that of Charles K. Ramond, 
is gratefully acknowledged. 
RESULTS AND DISCUSSION


Since the multicollinearity problem in the 
EPPS is caused by the fact that all the scores 
for a respondent must add up to a fixed con- 
stant, it is reasonable to consider the effects 
of dropping 1 or more variables from the 
basic set of 15. As soon as any variable is 
removed from the set, the scores of the re- 
maining scales are no longer as constrained. 
As more variables are removed the number of 
degrees of freedom for the remainder of the 
set increases, further reducing the multicol- 
linearity among them. 

First, we wished to determine the extent to 
which each variable of the EPPS contributes 
to the multicollinearity among the remaining 
variables. This was achieved by deleting 1 
variable at a time, then calculating the mul- 
tiple correlation between each of the remain- 
ing variables and the other 13. Next, the 
squares of the resulting multiple correlations 
were averaged. These average values of R²
are given in Table 1. In an approximate sense,
they represent the average proportion of the
combined variance of the 15 scales that is not
accounted for by the deleted scale. Separate
results for male and female respondents are
presented.

TABLE 1

AVERAGE MULTIPLE CORRELATIONS FOR 14 SCALES OF
THE EDWARDS PERSONAL PREFERENCE SCHEDULE,
BY SCALE DELETED

                          Average R² for 14 scales,
                          by scale deleted
Scale deleted             Men            Women

 1. Achievement           .633           .604
 2. Deference             .676           .643
 3. Order                 .587           .577
 4. Exhibition            .655           .629
 5. Autonomy              .608           .594
 6. Affiliation           .670           .642
 7. Intraception          .579           [illegible]
 8. Succorance            .566           [illegible]
 9. Dominance             [illegible]    .542
10. Abasement             .544           [illegible]
11. Nurturance            .621           .607
12. Change                .546           [illegible]
13. Endurance             .574           .564
14. Heterosexuality       .356           .382
15. Aggression            .602           .591

Note. N = 2,248 male and 2,140 female subjects.
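
The delete-one procedure can be sketched in a few lines. The example below relies on the standard identity that the squared multiple correlation of variable i with the others equals 1 - 1/r^ii, where r^ii is the i-th diagonal element of the inverse correlation matrix; whether this is exactly the "simple formula" the authors applied is an assumption. The data are simulated (five hypothetical scales with unequal weights, summing to 60), and all function names are illustrative.

```python
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5


def correlation_matrix(columns):
    return [[pearson(a, b) for b in columns] for a in columns]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """R² of each variable with all the others: 1 - 1 / (diagonal of the inverse)."""
    inverse = invert(corr)
    return [1.0 - 1.0 / inverse[i][i] for i in range(len(corr))]


def average_r2_by_scale_deleted(columns):
    """Delete each scale in turn and average R² over the remaining scales."""
    averages = []
    for deleted in range(len(columns)):
        kept = [c for i, c in enumerate(columns) if i != deleted]
        r2 = squared_multiple_correlations(correlation_matrix(kept))
        averages.append(sum(r2) / len(r2))
    return averages


random.seed(1)
WEIGHTS = [4, 2, 2, 1, 1]   # hypothetical: scale 0 attracts the most points
TOTAL = 60                  # hypothetical constant sum
people = []
for _ in range(400):
    scores = [0] * len(WEIGHTS)
    for scale in random.choices(range(len(WEIGHTS)), weights=WEIGHTS, k=TOTAL):
        scores[scale] += 1
    people.append(scores)
columns = [list(c) for c in zip(*people)]

averages = average_r2_by_scale_deleted(columns)
print([round(a, 3) for a in averages])
```

In this simulation the scale with the largest variance (index 0) leaves the least collinearity behind when it is deleted, which is the same qualitative pattern Table 1 shows for the heterosexuality scale; the numerical values themselves have no bearing on the EPPS.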

Table 1 shows that deletion of the hetero- 
sexuality scale reduces collinearity among the 
other scales by the greatest amount. This is 
true by a wide margin, for both men and 
women, even though an examination of the 
simple correlations among the scales would 
not suggest this conclusion. The effects of 
the other scales on collinearity are not as 
strongly differentiated. Next to heterosexual- 
ity, dominance reduces it by the greatest 
amount for men and abasement has the great- 
est effect for women. Deference has the least 
effect in both samples. 

The average values of R² given in Table 1
are almost inversely proportional to the standard
deviations of the respective EPPS scales,
which were published by Koponen (1957). 
This result is a direct consequence of the 
constant sum characteristic of the test. It 
holds true for any constant sum battery, and 
so provides a quick way to evaluate the effect 
of any single variable in a test on total 
multicollinearity. 

If heterosexuality can be deleted from the
set of EPPS variates for the purposes of a
given study, the amount of collinearity is reduced
to about 35-40% of the average variance
of the remaining variables. Table 2 presents
the values of R² of each scale with the
other 13 scales, with heterosexuality removed.
Several of the scales are much more highly
correlated with the remainder of the set than
are others; this is particularly true for nurturance,
affiliation, dominance, and aggression.
Some of these multiple correlations seem
high when we note that the simple correlations
between pairs of scales that we would
expect to be correlated are relatively small.
The value of r² for dominance and succorance
is only about .04, for example, while that for
nurturance and affiliation is .20.

The fact that certain scales have R² values
of .4 or higher, even with heterosexuality removed,
suggests that additional variables may
have to be deleted in order for multicollinearity
to be reduced to a point acceptable for
regression or other types of multivariate analysis.
The effects were investigated by deleting
abasement and nurturance, in turn, and calculating
the squared multiple correlations between
each of the remaining scales and the
others in the reduced set. The results are
presented in the last four columns of Table 2.
We can see that collinearity declines fairly
slowly as additional variables are deleted, once
heterosexuality is removed from the set.

TABLE 2

MULTIPLE CORRELATIONS BETWEEN EACH SCALE ON THE EDWARDS PERSONAL PREFERENCE
SCHEDULE AND THE OTHER SCALES, WITH SELECTED SCALES DELETED:
2248 MALE AND 2140 FEMALE SUBJECTS

                                                                          Heterosexuality,
                          Heterosexuality      Heterosexuality and        abasement, and
                          deleted              abasement deleted          nurturance deleted
Scale                     Men      Women       Men           Women        Men           Women

 1. Achievement           [illegible]  .329    .247          [illegible]  .170          .150
 2. Deference             .279     .297        .276          [illegible]  .260          .264
 3. Order                 .378     .387        .339          .340         .295          .304
 4. Exhibition            .316     .421        .261          .319         .195          .248
 5. Autonomy              .240     .254        .202          .211         [illegible]   .174
 6. Affiliation           .426     .409        .390          .365         .349          .350
 7. Intraception          .243     .302        .216          .242         .181          .206
 8. Succorance            [illegible]  .419    .314          .363         .310          .300
 9. Dominance             .431     .399        .326          .262         .264          .207
10. Abasement             .369     .390        -             -            -             -
11. Nurturance            .494     .421        .494          .409         -             -
12. Change                [illegible]  .438    .298          .329         .214          .262
13. Endurance             .326     .398        .315          .306         .303          .342
14. Heterosexuality       -        -           -             -            -             -
15. Aggression            .427     .488        .382          .413         .326          [illegible]
    M                     .356     .382        [illegible]   .316         .258          .268

The question of what scale or scales should 
be left out of a statistical analysis depends in 
large part on the nature of the study. If 
heterosexuality is especially relevant to the 
subject matter of an analysis, for example, it 
would be a mistake to remove it on the basis 
of its effect on multicollinearity. It might be 
more desirable to delete several other scales 
instead of heterosexuality. On the other hand, 
Table 2 shows that it is not easy to anticipate 
the effects of removing a particular combina- 
tion of scales. It will usually be necessary to 
experiment with the data in order to find the 
proper tradeoff between the effects of multi- 
collinearity and loss of information through 
the deletion of more than one scale. (No in- 
formation is lost when only one scale is de- 
leted.) The experimental process is facilitated 
by using the following simple formula for
determining the squared multiple correlation
between each member of a set of explanatory
variables and the remaining variables in the
set:

$$R_i^2 = 1 - \frac{1}{R^{ii}},$$

where $R^{ii}$ is the ith diagonal element of the
inverse of the simple correlation matrix for
the explanatory variable set. Since the inverse
correlation matrix is printed out by many
computer programs for multiple regression, it
is possible to try different combinations of
explanatory variables and observe both the
regression results and the amount of collinearity
present on the basis of a single set of
computer runs.
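
The formula above can be tried out with a short sketch. This is an illustration only: the function names and the three-variable correlation matrix are hypothetical and are not taken from the EPPS data. The inverse is obtained by plain Gauss-Jordan elimination so that the example needs no external library. A complete constant-sum set has a singular correlation matrix, which is why at least one scale must be dropped before the inverse exists.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """R_i^2 = 1 - 1 / R^{ii}, with R^{ii} the ith diagonal of the inverse."""
    inverse = invert(corr)
    return [1.0 - 1.0 / inverse[i][i] for i in range(len(corr))]


# Hypothetical correlation matrix for three explanatory variables.
corr = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.4],
        [0.3, 0.4, 1.0]]
smc = squared_multiple_correlations(corr)
for i, value in enumerate(smc, start=1):
    print(f"variable {i}: R^2 with the others = {value:.3f}")
```

Deleting a variable amounts to removing its row and column from `corr` and calling the function again, which mirrors the trial-and-error process described in the text.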

While the constant-sum battery creates its
most serious difficulties in multiple-regression
analysis, it also leads to peculiar communality
problems in factor analysis. Most factor
analysis programs begin with initial communality
estimates based on the simple intercorrelations;
the highest-in-row or the average for the row
are examples. But Guttman showed that the
squared multiple correlation (SMC) between one
variable and all the others is a lower bound for
communality, and this, in the case of
constant-sum batteries, will always be 1.00. For
the EPPS, it is always possible to find an 
exact solution to the factor analysis problem, 
based on exactly 14 factors. Communality 
iterations based on 14 factors and starting 
values of one will converge in one iteration. 
But if theory requires that fewer than 14 
factors be extracted, we will want to ignore 
the fact that the SMCs are unity and proceed 
with communality iterations based on the 
usual starting values, not corrected by the 
SMC. 

Although the results reported here are based
on only one test, the EPPS, the problem is the
same for any constant-sum instrument, and
the methods for dealing with it presented here
are generally applicable.
USE AND EVALUATION OF DISCRETE TEST
INFORMATION IN DECISION MAKING

Elementary decision theory is applied to the problems of evaluating discrete
tests or test items used to classify people into several categories, and choosing
which of several treatments is best for persons falling within each response
category. The technique explicitly considers the base rates of the various
criterion groups and the relative seriousness of different types of errors of
classification, as well as the proportion of each criterion group falling in each
response category.


Counselors, personnel managers, clinical psy- 
chologists, school curriculum planners all face, 
as a fundamental part of their professional 
activities, the use of psychological test infor- 
mation in making decisions about people. This 
paper discusses the interpretation and evalua- 
tion of discrete test items or other discrete 
pieces of information which are used for this 
purpose. 

Suppose, for example, that patients entering 
a mental hospital go through a preliminary 
screening process which identifies all classes 
of patients except two, say schizophrenics and 
schizoid neurotics. Past experience has shown 
that of this undiagnosed group, 40% are actu- 
ally schizophrenic, and 60% are neurotic. 
Two different treatments are available: drugs 
and psychotherapy. It has been found that 
drug therapy reduces the hospitalization of 
schizophrenics by an average of 24 months 
and that of neurotics by an average of 10 
months. Counseling, on the other hand, re- 
duces hospitalization of schizophrenics by 
only 4 months but shortens hospital stays for 
neurotics by 20 months. The problem is, then, 
to identify each of these people as belonging 
to one or the other of the criterion groups so 
that the appropriate treatment can be
applied to him. A dichotomous test item or bit
of demographic information can be used in
this identification if the response rates of the
two criterion groups are known. Let us
assume, then, that past experience has shown
that 20% of the neurotics and 60% of the
schizophrenics respond “yes” to the item “I
see visions.” The problem is now defined as
selection of treatment based on item response.

1 Although the junior author is a major in the
United States Air Force, opinions expressed herein
are solely the authors’ and in no way reflect those of
the United States Air Force or the Department of
Defense.


The important characteristics of this situa- 
tion are mutually exclusive criterion groups, 
discrete item responses, and discrete treat- 
ments. No one treatment is best for all cri- 
terion groups, or else all people could be given 
that treatment and there would be no need 
for a test. Although in the example there are 
two criterion groups, two responses, and two 
treatments, in general there can be any num- 
ber of any of these. There can be a certain 
number of criterion groups, a different num- 
ber of responses, and a still different number 
of treatments. If two or more treatments are 
sometimes given to the same person, then the 
various treatment combinations can be listed 
as separate treatments. Neither the criterion 
groups, the item responses nor the treatments 
need have even ordinal scale properties. 

The present discussion also assumes that
there is a “flexible quota,” that is, there are 
no restrictions on the number of people to be 
assigned to each treatment. A procedure simi- 
lar to the one described below is also possible 
for the fixed quota case, which is common in 
many personnel-selection situations, where the 
number of people to be hired is fixed in ad- 
vance. 

It is also assumed that the item is used as 
the sole predictor, since the value of an item 
used alone differs from the value of an item 
when used in conjunction with other pre- 
dictors. 
We can specify several characteristics which
a procedure for the use and evaluation of an 
item should have in the situation described 
above: (a) All members of the same response 
class should receive the same treatment. (b)
In determining what treatment to give each 
response class, the relative frequencies (base 
rates) of the different criterion groups should 
be considered, since the larger any group in 
relation to another, the greater the probability 
that any given person is in it. (c) In deter- 
mining treatments, consideration should be 
given to the fact that some types of errors of 
treatment assignment are more serious than 
others. Thus, if in light of a person’s response, 
it is deemed probable that he is in a certain 
criterion group, but the treatment appropriate 
to that criterion group is disastrous for mem- 
bers of other criterion groups the subject may 
be in, then perhaps some other treatment 
should be chosen for him. (d) The more 
valid the item, the less important are base 
rates and the relative seriousness of different 
errors in determining treatment. In the ex- 
treme case, if all members of each criterion 
group give a response which no one else gives, 
then members of that response class can be 
assigned the treatment appropriate to that 
criterion group, without consideration of 
either base rates or the relative seriousness 
of errors. (e) It should be realized that not
every item capable of discriminating between 
two criterion groups is of value. Especially if 
one criterion group is much larger than the 
other, and if one type of error is far more 
serious than the other, it can happen that all 
response classes should receive the same treat- 
ment, despite the fact that the frequencies of 
the criterion groups differ from response class 
to response class. In this event, the item is 
of no value. (f) The item should be evaluated 
by comparing the results obtained using the 
item to the best possible results obtainable 
without the item. 

A decision theory approach has the desir- 
able properties just mentioned. Decision the- 
ory was first introduced in a major way into 
psychological testing by Cronbach and Gleser 
(1957). Decision theory had already reached 
a high state of development outside of psy- 
chology with such books as that by Wald 
(1950). Articles by Edwards (1954) and
Girshick (1954) emphasized applications in 
areas of psychology other than testing. The 
present discussion closely parallels one by 
Schlaifer (1959). Partly because Cronbach 
and Gleser dealt mostly with problems sub- 
stantially more complicated than the present 
one, decision theory has not achieved the wide 
range of attention it deserves from psycholo- 
gists concerned with the use, evaluation, and 
construction of tests. 

One of the major elements of decision the- 
ory which makes its application most appro- 
priate in this situation has received the most 
comment from critics of this approach. De- 
cision theory assumes that, since the benefits 
of appropriate classifications and the serious- 
ness of different types of misclassification are 
to be considered explicitly, some sort of nu- 
merical value must be assigned to each. Critics 
point out, quite correctly, that there is often 
no exact basis for assigning such values. How- 
ever, since it is clear that benefits of right 
decisions and losses from wrong ones should 
be taken into account in some fashion, it is 
probably better to estimate these values as 
nearly as possible rather than to ignore the 
matter completely. (A further discussion of
this point is given in Cronbach and Gleser,
1957, Ch. [illegible].)

The seriousness of errors is measured by 
evaluating the consequences, or utility, of 
each possible treatment for each possible cri- 
terion group. In the example given above, 
utility is measured in months of hospitalization.
It can also be measured in dollars or in
any other convenient unit. These values can 
be entered in a Utility Matrix, showing the 
mean utility of each possible treatment for 
the members of each criterion group. The 
Utility Matrix for the present example is 
shown in Table 1(e). 

The fundamental objective is to choose 
treatments that will maximize the total utility 
for the group, which is the same as maximiz- 
ing the mean utility for the group, which is 
the same as maximizing the mean utility per 
person. The difference between the highest 
mean utility possible with the test and the 
highest mean utility possible without the test 
is taken as the mean value per person of the 
test. Assume the information on base rates,
response rates of criterion groups, and treatment
utilities is available. This approach can
be summarized as follows: (a) For each response
class, calculate the proportion of people
in each criterion group. (b) From this information
and the Utility Matrix, calculate
the mean utility of each treatment for each
response class. (c) For each response class,
choose the treatment with the highest mean
utility for that response class. This, of course,
is the appropriate treatment to be selected for
future people who give that response. (d)
Weight the highest mean utility for each response
class by the proportion of the total
group in that response class. Summing these
weighted utilities gives the mean utility for
the total group resulting from use of the test.
(e) For each possible treatment, calculate the
mean utility resulting if that treatment is
given to everyone. The highest of these figures
is the highest mean utility that can be
achieved without use of any test. (f) Subtracting
the result of (e) from the result of
(d) gives the mean gain in utility resulting
from using the test. This is the value per person
of the test. (g) Subtracting from the value
per person of the test the cost per person of
administering the test gives the net value per
person of the test.
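
Steps (a) through (g) can be condensed into symbols. The notation is an assumption introduced here for illustration and does not appear in the original: $p_g$ is the base rate of criterion group $g$, $q_{r \mid g}$ the proportion of group $g$ giving response $r$, $u_{gt}$ the mean utility of treatment $t$ for group $g$, and $c$ the per-person cost of administering the test.

```latex
\begin{aligned}
\bar{U}_{\text{test}}    &= \sum_{r} \max_{t} \sum_{g} p_g \, q_{r \mid g} \, u_{gt}, \\
\bar{U}_{\text{no test}} &= \max_{t} \sum_{g} p_g \, u_{gt}, \\
V &= \bar{U}_{\text{test}} - \bar{U}_{\text{no test}}, \qquad
V_{\text{net}} = V - c .
\end{aligned}
```

The product $p_g \, q_{r \mid g}$ is the proportion of the total group that is in group $g$ and gives response $r$, so weighting each response class by its size, as in step (d), cancels the denominator of the within-class proportions from step (a).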


COMPUTATIONAL RULES AND 
AN ILLUSTRATION 


In this section are given specific directions
for the computations outlined above. Each
step is illustrated by computations for the
example given earlier. The rules are stated in a
form appropriate for any number of response
classes, criterion groups, and treatments. As
previously stated, there is no need for the
number of response classes to equal the number
of criterion groups, or for either to equal
the number of treatments.

The “givens” of the situation, that is, the
base rates, the relative response frequencies,
and the Utility Matrix, are shown in Table
1(a), (b), and (e). The entries in all other
tables are calculated from these three tables.

[Table 1: Example of the Use and Evaluation of a Dichotomous Test Item]

1. Enter in each cell of Table 1(c) the
product of the elements in the corresponding
cell of Table 1(b) and the corresponding row
of Table 1(a).


.4 × .4 = .16
.4 × .6 = .24
.6 × .8 = .48
.6 × .2 = .12


Each element in Table 1(c) is the propor- 
tion of the total group which both belongs to 
a specific criterion group and gives a particu- 
lar response. 

2. Take column totals of Table 1(c). 


.16 + .48 = .64 
.24 + .12 = .36


Each column total is the proportion of the 
total group giving a particular response. (The 
sum of these column totals should be unity.) 

3. Enter in each cell of Table 1(d) the 
corresponding element in Table 1(c) divided 
by its column total. 


.16/.64 = .25
.48/.64 = .75
.24/.36 = .67
.12/.36 = .33


Each element in Table 1(d) is the propor- 
tion of a response class in a particular cri- 
terion group. (Each column in Table 1(d) 
adds to unity.) 

4. Each cell in Table 1(f) is identified by 
a response class and a treatment. There is a 
column in Table 1(d) corresponding to that 
response class, and a column in Table 1(e) 
corresponding to that treatment. 

Enter in each cell of Table 1(f) the sum of
the products of elements in the corresponding
column of Table 1(d) and the corresponding
column of Table 1(e). For example, in finding
the upper left cell of Table 1(f), .25 of the
“no” sayers are schizophrenics (Table 1(d))
and drug therapy reduces their hospital stay
by 24 months (Table 1(e)), while .75 of the
“no” sayers are neurotics, for whom drug
therapy reduces hospitalization by 10 months.
Therefore, .25 × 24 + .75 × 10 = 13.5, the
mean utility for applying drug therapy to
“no” sayers. Similarly:


.25 × 4 + .75 × 20 = 16
.67 × 24 + .33 × 10 = 19.33
.67 × 4 + .33 × 20 = 9.33

Each element in Table 1(f) is then the
mean utility resulting from giving a particular 
treatment to a particular response class. 

5. Underline the highest element in each 
column of Table 1(f). The underlined element 
is the mean utility of the best possible treat- 
ment for that response class. 

6. Enter in the upper cell of Table 1(h)
the sum of the products of the underlined
elements in Table 1(f) and the corresponding
column totals in Table 1(c).


.64 × 16 + .36 × 19.33 = 17.20


This entry is the mean utility associated 
with using the test. It is not a direct measure 
of the value of the test. 

7. Enter in each cell in Table 1(g) the sum 
of products of the elements in Table 1(a) and 
the corresponding elements in a given column 
of Table 1(e). 


.4 × 24 + .6 × 10 = 15.60
.4 × 4 + .6 × 20 = 13.60


Each element in Table 1(g) is the mean 
utility associated with administering the cor- 
responding treatment to all persons. 

8. Enter in the second cell of Table 1(h)
the highest element from Table 1(g). This
entry is the highest mean utility possible
without using the test.

9. Enter in the third cell of Table 1(h) the
difference between the first two entries in
Table 1(h).

This entry is the per-person gain in utility 
resulting from use of the test, expressed in the 
same units as the Utility Matrix. It is the per- 
person value of the test. 

10. If the cost of giving the test can be
expressed in the same units as the last entry,
enter in the fourth cell of Table 1(h) the
per-person cost of giving the test. Enter in the
fifth cell of Table 1(h) the difference between
this figure and the previous entry. (Not done
in example.) The last entry is the net value
per person of the test.
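
The ten rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' own program: the function name, the dictionary layout, and the labels "drugs" and "counseling" are hypothetical, while the numbers are the "givens" of the example in the text. The sketch assumes that the response rates of each criterion group sum to one.

```python
def evaluate_item(base_rates, response_rates, utilities, cost=0.0):
    """Choose a treatment per response class and compute the value of the item.

    base_rates[g]        -- proportion of the total group in criterion group g
    response_rates[g][r] -- proportion of group g giving response r
    utilities[g][t]      -- mean utility of treatment t for group g
    """
    groups = list(base_rates)
    responses = list(next(iter(response_rates.values())))
    treatments = list(next(iter(utilities.values())))

    # Rules 1-2: joint proportions (Table 1c) and response-class totals.
    joint = {(g, r): base_rates[g] * response_rates[g][r]
             for g in groups for r in responses}
    totals = {r: sum(joint[g, r] for g in groups) for r in responses}

    policy, with_test = {}, 0.0
    for r in responses:
        if totals[r] == 0:
            continue  # nobody gives this response, so no decision is needed
        # Rules 3-4: within-class proportions (1d) and mean utilities (1f).
        mean_utility = {
            t: sum(joint[g, r] / totals[r] * utilities[g][t] for g in groups)
            for t in treatments
        }
        # Rule 5: best treatment for this response class.
        best = max(mean_utility, key=mean_utility.get)
        policy[r] = best
        # Rule 6: weight by the size of the response class.
        with_test += totals[r] * mean_utility[best]

    # Rules 7-8: best single treatment given to everyone (Table 1g).
    blanket = {t: sum(base_rates[g] * utilities[g][t] for g in groups)
               for t in treatments}
    without_test = max(blanket.values())

    # Rules 9-10: value and net value per person.
    value = with_test - without_test
    return {"policy": policy, "with_test": with_test,
            "without_test": without_test, "value": value,
            "net_value": value - cost}


# The "givens" of the example in the text.
base_rates = {"schizophrenic": 0.4, "neurotic": 0.6}
response_rates = {"schizophrenic": {"no": 0.4, "yes": 0.6},
                  "neurotic": {"no": 0.8, "yes": 0.2}}
utilities = {"schizophrenic": {"drugs": 24, "counseling": 4},
             "neurotic": {"drugs": 10, "counseling": 20}}

result = evaluate_item(base_rates, response_rates, utilities)
print(result["policy"])
print(round(result["with_test"], 2), round(result["without_test"], 2),
      round(result["value"], 2))
```

With these inputs the sketch reproduces the figures worked out above: counseling for "no" responders, drugs for "yes" responders, a mean utility of 17.20 months with the item against 15.60 without it, and therefore a per-person value of 1.60 months.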


USES OF THE TECHNIQUE 


The technique can be used either as a guide
to the use and evaluation of single items, or
it can be the basis of a test-construction
technique, in which the most valuable items are
selected for a test.

This technique can also be used to illus- 
trate several important lessons for test users. 
For example, when there are three or more 
treatments, it can be that the best treatment 
for one or more response classes is not neces- 
sarily the best treatment for any single cri- 
terion group. 

Another important lesson that can be illus- 
trated is the adverse result of overestimating 
test validity. Suppose the test user thinks that 
the distribution of item responses of each cri- 
terion group is as in the above example so he 
proceeds through the calculations shown and 
concludes that drugs should be given to those 
who respond “yes” and counseling to those 
who respond “no.” However, suppose that he 
has overestimated the discriminating power of 
the item, and in reality .5 of the schizophren- 
ics and .4 of the neurotics respond “yes.” Re- 
peating the calculations after entering these 
figures in Table 1(b) shows that the best pro- 
cedure is to administer drugs to both response 
classes, so that no test is needed at all.
Overestimating the discriminating power of the
item therefore resulted in assigning the wrong
treatment to the “no” response class, resulting
in a lower total utility than would have
been achieved by not using the item at all.
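
The size of this loss can be checked with a short sketch. The helper name and the treatment labels are hypothetical; the base rates, utilities, and the true response rates (.5 and .4) are those given in the text. The sketch evaluates a fixed assignment rule against the true response rates rather than the overestimated ones.

```python
def policy_utility(base_rates, response_rates, utilities, policy):
    """Mean utility per person when response r always receives policy[r]."""
    return sum(base_rates[g] * response_rates[g][r] * utilities[g][policy[r]]
               for g in base_rates for r in policy)


base_rates = {"schizophrenic": 0.4, "neurotic": 0.6}
utilities = {"schizophrenic": {"drugs": 24, "counseling": 4},
             "neurotic": {"drugs": 10, "counseling": 20}}

# True response rates, as opposed to the overestimated .6 and .2.
true_rates = {"schizophrenic": {"yes": 0.5, "no": 0.5},
              "neurotic": {"yes": 0.4, "no": 0.6}}

# Rule adopted on the strength of the overestimated validity.
mistaken = policy_utility(base_rates, true_rates, utilities,
                          {"yes": "drugs", "no": "counseling"})
# Ignoring the item and giving drugs to everyone.
blanket = policy_utility(base_rates, true_rates, utilities,
                         {"yes": "drugs", "no": "drugs"})

print(round(mistaken, 2), round(blanket, 2))
```

Under the true response rates the mistaken rule yields a mean of 15.2 months per person, against 15.6 months when drugs are given to everyone without testing.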

Errors in estimating criterion group base 
rates or treatment utilities can also have dra- 
matic effects. The accuracy of this technique 
depends, then, on accurate statements of the 
three “givens”: base rates, response rates, and 
utilities. Its principal value lies in the fact 
that the process by which these “givens” are 
considered is made lawful and regular and 
thus does not contribute to or confound error 
in decision-making. 
CHANGES IN ATTITUDES OF LEARNERS WHEN PROGRAMED
INSTRUCTION IS INTERPOLATED BETWEEN TWO
CONVENTIONAL INSTRUCTION EXPERIENCES ¹

The purpose of this study was to determine the nature of changes in student
attitudes when programed instruction is interpolated between conventional
instruction experiences. 5 parallel forms of a 26-item Likert-type attitude
scale were administered in counterbalanced order to 70 airmen studying
radiation detection and 53 airmen studying camera repair at Lowry Air Force
Base. Both courses included several weeks of lecture, a programed unit, and
several more weeks of conventional instruction. Students’ attitudes were
significantly more favorable during the programed unit in both courses. Changes
were considerably more pronounced for the 17 highest ability students.


The research literature contains a substantial
amount of evidence demonstrating
the effectiveness of programed instruction for 
achieving various types of objectives in many 
subject matter areas (Goldstein & Gotkin, 
1962; Hughes & McNamara, 1961; Lums- 
daine & Glaser, 1960; Morrill, 1961; Silver- 
man, 1960, 1962). Considerable evidence is 
also available indicating that, in general, stu- 
dent reaction to programed instruction is 
favorable (Feldhusen, 1962; Fry, 1963; 
Holland & Skinner, 1961; Klaus, 1961; 
Klaus & Deterline, 1962; Lysaught, 1961). 
Relatively few studies have been conducted, 
however, in which differential student reac- 
tions to programed instruction and conven- 
tional instruction have been assessed. It was 
the purpose of the study reported here to 
determine the nature of changes in student 
attitudes when programed instruction is
interpolated between conventional instruction
experiences.


DESIGN FOR THE STUDY 


The research upon which this report is
based was conducted at Lowry Air Force Base,
Denver, Colorado. Five parallel forms of a
26-item attitude scale measuring attitude
toward method of instruction, expectation
fulfillment, and course content were
administered in counterbalanced order to 123
airmen enrolled in two combination lecture
and programed instruction courses. The two
courses, Radiation Detection and Camera
Repair, were arranged such that students
experienced several weeks of conventional
instruction, then a programed unit, then
conventional instruction for the remainder of the
course.

¹ The research reported herein was one of several
projects performed pursuant to a contract with the
Office of Education, United States Department of
Health, Education, and Welfare.

The authors wish to express their appreciation to
Stephen A. Church, USAF, Chief, Lowry IMD Team,
for his cooperation in the study.


RADIATION DETECTION AND CAMERA 
REPAIR PROGRAM 


The programs involved in the two courses 
were linear programs prepared by Air Force 
personnel. The Radiation Detection program 
consisted of 7 units and 600 frames and the 
Camera Repair program consisted of 8 units 
and 900 frames. Time required to complete 
each of the programs varied from 8 to 20 
hours. The airmen in both courses completed 
the programs according to their own rate. 
When the airmen finished the program, they 
were free to read in a technical library or 
review material covered previously. 


ADMINISTRATION PROCEDURE AND SUBJECTS 


The airmen who participated in this study
included all those enrolled in both courses
from October, 1963, to March, 1964, and all
those enrolled in the Camera Repair course
from April to June, 1964. New classes started
the 15-week courses on a biweekly basis.
Class enrollments ranged from 5 to 12 with
a median class size of 6. Because of changes
in schedule and incomplete data, airmen in
6 classes were not included in the final
analyses.

Scales used in the first, second, and fifth 
administrations were administered to the 
classes in group situations. Since scales used in 
the third and fourth administrations were to 
be completed halfway through and at the end 
of the self-paced programs, the airmen were 
given addressed, stamped envelopes and were 
asked to complete the scales at the specified 
points. Each airman was responsible for mail- 
ing his own completed scales to Colorado 
State University. 

At the time of the first administration, a 
brief description of the proposed procedure 
was read to the group by one of the authors. 
The airmen were told only that the study 
was one of a series of projects designed to 
assess changes in attitude during learning 
situations. No reference was made to con- 
trasting instructional techniques. Enrollees 
were assured that participation in the project 
was voluntary and that their individual 
responses would not be revealed. Throughout 
the project, cooperation was excellent and 
none of the airmen refused to participate. 

The airmen had been chosen for these 
courses according to military needs, test 
scores, and interest in the subject matter. 
Almost all were high school graduates and 
about 20% had attended college, although 
relatively few had received degrees. 


DESCRIPTION OF THE ATTITUDE SCALES 


Two approaches were followed in construct- 
ing the attitude scales used in the present 
study. First, a review of current literature on 
“new media” learning experiments was under- 
taken to identify factors related to student 
attitudes toward new media. Second, struc- 
tured interviews were held with 18 students 
who had recently completed a programed 
learning experience. 

From these two sources, over 500 attitudinal
statements were compiled and classified
according to the three areas: attitude
toward method of instruction, attitude toward
expectation fulfillment, and attitude toward
course content. To identify ambiguous
statements as well as to assure coverage of the
favorability-unfavorability continuum, the
statements were sorted according to the
Thurstone technique of “Equal Appearing
Intervals” (1929). Using the median scale
values and indices of ambiguity as criteria,
26 items were selected for the final scale.

Of the 26 statements included in the final 
scale, 13 were favorable or positive, and 13 
were unfavorable or negative. Sixteen state- 
ments pertained to the area of method, 5 to 
expectation, and 5 to content. 

Sample statements representing each of the
three areas are as follows: “I am satisfied
with the methods used for teaching this class”
(method); “This class exceeds my highest
expectation” (expectation); and “The subject
matter of this class is interesting” (content).

Since the scales were to be administered in 
close proximity, equivalent forms of the 
original 26-item scale were constructed by 
rewriting each statement 4 different ways. 
Care was taken not to change the content or 
nature of each original statement by maintain- 
ing key words throughout the five forms. The 
five forms were printed according to the 
Likert (1932) format using a constant re- 
sponse scale of strongly agree through strongly 
disagree. Scales were scored 4, 3, 2, 1, and 0 
for strongly favorable through strongly un- 
favorable responses. 


RESULTS 


Split-half reliability coefficients were com- 
puted for a random sample of 75 attitude 
scales from the first administration. The 
reliability sample included all 5 forms of the 
scale and was based on students from both 
courses. Reliability coefficients, as estimated 
by the Spearman-Brown formula, were: .78 
for method, .74 for content, .46 for expecta- 
tion fulfillment, and .85 for total. 
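
The Spearman-Brown step used for these coefficients can be sketched as follows. The function names are illustrative, and the half-test correlation of .60 is a hypothetical input, since the uncorrected half-test correlations are not reported. The second function simply inverts the formula for a test of doubled length.

```python
def spearman_brown(r_half, factor=2.0):
    """Reliability of a test lengthened by `factor`, from the correlation r_half."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def implied_half_correlation(r_full):
    """Half-test correlation implied by a stepped-up (doubled) reliability."""
    return r_full / (2.0 - r_full)


# Hypothetical correlation between the two halves of a scale.
print(round(spearman_brown(0.60), 2))

# Half-test correlation implied by the reported total-scale reliability of .85.
print(round(implied_half_correlation(0.85), 2))
```

Read this way, the reported total-scale coefficient of .85 corresponds to a correlation of roughly .74 between the two halves.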

Coefficients of correlation computed between
adjacent administrations for both
courses are shown in Table 1. From inspection
of the coefficients in Table 1, it is apparent
that there was greater correlation between
adjacent administrations within the same
instructional condition than between conditions.
The coefficients within instructional conditions,
which ranged from .31 to .84, can be
considered additional estimates of reliability.
Although the reliability estimates of the scales
used in this investigation are not high, they
can be considered satisfactory for group
comparisons.

TABLE 1

COEFFICIENTS OF CORRELATION BETWEEN ADJACENT ADMINISTRATIONS:
RADIATION DETECTION AND CAMERA REPAIR

                                               Administrations
Area          Class                  N     1 & 2     2 & 3     3 & 4     4 & 5
Method        Radiation Detection    70    [illeg.]  -.01      .81       -.06
              Camera Repair          53    .67       [illeg.]  .48       -.06
Expectation   Radiation Detection    70    .46       .09       .18       .08
              Camera Repair          53    .62       .40       .09       .26
Content       Radiation Detection    70    .41       [illeg.]  .81       [illeg.]
              Camera Repair          53    .63       [illeg.]  .96       .28
Total         Radiation Detection    70    [illeg.]  -.05      .84       .04
              Camera Repair          53    [illeg.]  [illeg.]  .56       .04


In Table 2 are shown the mean attitude 
scale scores tabulated by administration and 
area for both courses. To determine the sig- 
nificance of the differences among the five 
means within each area, the analysis of 
variance was applied to the data. Since these 
data are repeated administrations of attitude 
scales to the same subjects, the source of 
variation attributable to “individuals” was 
isolated and tested separately. In all instances,
this source of variation was found to be
significant, indicating that correlation existed
among the scores obtained for the individual 
students. The significance of the “between 
administrations” source of variation is shown 
in the column of F values in Table 2. Sig- 
nificant changes in attitude scores resulted in 
all but two areas and these were both in the 
Camera Repair course. 

Inspection of the means in Table 2 reveals
that a similar pattern for both area and total
scores resulted from the repeated administration
of the scale, although the Camera Repair
means generally were lower than those for
Radiation Detection. Except for the first
administration Camera Repair content mean
score, the most favorable attitude in all other
instances was reflected at the end of the
programed unit (fourth administration). In
every instance the lowest mean occurred 2
weeks after returning to conventional instruction
(fifth administration).

TABLE 2

MEAN ATTITUDE SCORES BY AREA AND BY ADMINISTRATION:
RADIATION DETECTION AND CAMERA REPAIR

                                                    Administration
Area          Class                  N     1         2         3         4         5         F
Method        Radiation Detection    70    40.6      41.3      43.3      44.7      39.9      [illeg.]
              Camera Repair          53    40.4      [illeg.]  39.0      41.6      36.4      [illeg.]
Expectation   Radiation Detection    70    13.7      [illeg.]  14.0      14.3      12.6      8.50**
              Camera Repair          53    13.2      13.0      13.2      [illeg.]  12.2      1.14
Content       Radiation Detection    70    14.7      14.7      15.2      15.3      14.1      [illeg.]
              Camera Repair          53    14.1      13.1      13.2      13.6      12.6      2.37
Total         Radiation Detection    70    69.1      68.8      72.6      74.3      66.6      4.07**
              Camera Repair          53    67.8      63.8      65.4      68.2      61.2      [illeg.]

* Between administrations F significant at the .05 level.
** Between administrations F significant at the .01 level.

Within the programed unit (third and 
fourth administrations), the fourth ad- 
ministration mean was higher than that for 
the third administration. This suggests that 
any Hawthorne or novelty effect was either 
still operating at the end of the unit or did not 
occur in the study. 

The nature of the changes during the two 
learning experiences is portrayed graphically 
in Figure 1. Here the means for the total scale 
scores have been plotted for both courses. 
The increase in favorability of attitudes during 
the programed unit and the rapid decrease in 
favorability of attitudes after this unit are 
especially apparent. 

To determine the statistical significance 
of changes in attitude during various parts of 
the courses, the significance of the difference 
between adjacent means was computed and is 
shown in Table 3. Inspection of this table 
indicates that a significant decrease in favor- 
ability of attitudes occurred for each area 
and the total scale in both courses between the 
fourth and fifth administrations. It can also 
be noted that in the Radiation Detection 
course, a significant increase in favorability 
of attitudes occurred between the second and 
third administrations in all areas and the 
total score. 


Fig. 1. Mean attitude scores by administration for both courses
(Radiation Detection, N = 70; Camera Repair, N = 53).


Discussion 


The results of this study clearly indicate 
that pronounced changes in student attitudes 
are associated with programed instruction 


TABLE 3

SIGNIFICANCE OF DIFFERENCES BETWEEN ADJACENT ADMINISTRATIONS FOR RADIATION
DETECTION AND CAMERA REPAIR COURSES

Administrations
Area Course 1&2 2&3 3&4 4&5
Method Rad Det (N = 70) .05 level .01 level
Cam Rep (N = 53) .05 level .05 level .01 level
Expectation Rad Det .01 level .01 level .01 level
Cam Rep .05 level
Content Rad Det .05 level .01 level
Cam Rep .01 level .01 level
Total Rad Det .01 level .01 level
Cam Rep .05 level .01 level
interpolated between conventional instruction
experiences. Students reflected more favorable 
attitudes toward method and course content 
and experienced greater expectation fulfill- 
ment. Several possible explanations of these 
findings warrant exploration. Among them are 
the following: 

Explanation Number One. The more ap- 
pealing aspects of the subject matter were 
selected for the program content. The two 
programs for the courses involved in this 
study were prepared after a detailed analysis 
of course objectives and content. Those topics 
included in the program were selected which 
required a sequence of responses from the 
students but which were neither more difficult 
nor more interesting than other topics. In 
fact, the Camera Repair programed unit con- 
tained a subunit on gyroscopes—material con- 
sidered prerequisite to, but not an integral 
part of, the last portion of the course. 

Some evidence reflecting the appeal of the 
subject matter of the programed units can be 
found in the attitude toward content mean 
scores shown in Table 3. To the extent that 
the content is inherently more appealing than 
the rest of the course, a marked increase in 
attitude toward content score should be found 
between administrations two and three. In 
the Camera Repair course, the content score dif- 
ference between administrations two and three 
was not significant. In the Radiation Detection 
course, the difference was significant at the 
.05 level. The significance of this difference, 
however, was equalled by the significance of 
the parallel difference for method mean scores 
and was surpassed by the significance of the 
difference (.01) for expectation fulfillment 
mean scores. Thus, the available evidence does
not support attributing the attitude changes to
the content of the programed unit.

Explanation Number Two. Whereas the 
material prior to and during the programed 
unit was appealing to students, the content 
of the unit immediately following the program 
was unappealing. This explanation is ob- 
viously related to Number 1 and can be par- 
tially evaluated from the same source of 
evidence, Table 3, but from administrations 
four and five. Whereas attitude toward con- 
tent mean scores did decline significantly (.01 
level) between the fourth and fifth administrations
for both courses, so also did the mean
scores for method (both courses), for ex- 
pectation (Radiation Detection), and for 
total (both courses). There is no clear-cut 
evidence against the appeal of the last por- 
tions of the courses. Although slightly more 
evidence in support of this explanation can 
be found than that which supports Explana- 
tion 1, such evidence can hardly be considered 
substantial. 

Explanation Number Three. The fifth ad- 
ministration of the scales may reflect a de- 
crease in mean scores typically found from 
repeated administrations of attitude scales to 
the same subjects. Whereas it is true that fre- 
quent decreases in attitude scales scores re- 
sult when such scales are administered more 
than once, it would be expected that the 
decrease would appear consistently within 
each experimental condition for each pair of 
adjacent administrations. Inspection of Table 
2 reveals that this is not borne out, since the
second mean obtained during the programed unit
(fourth administration) is larger in all cases than
the first (third administration). It should be noted, however,
that this explanation cannot be tested ade- 
quately from the design of this study. A 
design such as that proposed by Solomon 
(1949) in which part of the subjects do not 
respond to any scale until the end of the 
experimental period would be required for 
adequate evaluation of this explanation. 

Explanation Number Four. Fluctuation in 
attitude scores during the experimental period 
is primarily the result of responses made by 
some unique subgroup of students. To evaluate 
this explanation requires detailed examina- 
tion of the characteristics of students com- 
prising the total sample. Data relating to 
the general mental ability and to previous 
achievement of the students in the Radiation 
Detection course were available for this type 
of analysis and were analyzed as follows: 
Students were ranked according to the Airman 
Qualifying Examination General Aptitude 
Index and the highest and the lowest 25% 
of the distribution were identified. These
groups were labeled “high ability” and “low 
ability.” The mean attitude scores for the six 
subgroups were computed by administration 
and by area and are shown in Table 4. The 
same procedure was repeated for measures of 
TABLE 4

MEAN ATTITUDE SCORES FOR RADIATION DETECTION
SUBGROUPS BY AREA AND ADMINISTRATION

                                                 Administration
Area         Subgroup          1            2            3            4            5
Method       Over achievers    41.3         42.1         42.2         44.6         41.8
             Under achievers   41.2         44.2         42.7         45.4         41.5
             High ability      39.1         39.1         44.8         46.4         35.3
             Low ability       38.0         41.1         41.5         42.7         40.8
             High grades       42.3         38.3         45.5         46.4         36.4
             Low grades        40.3         43.9         42.1         45.5         42.5
Expectation  Over achievers    13.9         [illegible]  13.9         [illegible]  12.9
             Under achievers   13.8         13.8         13.7         14.6         13.3
             High ability      [illegible]  [illegible]  [illegible]  [illegible]  10.8
             Low ability       14.0         12.8         14.1         14.1         13.1
             High grades       [illegible]  10.5         14.4         14.7         10.7
             Low grades        [illegible]  13.9         13.6         [illegible]  13.8
Content      Over achievers    15.8         [illegible]  15.6         15.4         13.6
             Under achievers   14.5         15.6         15.1         15.3         15.0
             High ability      14.8         14.6         15.6         15.9         13.8
             Low ability       13.9         14.8         14.7         14.8         13.7
             High grades       15.4         [illegible]  [illegible]  [illegible]  13.0
             Low grades        13.7         14.9         14.9         15.4         14.5
Total        Over achievers    71.0         69.6         71.5         74.9         68.2
             Under achievers   69.5         73.6         71.5         75.8         69.8
             High ability      66.9         [illegible]  [illegible]  [illegible]  59.8
             Low ability       65.0         68.8         70.3         71.6         67.6
             High grades       70.6         62.5         75.8         76.9         60.1
             Low grades        [illegible]  [illegible]  [illegible]  [illegible]  [illegible]

Note. N = 17 for each subgroup.


previous achievement, but the two groups 
(high and low 25%) were designated “high 
grades” and “low grades.” To obtain a meas- 
ure based upon both achievement and ability, 
the ability rank was subtracted algebraically 
from the achievement rank. From this dis- 
tribution of differences, the “over achievers” 
(upper 25%) and the “under achievers” 
(lower 25%) were identified. Since the means 
in Table 4 were obtained from “mutilated” 
distributions and were computed for descrip- 
tion only, no tests of significance were made 
for the differences among them. 
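
The subgroup procedure just described can be sketched as follows. This is a hedged illustration rather than the authors' procedure in detail: the function names are hypothetical, ranks are assumed to run from 1 for the lowest score upward (the source does not state the direction), and ties are broken by original order.

```python
def rank(values):
    """Rank values from 1 (lowest) upward; ties keep their original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def achievement_subgroups(ability, achievement, fraction=0.25):
    """Return (over_achievers, under_achievers) as lists of student indices.

    The ability rank is subtracted from the achievement rank; the upper
    `fraction` of the differences are labeled over achievers and the lower
    `fraction` under achievers.
    """
    ability_rank = rank(ability)
    achievement_rank = rank(achievement)
    difference = [ach - abil
                  for ach, abil in zip(achievement_rank, ability_rank)]

    k = int(len(difference) * fraction)  # 25% of 70 students gives 17
    by_difference = sorted(range(len(difference)), key=lambda i: difference[i])
    under = sorted(by_difference[:k])
    over = sorted(by_difference[-k:]) if k else []
    return over, under


if __name__ == "__main__":
    # Hypothetical ability and achievement scores for eight students.
    ability = [55, 62, 48, 70, 66, 59, 51, 73]
    achievement = [81, 70, 77, 72, 90, 65, 68, 74]
    print(achievement_subgroups(ability, achievement))
```

Truncating 25% of 70 students to a whole number yields 17, which agrees with the subgroup size reported in the note to Table 4.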

Detailed inspection of Table 4 indicates that 
differences between the subgroup means within 
administrations are pronounced for two char- 
acteristics—grades and ability. Both the high 
grade subgroup and the high ability subgroup 
reflected similar patterns except that the high 
grade subgroup showed slightly more favorable 
initial attitudes. As would be expected, the
membership of these two subgroups overlapped:
12 of the 17 students were members of both
groups. Similarly, nine students were members
of both the low ability and the low grades
subgroups. Since the over- and under-achievement
subgroups consisted of
both high and low ability students and high 
and low grade students in various combina- 
tions, their means within administrations 
tended to fall between those of the groups 
defined according to grades and ability. 

To portray graphically the nature of the 
changes reflected by the high and low ability 
subgroups, the mean total attitude scores for 
these two subgroups were plotted in Figure 2. 
Here it can be seen that the attitude changes 
of the high ability subgroup were clearly 
more pronounced than those for the low 
ability subgroup or the total group of 70 
students. From this analysis it can be inferred 
that changes in attitude during a learning 
experience consisting of both programed and 
conventional instruction are associated with 
specific characteristics of the learners, particularly
ability and achievement.

Fig. 2. Mean total attitude scores for high- and low-ability
subgroups in Radiation Detection.


IMPLICATIONS 


The major implications of the present 
study relate to the impact on the learner of 
the transition from one instructional method 
to another when more than one method is 
involved in a learning experience. The present 
study was not designed to evaluate procedures 
for effecting a smooth transition. It seems 
reasonable to postulate, however, that steps 
might be taken to avoid the pronounced de- 
crease in favorability of attitudes which oc- 
curred in this investigation. Although further 
research will be necessary to assess the effec- 
tiveness of each of the following practices 
for contributing toward a smooth transition 
from programed instruction to conventional 
instruction, the following suggestions may 
offer promise. 

1. Make the change gradual rather than 
abrupt. In the present study the students 
changed abruptly from one method to the 
other. It is conceivable that this procedure 
emphasized inherent differences between the 
two methods. It is suggested that the pro- 
cedure of overlapping the two methods in 
varying degrees be evaluated. 

2. Arrange to have a unit especially chal- 
lenging to high ability students immediately 
following the programed instruction. Since the 
decrease in favorable attitudes was most 
pronounced for high ability and high achieving 
students, a special effort to provide satisfac- 
tion for these subgroups might be made 
through rearranging the content of a course. 
Although the influence on low ability students 
of such a procedure would need to be evalu- 
ated also, this procedure might contribute 
toward avoiding the severe decrease in overall 
group attitude. 

3. Prepare students for the change by dis- 
cussing the differences between methods to 
be utilized in the learning experience. It can 
be hypothesized that attitudes toward a learn- 
ing experience are a function of the expecta- 
tion of the individual. The greater the dis- 
crepancy between his expectation and what 
he actually experiences, the less favorable 
will be his attitude. It should be possible for 
an instructor to assist students in establishing
realistic expectations by discussing the
strengths and limitations of each method of 
instruction used in a particular learning 
experience. 

4. Emphasize those learning aspects during 
conventional instruction which students pre- 
fer in programed instruction. Frequently in 
new media investigations the conventional 
aspects of instruction are ignored. It is en- 
tirely possible in experimental learning situa- 
tions that the conventional instruction suffers 
indirectly as a result of the new medium 
being evaluated. It is strongly suggested here 
that such aspects of learning as attention to 
individual differences, appropriate pacing, 
immediate evaluation of responses and con- 
stant attention to motivation be emphasized 
in conventional instruction methodology. 


CONCLUSIONS 


Findings from the present investigation support
the following conclusions: 

Programed instruction interpolated between 
conventional experiences is associated with 
pronounced changes in student attitudes. 

Students tend to reflect more favorable at- 
titudes during programed instruction than 
during conventional instruction. 

Differential attitudes during programed and 
conventional instruction are especially critical 
for high ability students. 

Special attention to student attitudes by 
the instructor is warranted during the transi- 
tion from programed instruction to conven- 
tional instruction. 
A PRELIMINARY STUDY OF A TEST FOR AIR
TRAFFIC CONTROLLERS 


LOUIS D. HARTSON 


Oberlin College 


A test, employing the analogies format, was constructed from diagrams
representing jet aircraft on a radar scope. From the verbalized reactions
of the air flight controllers who acted as Ss to the problems presented by
the test, sketches were prepared describing each S’s attitudes and methods
of handling the potential confrontations indicated. When these sketches
were read to 3 members of the training staff of the Oberlin FAA Center,
each judge made a perfect score in identifying the Ss.


During the year, 1963-64, the author was 
employed in supervising the administration of 
tests, devised by Carl A. Silver, of the Frank- 
lin Institute, for a project initiated by the 
United States Navy. The subjects (Ss) were 
controllers on the staff of the Oberlin Air 
Route Traffic Control Center, of the Federal 
Aviation Agency. This afforded me an op- 
portunity to conduct a supplementary study 
herein reported. 

The Franklin Institute tests used 30 dia- 
grams representing the positions of jet air- 
craft whose blips might be seen on a radar 
scope. To each diagram was attached a 
“strip” identifying the type of plane, its loca- 
tion, speed, heading, altitude, and air route. 
It occurred to the author that these diagrams
might be used as prototypes for diagrams 
which might be assembled into an analogies 
test which might well prove useful to the 
training staff of the F.A.A., or of a military 
establishment. Therefore, with the most 
cordial cooperation of the training staff of 
the Oberlin A.R.T.C. Center, and the able 
assistance of James M. Jones, who was ad- 
ministering the Franklin Institute tests, a set 
of 30 problems was assembled. The accom- 
panying figure presents a sample of these 
diagrams. 

Each diagram was mounted separately, 
with its strip, each problem requiring eight 
diagrams. A sample of the test was presented 
in the following manner: Diagrams I and II 
are laid out as in the figure and the S is in-
structed to report what he sees as the rela-
tionship, that is, the similarity and dissimi-
larity, between them. Then, III being placed
below I, he is asked to compare III with I.
After these statements have been recorded, 
the five diagrams, a, b, c, d, and e, are spread 
out with the instruction, “Select that one of 
the lettered diagrams which most nearly 
represents the relationship to III that II does 
to I.” He is instructed to think of fitting the 
correct diagram into the rectangular pattern, 
as has been done in the accompanying figure. 
(The figure is a diminished reproduction of 
I, II, III, and d, the correct choice.) The 
Ss were volunteers “from the floor,” who were
paid for their time. The controllers inter- 
preted the test as an interesting challenge of 
their professional ability. Having been informed,
however, that the project in which
we were engaged was being conducted by the
Franklin Institute, they appeared to be en- 
tirely free of apprehension that their score 
on the test would in any way affect their 
status with the F.A.A. To reduce a possible
fatigue factor the test was administered in 
two sessions 1 day apart. 

We at first assumed that what this study 
would produce would be a distribution of 
correct answers. However, it became evident 
very shortly that this type of test does not 
have one unequivocal answer. This is due 
to the fact that the choice of an answer de- 
pends upon whether one interprets Diagram 
II as a situation requiring attention, one in 
which there is a potential confrontation. If 
II is interpreted as presenting no problem, 
he should select that one of the five alterna- 
tive diagrams which he interprets as having 
no problem, and vice versa. Consequently our 
attention shifted, during this “pilot” period, 


Fig. 1. The diagrams used in the same problem, namely, I, II,
III, and d, the correct choice. They are reduced in size from
that of the originals, which is 9 X 9 inches. The flight strips
attached to the diagrams read:

Problem 1

A/C A  B707  0/VWV     550k  Alt 310  heading 097  (J60)
A/C B  B707  0/CLE     470k  "   330  "       287  (J60)

A/C A  L188  30E/ERI   300k  Alt 280  heading 255  (J29)
A/C B  B52   30S/CLE   420k  "   280  "       005  (J85)
A/C C  T33   6SE/CLE   370k  "   280  "       287  (J60)

A/C A  B720  35W/CLE   520k  Alt 270  heading 097  (J60)
A/C B  B720  25W/CLE   520k  "   250  "       097  (J60)
A/C C  B707  40E/CLE   420k  "   240  "       287  (J60)
A/C D  B707  30E/CLE   420k  "   260  "       287  (J60)
A/C E  CRVL  45W/CLE   400k  "   280  "       097  (J60)
A/C F  CRVL  50E/CLE   400k  "   280  "       287  (J60)

A/C A  L188  70NE/CLE  300k  Alt 240  heading 255  (J29)
A/C B  B52   40E/CLE   420k  "   240  "       287  (J60)
A/C C  T33   25S/CLE   370k  "   280  "       005  (J85-J29)
A/C D  B52   1SE/CLE   420k  "   260  "       287  (J60)


from interest in counting “the correct” solu- 
tions, to an analysis of the attitudes ex- 
pressed and the methods employed in solving 
the problems. This required recording the 
S’s verbalizations throughout the task. But 
the fact that the S did spontaneously express
himself, both in interpreting the nature of 
the situation and in describing the procedure 
he would follow in handling the situations, 
provided the data for our judgment concerning
his traits. From these data we prepared
sketches of the Ss. 

Recognition of the fact that the controllers 
were of two minds as to whether or not the 
II diagrams presented a problem led to the 
designation of a “no problem index.” This 
score is the sum of the occasions in which 
a S declares that there is no problem in the 
situations represented by Diagrams I, II, and 
III. When 30 problems are used, the highest 
possible score would be 90. The actual range
in these scores for our Ss was from 24 to 47. 
We found this “no problem index” to be not 
only a significant diagnostic clue, but a means 
of checking the reliability of the judgments 
we made from the Ss’ statements. 
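
The scoring rule for the "no problem index" can be illustrated with a short sketch. The data layout below (one dictionary per problem, keyed by diagram, with True meaning that the S declared no problem) and the function names are assumptions made for illustration; the source does not describe how the judgments were recorded.

```python
DIAGRAMS = ("I", "II", "III")


def no_problem_index(judgments):
    """Count the occasions on which the S declared 'no problem'.

    judgments: one dict per test problem, mapping each of the diagrams
    I, II, and III to True when the S saw no problem in that diagram.
    """
    return sum(
        1
        for problem in judgments
        for diagram in DIAGRAMS
        if problem[diagram]
    )


def max_no_problem_index(n_problems):
    """Highest possible score: every diagram of every problem judged clear."""
    return n_problems * len(DIAGRAMS)


if __name__ == "__main__":
    # Hypothetical record for the first two problems of one S.
    record = [
        {"I": True, "II": False, "III": True},
        {"I": False, "II": False, "III": True},
    ]
    print(no_problem_index(record), "of", max_no_problem_index(len(record)))
```

With 30 problems the ceiling computed this way is 90, the figure given in the text.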

The raw fruits of this study, then, consist
of (a) the “correct” score or scores, (b) the
“no problem” scores, and (c) a sketch of the
S constructed from his verbalizations while
solving the problems. The test gives promise
of serving as a basis for (a) estimating
certain intellectual abilities, (b) interpreting
personality traits, and (c) identifying work
methods which have an important bearing on the
performance of the air traffic controller.

Some evidence of a man’s alertness may 
be obtained from the number of incorrect 
choices that are made before he grasps the 
nature of the task, as well as from his com- 
ments. One assistant controller apparently re- 
quired exposure to six problems in order to 
see the analogous relationship between the 
four diagrams. Moreover, in order to remem- 
ber the analyses he was making, he found it 
necessary to make a written record of them. 
His overall time requirement was 25% more 
than the average. In the case of Ga, an 
assistant controller with 8 years experience, 
it was possible to detect three types of 
intellectual handicap. 

1. He made errors of perception in reading 
the “strip,” as when he said that aircraft 
were in potential confrontation; when, in fact, 
they had opposite headings. 

2. His retentive span was short. After 
having reported his interpretations of Dia- 
grams I and II, and begun comparison of 
the lettered diagrams, he usually had to re- 
turn and solve once more the relationship 
between II and I. 

3. His sense of spatial relationships as indi- 
cated on the scope was weak. He often became 
quite confused when searching for similarities 
in patterns.

One of the first generalizations of which 
we became aware is that there is a clearly 
marked difference between the cautious man, 
who takes no risks, and the one who is 
inclined to “wait and see.” The former characteristically
makes an immediate disposition
of the case where there is the remotest possibility
of a confrontation. Va., whose “no
problem” score is but 24, would say: “Be 
alert to” this or that possible contingency, 
as, for example, when a plane is changing 
altitude with another aircraft approaching. 
In the situation where a Caravelle is descending
to land, starting from a flight-level 2,000
feet above two faster aircraft, which are 15 
and 20 miles ahead, Va. said: “There is no 
problem here, but I would be considering 
vectoring B or C off course.” The other Ss, 
almost to a man, stating that there is no 
problem, drop the matter, but Va. has in 
mind an appropriate plan of action “just in 
case.” At another time he said: “Either sweat 
it out or give Aircraft B altitude 28,000 to 
eliminate any doubt.” Another man with the
low “no problem” score of 24 is PF. His 
characteristic evaluation is represented by 
this statement: “There is no problem at all 
in this situation as all the aircraft are sepa- 
rated either by altitude or radar; but, per- 
sonally, I would give them wider separation.” 

Contrast with the attitudes of these cau- 
tious men Sk’s reaction to the following situa- 
tion: An L188 and a B52 are converging on, 
but to pass over Cle, from distances of 65 
and 75 miles, respectively, at the same alti- 
tude. Sk. says that, radarwise, they will have 
enough time: “The B52 will arrive over Cle 
almost 2 minutes before the L188.” He added, 
however, “It depends upon your workload, 
but as long as you watch them it will be 
all right.” More cautious controllers want 
more than 2 minutes in which to transmit 
instructions to pilots approaching each other 
at jet-craft speeds, even though they are 
“separated legally.” One of his typical re- 
marks concerning situations where other con- 
trollers prescribe vectoring is: “There is a 
possible confliction here but I don’t think I 
would be concerned.” Ba. had a high “no 
problem” score of 37. In a situation which 20 
Ss interpreted as involving a potential con- 
frontation, Ba. remarked calmly: “You will 
probably have to vector him around.” This 
is the situation: One jet is starting to climb 
5,000 feet to the flight-level of another plane 
which is approaching from a distance of 50 
miles, when their combined airspeed is 900k. 




Although Ba. is quite aware of the risk in- 
volved, he is in no hurry to alert the pilot 
of the plane which is ascending and instruct 
him to take appropriate action. 
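
The time available in the situation just described follows from simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes that the distance is in nautical miles (so that knots are miles per hour in the same unit), that both aircraft hold constant speed, and that they close directly on one another.

```python
def minutes_to_meet(distance_miles, closing_speed_knots):
    """Minutes until two aircraft meet at a constant combined closing speed."""
    if closing_speed_knots <= 0:
        raise ValueError("closing speed must be positive")
    return distance_miles / closing_speed_knots * 60.0


if __name__ == "__main__":
    # The situation put to Ba.: 50 miles apart, combined airspeed 900k.
    print(round(minutes_to_meet(50, 900), 1), "minutes")
```

On these assumptions the two aircraft would meet in a little over 3 minutes, which gives some sense of how short the interval for issuing instructions is.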

In the cases cited above the “no problem” 
index proved to be a fairly reliable indicator 
by which to separate the cautious men from 
those more inclined to take a risk. It is evident,
however, that a high score may actually
be due to factors other than a willingness to take risks. It
may result from carelessness, inattentiveness, 
faulty perception or fatigue. The man who 
made a score of 47 found no problem in the 
situation in which one plane is approaching 
another from 15 miles in the rear traveling 
75k faster. In this and numerous other cases, 
while apparently normally awake, he was 
oblivious of the potential collisions involved. 

In one case cautiousness appeared to be 
combined with “cocksureness.” We. is a man
of very pronounced feelings, to the point of 
being intolerant. He knows just how each 
problem should be handled and he makes 
decisions promptly. Only two of the Ss fin- 
ished the test in less time. He would exclaim 
with considerable feeling, “This situation
should never have been allowed to develop,” 
and, at another time, “This situation is 
marginal; there might not have been enough 
time. This is the type which separates the 
men from the boys.” At the same time that 
he impresses one as being the type of man 
to whom one would gladly commit one’s 
safety, one has a feeling that he is not very 
considerate of those who disagree with him. 

Some controllers gave evidence of a tend- 
ency toward stereotyping in their methods 
of handling the aircraft. Usually they pre- 
ferred vectoring to a change of altitude. Bu. 
is an exception to this rule. He was very 
inclined to avoid vectoring, recommending it 
only 8 times while calling for a change of 
altitude to avoid confrontations as many as 
52 times. Controllers also differ in the degree 
to which they tend to defer to the pilot’s 
wishes, or to permit them freedom of choice. 
In one of the problems three planes are repre- 
sented as approaching a thunder storm from 
different directions. One S said that, inasmuch 
as the pilots could all see the location of this 
local storm, he would let them use their own 
judgment as to the best way of avoiding it.

To determine the validity of these person- 
ality sketches we asked three members of the 
training staff to select the names of our Ss 
with whom they were most familiar. R. se- 
lected three, T. chose four, and S. named 
five. In this blind matching test each judge 
made a perfect score; every man was iden- 
tified immediately. After recognizing the 
identity of the men from the sketches read 
to him, R. brought from the files the ratings 
and personal characterizations of these men 
which had been sent from Oklahoma City 
where they had obtained their basic training, 
and he exclaimed, “You have hit the nail 
right on the head.” “We ought to have this 
test for use when a man is brought off the 
floor following ‘an incident.’ ” 

After identifying correctly each of the men 
whom he had selected, T. remarked, “I knew 
whom you were describing after you had 
read the first two sentences.” He then sug- 
gested, “If you will leave us a set of these 
tests, with the personal sketches, we could 
send them to the regional headquarters, where 
they might be used in the screening of 
candidates.” S.’s performance on the
matching test was still more noteworthy for 
he named each of the five men on his list 
without hesitation. Being the assistant train- 
ing officer he was in position to know more 
of the controllers than did the other judges. 

It is regrettable that exhaustion of funds 
prevented carrying this study further. One 
unsolved question concerns the optimum
length of the test. We employed 30 problems 
because that was the number of diagrams 
being used in the Franklin Institute tests. 
After testing 20 men we cut the number of 
problems to 20. At that point we had to 
terminate testing. So only 2 men took the 
shorter test. The results suggest that this 
number is quite sufficient. It may be that 
even 12 or 15 would be adequate. The long 
test required an average of 8 hours. And 
because, in the pilot portion of the testing, 
our attention was directed more to eliminating 
the bugs in the test than in studying the 
characteristics of the Ss, we wrote only 16 
sketches. 

The Oberlin ARTC Analogies Test is, then, 
a preliminary form of an individual personality
instrument especially tailored for a
unique occupation, that of air traffic controller.
That it was highly motivating was
evidenced by the intense interest that was 
manifested. The Ss declared that the situa- 
tions presented in the test were highly real- 
istic. Members of the training staff declared 
that such a test might be of considerable 
value in dealing with candidates for upgrading
and with problem cases. It seems possible, 
also, that the analogies format might well 
prove applicable in a variety of industrial 
situations. It is a matter of considerable 
regret that the author is not able to pursue 
this study beyond this preliminary stage. 


(Received December 30, 1964) 
Measures of 3 types of motivation to work were related to 2 criteria of job
performance, both of which reflect the degree to which the organization has 
rewarded individual behaviors. In the white-collar sample (N = 1,047), which 
was composed largely of technical personnel, low performers were motivated 
primarily by the social environment of the job and, to a lesser extent, by the 
opportunity of gaining recognition through advancement, but few significant 
relationships were found between intrinsic self-actualizing motivations and 
job performance. In the blue-collar sample (N = 421), no significant relation- 
ships were found between any of the motivational measures and job perform- 
ance. With advancing age and tenure, work became more meaningful for 
high performers but less meaningful for low performers, although the impor- 
tance of the social environment increased for both high and low performers. 


There has been an increasing emphasis 
during the past several years on differences 
between two basic types of need which may 
be satisfied in the work situation, or between 
two contrasting kinds of motivation toward 
work. The first of these is described variously 
as self-actualizing, ego involving, or intrinsic 
and internalized motivations; the second is 
described as extrinsic and externalized moti- 
vation or as striving to fulfill deficiency or 
maintenance needs (Maslow, 1955). These 
two types of motivation can be represented in 
terms of the effects they have upon either (a) 
criteria of individual benefit and/or (b) criteria
of job performance or organizational
benefit. A number of studies have dealt with 
the motivation-individual benefit relationship, 
including those concerned with job satisfac- 
tion (Friedlander, 1964; Herzberg, Mausner, 
& Snyderman, 1959; Hoffman & Mann, 
1956), as well as those which deal with the 
various effects of motivation upon mental 
health (Gurin, Veroff, & Feld, 1960; Korn- 
hauser, 1965). These studies, in a broad sense, 
indicate that the self-actualizing worker in- 
teracting with the content and process of his 
work tasks has a greater probability of at- 
taining job satisfaction and mental health 
than the deficiency-motivated employee inter- 
acting with the contextual environment of his 
job. 

Concurrent with this dichotomous model of 
self-actualization versus deficiency-need motivation 
is the implication that the worker who 
is motivated by self-actualization is a better 
performer on the job than his counterpart who 
is motivated by deficiency needs. Herzberg et 
al. (1959), for example, found that a majority 
of satisfying incidents involved intrinsic job 
characteristics, and, in turn, that a majority 
of satisfying incidents also contained self- 
reports that in some way job performance 
had improved. But no data are presented to 
indicate a direct relationship between inci- 
dents involving intrinsic job characteristics 
and incidents containing self-reports of in- 
creased job performance. Although Fine and 
Dickmann (1962) report comparable results 
using a similar self-report criterion of pro- 
ductivity, they also found that satisfaction 
concerning money items was perceived as 
having an immediate influence on productiv- 
ity. Furthermore, and applicable to both of 
these studies, a self-perception that one has 
improved his job performance may be quite 
unrelated to an actual improvement in per- 
formance. According to the Protestant ethic, 
it is conceivable that self-reports of increased 
job performance may be nothing more than 
moral justifications for increased job enjoy- 
ment. 

Of perhaps even greater relevance are the 
effects that situational variables may have 
upon the motivation-performance relationship. 
It is probable, for example, that organiza- 
tional reward systems affect this relationship 
by positively or negatively reinforcing certain 
worker response behaviors. What is com- 
monly referred to as “good” job performance 
is simply that behavior which a particular 
organization (or subunit) condones and 
(probably) rewards. Behaviors which it con- 
siders as poor performance, the organization 
will attempt to extinguish or at least nega- 
tively reinforce. Thus if the organization re- 
wards behaviors which are motivated by needs 
for social relations with one’s peers and 
supervisor, a high relation may be found be- 
tween this behavior and a performance cri- 
terion. On the other hand, an employee whose 
satisfactions derive from the expression of his 
own abilities, from the exercise of his own 
decisions, from the challenge of the work it- 
self, and from his own personal growth there- 
from may be viewed as a poor performer in 
that organization. As Katz (1964) points out, 
such an employee is not necessarily tied to a 
given organization, and it may matter little to 
him where he does work, provided that he is 
given ample opportunity to do the kind of job 
he is interested in doing. Indeed, he may con- 
tribute little to organizational goals beyond 
his specific role. 

The effect of organizational rewards on the 
motivation-performance relationship is exem- 
plified in the contrast of three studies. Davis 
(1954) and Vroom (1962) found in research 
environments that employees who were ego- 
involved in their work received higher per- 
formance ratings from supervisors, while 
Peres (1963) found that (high) performance 
of scientists and engineers loaded negatively 
on job factors of intrinsic importance. This 
apparent contradiction stems in part from 
differences in the relevance of the perform- 
ance criterion used. While Davis asked raters 
(supervisors) to think of each scientist as a 
potential candidate for a research grant and 
Vroom asked supervisors to make a number 
of judgments of their subordinate’s performance, 
Peres used as a performance criterion 
the level that the employee had attained (un- 
corrected for age or tenure) within the or- 
ganization. The latter method would seem to 
give a more objective picture of the behavior 
which the organization has rewarded, while 
supervisory judgments may merely indicate 
behaviors which the supervisor condones (or 
would like to think he condones) but does not 
necessarily reward. 

While these several previous studies offer 
some important preliminary evidence of a 
motivation-performance relationship, the re- 
sults are tempered by (a) use of a potentially 
biased self-report criterion of job performance, 
(b) use of indirect or implied measures of relationships, 
or (c) the lack of explicitness 
concerning the effect of organizational reward 
systems for condoned performance upon the 
motivation-performance relationship. These 
limitations become particularly severe if the 
dual model of job motivation is to be actually 
used in industrial field situations, as for example 
in a predictive capacity as an employee 
evaluation or selection device. 

The purposes of this study were to explore 
further, in a research and development set- 
ting, the relationships between job perform- 
ance and three basic types of job motivation, 
with data being analyzed separately (a) ac- 
cording to the respondents’ occupational 
group (white collar versus blue collar), (b) 
by age bracket, and (c) by length of organi- 
zational tenure. 


METHOD 
Sample 


This study was conducted in an isolated commu- 
nity of about 12,000 people, of which the 3,200 
civilian wage earners all work directly for an agency 
of the United States government. The primary mis- 
sion of research and development entails the efforts 
of a core group of about 900 engineers and scientists, 
plus a much larger group of personnel who serve in 
a variety of support and service roles. Thus, the 
sample upon which this study was based represents 
a variety of occupations and socioeconomic levels. 
Completed and usable questionnaires were returned 
from 1,468, or about 45% of the working popula- 
tion; control data indicated minor distortions in 
returns in the direction of greater participation from 
scientists and engineers and from those at the 
higher white-collar levels. Of the 1,468 respondents, 
1,047 were classified as white collar (graded GS 
personnel) and 421 were classified as blue collar 
(apprentices, journeymen, and supervisors in vari- 
ous trades). Since the cultural norms as well as the 
criterion which the organization utilizes for successful 
performance for blue-collar workers might be quite 
different from that for white-collar workers, sepa- 
rate analyses were performed on these two groups. 
Although the white-collar sample contained a wide 
range of occupations including clerical and other 
administrative personnel, the majority were in 
technical areas, primarily engineering and the various 
sciences. Since the sample was thus composed largely 
of technical personnel working in a technological 
environment, these personnel might more aptly be 
referred to as a technical white-collar group. 


Criterion Measures 


Two criteria of performance within the organiza- 
tion were utilized. These were measures of the 
(salary) level to which the employee had ascended 
(a) relative to others his age and (b) relative to 
others who had spent the same length of time in the 
organization. The purpose of controlling for age and 
tenure was to attribute a higher degree of success to 
an employee who had achieved a certain level sooner 
in his life span and within a shorter length of time 
since joining the organization. Previous analyses by 
Hulin (1962) in a sample of executives and by 
Friedlander (1963a) in a sample of research sci- 
entists indicate that a salary-level criterion (con- 
trolled for tenure) correlates significantly with 
socioeconomic background (.23) and education (.51), 
but nonsignificantly with tenure (.00) and number 
of levels advanced (— .04). 

Each of the two samples (white collar and blue 
collar) was initially divided into (a) four age 
categories and separately into (b) five tenure cate- 
gories. Within each of these categories, those who 
had attained a level of GS-13 or above (for white 
collar) or a supervisor level (for blue collar) were 
classified as high on the performance criteria; those 
who had risen no higher than the GS-7 level or no 
higher than the apprentice level were classified as low 
performers. Among white-collar employees with the 
longest tenure (over 12 years with the organization), 
for example, high performers were those who had 
already reached or exceeded the GS-13 level, while 
low performers were those who had still not ex- 
ceeded the GS-7 level. A similar comparison would 
hold for any age category. 
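
As an illustration only, the classification rule just described might be sketched in Python as follows. The Employee record, its field names, and the category labels are hypothetical conveniences and not part of the original study; the tenure cut points are taken from the five column headings of Table 3, and age categories could be handled in the same way. Levels between the two extremes, which the text does not assign to either group, are returned as None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employee:
    """Hypothetical record; field names are assumptions for this sketch."""
    collar: str        # "white" or "blue"
    level: str         # e.g. "GS-11" (white collar) or "journeyman" (blue collar)
    tenure_years: int  # completed years with the organization


def tenure_category(years: int) -> str:
    """Assign one of the five tenure categories (labels follow Table 3)."""
    if years <= 3:
        return "up to 3"
    if years <= 6:
        return "4-6"
    if years <= 9:
        return "7-9"
    if years <= 12:
        return "10-12"
    return "13+"


def performance_group(emp: Employee) -> Optional[str]:
    """Return "high", "low", or None for levels between the two extremes."""
    if emp.collar == "white":
        grade = int(emp.level.split("-")[1])  # "GS-13" -> 13
        if grade >= 13:
            return "high"
        if grade <= 7:
            return "low"
        return None
    if emp.level == "supervisor":
        return "high"
    if emp.level == "apprentice":
        return "low"
    return None  # e.g. journeymen


# A white-collar employee still at GS-7 after 14 years with the organization.
veteran = Employee(collar="white", level="GS-7", tenure_years=14)
print(tenure_category(veteran.tenure_years), performance_group(veteran))
```

Because the comparison is made within each category, the same GS level can mark quite different degrees of success depending on the age or tenure group in which it is observed.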


Motivation Measures 


Motivation is here conceived as a specific process 
that energizes differentially certain responses to the 
work situation, thus making them dominant over 
other possible responses to the same situation 
(adapted from English & English, 1958). As such, 
the concept incorporates the importance that the 
employee attaches to various facets of his work. 
Accordingly a questionnaire was constructed in which 
the respondent was asked to indicate how important 
each of several work facets was to his feeling of 
satisfaction or dissatisfaction. Choices in the Likert- 
type scale ranged from “of extreme importance to 
me” to “of no importance to me.” It should be 
noted that the questionnaire was not directly con- 
cerned with satisfaction, for this would have re- 
sulted in responses which were conglomerates of the 
worker’s motivations and the unique stimuli of his 
current job. Since it was the more underlying moti- 
vational structure which had developed over the 
worker’s entire vocational history which was rele- 
vant to this study, the effect of the immediate job 
(whether it was positive, negative, or absent) was of 
little interest and was minimized. 

The motivation variables selected for this study 
were drawn from the findings of the study by 
Herzberg et al. (1959) who found that 16 core items 
evolved from a content analysis of interview data, 
and that these 16 were associated primarily with 
either “good feeling” incidents or “bad feeling” inci- 
dents. A subsequent factor analysis by Friedlander 
(1963b) indicated that the variance of these 16 items 
could be accounted for by three underlying relatively 
independent factors. Since this threefold factor struc- 
ture was utilized in the current study, a brief de- 
scription of the factor content follows: 

1. Social Environment—Encompasses the inter- 
personal, social, and technical aspects of super- 
vision, of the work group, and of the working con- 
ditions. The items include the importance of the 
working relationship with one’s supervisor, working 
with a supervisor who knows his job, working rela- 
tionship with one’s co-workers, a smooth efficient 
work group, a feeling of job security, and manage- 
ment policies which affect the feelings of the em- 
ployees. 

2. Intrinsic Self-Actualizing Work—Includes the 
development and full use of one’s capacities and 
talents, particularly as related to the intrinsic work 
process itself and to the relationship of this process 
to the development and growth of the individual. 
Items include performing challenging assignments, 
the use of one’s best abilities, a feeling of achieve- 
ment in the work, the opportunity for freedom, and 
training and experience that help one’s growth. 

3. Recognition Through Advancement—Concerns 
recognizable signs of advancing in the organization, 
and encompasses the challenging assignments and 
increased responsibility that generally accompany 
tangible evidence of recognition, such as increased 
salary and advancement. 

Three factor scores were calculated for each of 
the 1,468 employees in the current study. These, 
then, formed the three motivation measures to be 
related to the performance criteria. 


Experimental Design 


Essentially, the procedure was to test the signifi- 
cance of the difference in motivation for high per- 
formers versus low performers within each of the 
age and tenure groups. This procedure was repeated 
separately for each of the three motivation areas, 
for white-collar and blue-collar workers, and for 
each of the four age categories and five tenure cate- 
gories. Tests of the difference in motivation were 
performed concurrently across the four age categories 
and again across the five tenure categories to pro- 
vide six 1X4 and six 1X5 analyses of variance 
(ANOVAs). Each set of six ANOVAs involved three 
separate motivation-factor scores for the white- 
collar category and three motivation-factor scores 
for the blue-collar category. In addition to the 
TABLE 1 
ANALYSES OF VARIANCE OF SCORES IN THREE MOTIVATION AREAS 

                          1. Social               2. Intrinsic            3. Recognition-
                          environment             work                    advancement
Source of
variation          df     MS           F          MS           F          MS           F

Among white-collar employees using attained level per age group as the criterion
Performance (P)    1      10.92        33.09**    .06          —          2.26         6.36*
Age (A)            3      2.89         8.76**     [illegible]  1.00       [illegible]  [illegible]
P × A              3      .88          [illegible] .60         2.38       2.01         [illegible]
Within cell        398    [illegible]             .25                     .36

Among white-collar employees using attained level per tenure group as the criterion
Performance (P)    1      12.35        36.27**    .00          —          2.21         6.05*
Tenure (T)         4      .90          2.64*      [illegible]  —          .20          —
P × T              4      .18          —          .08          —          .14          —
Within cell        396    .34                     .26                     [illegible]

Among blue-collar employees using attained level per age group as the criterion
Performance (P)    1      .00          —          .02          —          .07          —
Age (A)            3      .14          —          .30          1.04       [illegible]  —
P × A              3      [illegible]  1.29       .24          —          [illegible]  —
Within cell        132    .29                     .30                     .44

Among blue-collar employees using attained level per tenure group as the criterion
Performance (P)    1      .25          —          .01          —          .01          —
Tenure (T)         1^a    .20          —          [illegible]  —          [illegible]  1.70
P × T              1      .05          —          .54          —          [illegible]  —
Within cell        116    [illegible]             [illegible]             [illegible]

* p < .05. 
** p < .01. 

Note.—F ratios have been computed before MSs were rounded to the nearest .01. 
a The frequency distribution of blue-collar workers among the 5 tenure groups was such as to necessitate collapsing the 5 
categories into 2: up to 12 years and over 12 years. 


comparison in motivation between high and low 
performers within each age and tenure category, the 
three motivation areas were compared for the 
“extreme high performers” (high level in youngest 
age group or shortest tenure group) versus “extreme low performers” 
(lowest level in oldest group or longest tenure group). Where ANOVAs 
resulted in significant F ratios, biserial correlations were computed 
between the performance criterion and each of the 
motivation scores. 
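
For readers unfamiliar with the statistic, a minimal Python sketch of a biserial correlation between a high/low performance split and a continuous motivation score is given here. It assumes the conventional textbook formula, r_b = ((M_high - M_low) / s) * (pq / y), where p and q are the proportions of high and low performers, s is the standard deviation of all scores, and y is the ordinate of the unit normal curve at the point dividing p from q. The source does not state which computing formula was used, and the function name and sample scores below are purely illustrative.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores_high, scores_low):
    """Biserial correlation between a high/low split and a continuous score.

    Assumes the textbook formula r_b = ((M_high - M_low) / s) * (p * q / y).
    """
    n_high, n_low = len(scores_high), len(scores_low)
    if n_high == 0 or n_low == 0:
        raise ValueError("both groups must contain at least one score")
    p = n_high / (n_high + n_low)   # proportion of high performers
    q = 1.0 - p                     # proportion of low performers
    s = pstdev(list(scores_high) + list(scores_low))  # SD of all scores (n in denominator)
    z = NormalDist().inv_cdf(q)     # cut point leaving proportion p above it
    y = NormalDist().pdf(z)         # ordinate of the unit normal curve at the cut
    return (mean(scores_high) - mean(scores_low)) / s * (p * q / y)


# Made-up importance ratings of the social environment (not data from the study).
high = [2.1, 2.8, 3.0, 3.4, 2.6]
low = [3.2, 3.9, 3.5, 4.1, 2.9, 3.7]
print(round(biserial_r(high, low), 2))
```

A negative value indicates that low performers attach the greater importance to the area in question. Unlike the point-biserial coefficient, the biserial coefficient assumes a normally distributed variable underlying the dichotomy and can exceed 1.00 in absolute value when that assumption is badly violated.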


TABLE 2 

BISERIAL CORRELATION COEFFICIENTS BETWEEN PERFORMANCE AND SCORES IN THREE 
MOTIVATIONAL AREAS FOR WHITE-COLLAR WORKERS IN FOUR AGE CATEGORIES 

                                        Age category
Motivational area          20-29        30-39     40-49     50+      All^a     Extreme^b

Social environment         -.38**       -.53**    -.47**    -.08     -.40**    -.66**
Intrinsic work             [illegible]  -.16      -.14      .27*     .03       [illegible]
Recognition-advancement    .07          -.38**    -.33**    -.05     -.18**    .09

* p < .05, based on tests of simple effects in ANOVA. 
** p < .01, based on tests of simple effects in ANOVA. 

Note.—Performance based on salary level attained within age groups. 
a Based on all employees regardless of age. 
b High level within the youngest age group versus low level within the oldest age group. 
TABLE 3 

BISERIAL CORRELATION COEFFICIENTS BETWEEN PERFORMANCE AND SCORES IN THREE MOTIVATIONAL 
AREAS FOR WHITE-COLLAR WORKERS IN FIVE TENURE CATEGORIES 

                                        Years of tenure
Motivational area          up to 3   4-6       7-9       10-12    13+       All^a    Extreme^b

Social environment         -.34**    -.61**    -.50**    -.30     -.37**    -.45     -.73**
Intrinsic work             -.09      .14       -.02      -.06     .02       .00      -.03
Recognition-advancement    -.30*     -.16      -.18      -.05     -.20      -.19*    -.38*

* p < .05, based on tests of simple effects in ANOVA. 
** p < .01, based on tests of simple effects in ANOVA. 

Note.—Performance based on salary level attained within tenure group. 
a Based on all employees regardless of age. 
b High level within the shortest tenure group versus low level within longest tenure group. 


RESULTS 


Results of the study will be discussed in 
terms of each of the three motivational areas. 
Within each area, separate breakdowns will 
be made for the level/age and level/tenure 
criteria for both white-collar and blue-collar 
employees. 

Social environment. Within the white-collar 
groups, low-performance employees place sig- 
nificantly greater importance than do high- 
performance employees upon the social and 
interpersonal characteristics of their work en- 
vironment, using either the level/age criterion 
or the level/tenure criterion (see Table 1). 
There is clearly a negative relationship be- 
tween performance and social-environmental 
motivation as represented by biserial correla- 
tions of — .40 for the level/age criterion (see 
Table 2) and — .45 for the level/tenure cri- 
terion (see Table 3). Tests of simple effects 
indicate that these relationships are significant 
and sizeable (r_b = -.38, -.53, -.47) in all 
but one of the age categories (50 years and 
over) and significant and sizeable (r_b = -.34, 
-.61, -.50, -.37) in all but one of the 
tenure categories (10 to 12 years). In addi- 
tion, a comparison of the extreme high per- 
formers versus extreme low performers indi- 
cates that those who have been unable to 
achieve promotion from the low levels even 
after 12 years with the organization or 50 
years of age place far greater emphasis upon 
the social environment than those who have 
achieved a high level within less than 3 years 
(r_b = -.73) and are less than 30 years of 
age (r_b = -.66). 
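
The F ratios cited from Table 1 are, as usual, the mean square for an effect divided by the within-cell mean square. The short Python check below is only a sketch using the rounded mean squares legible in Table 1 for the white-collar, level/tenure analysis of the social environment (12.35 for performance, .34 within cells); the function name is illustrative.

```python
def f_ratio(ms_effect: float, ms_within: float) -> float:
    """F ratio for a fixed effect: effect mean square over within-cell mean square."""
    return ms_effect / ms_within


# Rounded mean squares as read from Table 1 (white collar, level/tenure, social environment).
f_performance = f_ratio(12.35, 0.34)
print(round(f_performance, 2))
```

The result falls slightly above the tabled value of 36.27, which is consistent with the note to Table 1 that F ratios were computed before the mean squares were rounded.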


Although differences across age and tenure 
groups in motivation were not central to the 
concepts tested in this study, it is interesting 
to note the trend in the importance of the 
social environment for high performers only. 
Observation of these trends for high per- 
formers in both Figures 1 and 2 shows a 
definite and consistent trend: the social en- 
vironment becomes an increasingly greater 
motivator for high performers as they become 
older or as their tenure in the organization 
increases.¹ This trend is less consistent for the 
low performing group. 

Analyses of variance for the blue-collar 
sample in this study failed to show any sig- 
nificant main effects or interactions. Regard- 
less of the criteria utilized, there are no sig- 
nificant differences between high performing 
and low performing blue-collar workers in the 
strength of the social environment as a moti- 
vator. 

Intrinsic self-actualizing work. Few dif- 
ferences were found between high performing 
and low performing white-collar workers in 
the emphasis they placed upon the intrinsic 
characteristics of their work. Using level at- 
tained within tenure groups as a criterion, no 


1 Tables of means and differences from which these 
figures were drawn and upon which the ANOVAs 
were performed have been deposited with the Ameri- 
can Documentation Institute. Order Document No. 
8635 from ADI Auxiliary Publications Project, 
Photoduplication Service, Library of Congress, Wash- 
ington, D. C. 20540. Remit in advance $1.25 for 
microfilm or $1.25 for photocopies and make checks 
payable to: Chief, Photoduplication Service, Library 
of Congress. 
and low performing white-collar workers toward 
three areas of work. Performance is based on salary 
level per age group. 


main or interaction effects were significant, 
nor was the comparison between extreme high 
performers and extreme low performers. When 
the criterion of level attained within age 
groups was used, again no main effects were 
significant. However, since the Performance X 
Age-Level interaction reached significance (p 
< .05), simple effects of performance were 
tested. These indicated that the only age 
category in which the importance of intrinsic 
work differed was in the 50+ age group: 
high performers in this oldest category placed 
significantly greater emphasis upon the intrinsic 
characteristics of their work (r_b = +.27). 
In addition, extreme high performers 
did attach significantly greater importance to 
intrinsic work characteristics than did extreme 
low performers (r_b = +.31). 

No main effects or interactions were signifi- 
cant for the blue-collar workers, nor were 
the extreme performance comparisons, regard- 
less of the criterion used. However, an inter- 
esting trend appeared in the low performing 
group, using level/age as the criterion. Low 
performing blue-collar workers place decreas- 
ing importance upon the intrinsic work char- 
acteristics as they become older. No such 
trend was noticeable in the comparable high 
performing group. 
Fig. 3. Motivation toward work among high and 
low performing white-collar workers as a function 
of age. 


Recognition through advancement. In the 
white-collar group, low performers attach sig- 
nificantly greater importance to the oppor- 
tunities for receiving recognition, responsi- 
bility, and promotion than do high perform- 
ers. This difference holds using either the 
level/age criterion (F = 6.36, p< .02), or 
the level/tenure criterion (F = 6.05, p< 
.02). Tests of simple effects indicate that low 
performers emphasize the recognition area in 
the two middle (30 to 49 years) age brackets 
(r_b = -.38 and -.33), in the shortest (under 
3 years) tenure group (r_b = -.30), and 
to some extent in the longest (over 12 years) 
tenure group (r_b = -.20, p < .07). In a similar 
manner, obtaining recognition through advancement 
is of significantly greater import to 
extreme low performers than to extreme high 
performers, using the level/tenure criterion 
(r_b = -.38). 

In the blue-collar group, ANOVAs indicate 
no significant difference between low and high 
performers in the strength of the recognition 
area as a motivator. 

Additional contrasts among groups of white- 
collar workers in terms of strength of the 
three motivational areas are pictured by age 
categories in Figure 1 and by tenure cate- 
gories in Figure 2. It can be observed that in- 
trinsic work is of greater importance than 
either social environment or recognition to 
both high and low performers at the early 
age and tenure periods. In the 4- to 6-year 
tenure period and in the 30- to 39-year age 
group, a divergence appears: high performers 
maintain an importance ranking of intrinsic 
work > recognition > social environment 
which endures throughout their twenties and 
thirties in terms of age and throughout their 
first 12 years with the organization. Low per- 
formers, however, rank importance in terms of 
the social environment > intrinsic work > 
recognition throughout the remainder of their 
working lives, and through the first 12 years 
with the organization. 

For high performers, intrinsic work ranks 
first in importance across all age and tenure 
groups; of secondary importance are recogni- 
tion (at the younger ages, 20 to 39) and the 
social environment (at later ages, 40 and 
over). For low performers who have once 
passed the age of 30, the ranking is consist- 
ently social environment > intrinsic work > 
recognition. Thus, with increasing age and 
tenure, there is a tendency for the social en- 
vironment to become more important than 
recognition-through-advancement for high per- 
formers and more important than intrinsic 
work for the low performers. 

Further examination of Figure 1 indicates 
noticeable similarities in trend among the 
three motivational areas for low performing 
personnel and also shows that this trend is 
quite different from that of high performers. 
The motivation of low performers in all three 
areas begins at a moderate level, reaches a 
peak during the middle years, and lessens 
appreciably after age 50. All three low performer 
trends are inverted-U shaped. For high performers, 
motivation toward the intrinsic work 
and recognition areas shows a slight U-shaped 
trend with advancing age, reaching a low ebb 
during the middle two age periods, and in- 
creasing again in the 50-and-over group. The 
social environment, on the other hand, be- 
comes consistently more important with age 
for high performers. Thus, in the two more 
advanced age brackets (40 and over) a con- 
stant linear trend becomes noticeable (for all 
six plotted lines in Figure 1): for high per- 
formers all three motivational areas increase 
in strength, while for low performers all de- 
crease in strength. These divergent trends are 
apparent in Figure 3, which represents the 
total motivation (the sum of the three moti- 
vational areas) for high performers and low 
performers. Were it not for the extremely 
low importance attached to the social environ- 
ment by high performers in their early years, 
the high performer trend line would be more 
clearly U shaped. 


SUMMARY AND DISCUSSION 


Within the white-collar group, employees 
for whom the social environment of their work 
or the opportunity for gaining recognition 
through advancement are prime motivators 
were generally found to be poorer perform- 
ers than employees for whom these two areas 
are less important. On the other hand, there 
is some indication that those motivated by the 
intrinsic self-actualizing aspects of their work 
are superior in performance. It is apparent 
that the need for achievement through task- 
involvement is related to high performance, 
while the need for achievement through recog- 
nition and advancement is related to poor 
performance. When all of the above analyses 
were applied to blue-collar workers, no sig- 
nificant differences in the motivational pattern 
of high performers and low performers were 
found. 

Comparisons among the three potential 
motivators for high performers only indicate a 
clear hierarchy: intrinsic work is of greatest 
importance, recognition is second, and the 
social environment is valued least. This moti- 
vational hierarchy contrasts with that of low 
performers, for whom the social environment 
is most important, intrinsic work second, and 
recognition least important. 

Age and tenure seem to have significant 
effects upon these motivation-performance re- 
lationships. There is a marked tendency for 
high performers to stress the social environ- 
ment as age and, to some extent, tenure ad- 
vance. Both low performers and high per- 
formers place primary emphasis on intrinsic 
work in early years, but for low performers 
this emphasis soon shifts to the social environ- 
ment. For high performers, while the impor- 
tance of the social environment approaches 
that of intrinsic work as age and tenure in- 
crease, the intrinsic meaning of work does 
maintain its primacy. 

The importance of work as a whole (the 
three motivational areas combined) generally 
increased for low performers 
until age 30, leveled off through the thirties 
and forties, and then declined sharply from 
50 years on. This inverted-U-shaped trend of change 
in motivation with age was reversed for high 
performers, for whom there was a slight de- 
crease in total motivation until age 30; but 
from this point on, work assumed greater 
meaning and importance. It can be inferred 
from these divergent trends that despite the 
frustrations of job involvement encountered 
by the high performer during his middle 
working years, his motivation to work is even- 
tually heightened through both intrinsic satis- 
factions and rewards in the form of promo- 
tion. In contrast, the low performer encounters 
a work environment which gradually erodes 
the initial importance he had placed upon in- 
trinsic task satisfactions and promotional re- 
wards. Although the socioemotional rewards 
of his work context remain relatively impor- 
tant, eventually work becomes far less mean- 
ingful for him. 

Katz (1964) draws a sharp distinction be- 
tween individually administered and system 
rewards, and stresses that system rewards 
are more effective for holding members within 
the organization than for maximizing other 
organizational behavior. As the present study 
has shown, however, these system rewards do 
not lead to higher performance than the 
minimum required to stay in the organization. 
Since system rewards are given across-the- 
board to all members, or differentially in terms 
of seniority, they will probably not serve 
more than to maintain the employee in the 
organization (Katz, 1964). 

Within this particular organization, be- 
haviors which are motivated by needs for 
social interaction or needs for recognition- 
through-advancement are negatively rewarded 
by the organization. It would appear that the 
inducement to low performing employees for 
remaining with the organization is something 
other than promotion and the rewards that 
are administered in relation to individual 
effectiveness. Their incentives would then in- 
volve predominantly system rewards—those 
available simply from organizational mem- 
bership, which become increasingly attrac- 
tive on the basis of seniority in the system. 
These would include comradeship with one’s 
fellow workers, the socioemotional satisfac- 
tions of group membership, job security, fringe 
benefits, pleasant working conditions, etc. 

Paradoxically, those employees motivated 
primarily to obtain recognizable signs of ad- 
vancement in the organization are negatively 
rewarded in the very area in which they dis- 
play their strongest needs: they attain lower 
levels relative to their own age and tenure 
groups than their high-performing fellows who 
consider recognition-through-advancement less 
important but attain it despite a lower moti- 
vation toward it. 

On the other hand, one of the rewards for 
behavior motivated by primary concern with 
the intrinsic self-actualizing characteristics of 
work is, at least in this organization, promo- 
tion to increasingly higher levels. But the 
very nature of the motivations of this group 
would indicate that a promotional type of 
reward is less important to the high perform- 
er’s satisfaction than his sense of challenge, 
freedom, achievement, and growth. Thus, 
though the high performer is rewarded with 
the benefits of higher performance, such as 
greater salary and recognition, his primary 
reward arises from satisfactions gained 
through involvement in the intrinsic work 
process. 

Such a reward system may not hold or be 
appropriate for all organizations. Katz (1964) 
hints at the complexity of the intrinsic job 
satisfaction-job performance problem as follows: 


The motivational pathway to high productivity 
... can be reached through the development of 
intrinsic job satisfaction. The man who finds the 
type of work he delights in doing is the man who 
will not worry about the fact that the role requires 
a given amount of production of a certain quality. 
His gratifications accrue from accomplishments, 
from the expression of his own abilities, and from the 
exercise of his own decisions. ... He may, [how- 
ever], contribute little to organizational goals be- 
yond his specific role. 


It is only within an organization that implic- 
itly or explicitly encourages and rewards 
these behaviors, such as the one in the cur- 
rent study, that a positive relationship will 
be found between self-actualizing motivations 
and job performance. The relationship prob- 
ably varies not only from organization to 
organization, but also among subgroups within 
any one organization. 

The lack of any significant motivation- 
performance relationship among the blue- 
collar sample in this study raises serious ques- 
tions concerning the motivation to work 
among members of this cultural group. It 
should be noted that the jobs held by the 
blue-collar workers in this study were not 
paced, routinized assembly-line operations. 
The blue-collar workers in this study were 
carpenters, plumbers, electricians, warehouse- 
men, ordnancemen, (experimental) machin- 
ists, (electronics) mechanics, etc. Such posi- 
tions would allow for at least a moderate 
amount of initiative, responsibility, and con- 
trol over one’s productivity, thus permitting 
performance to be influenced by individual 
motivation. 

Among blue-collar workers, there is a tend- 
ency for work (all three motivations com- 
bined) to decline in importance as age and 
tenure advance. However, whereas this decline 
is relatively mild for high performers, it is 
quite dramatic among low performers, par- 
ticularly in their motivation for recognition 
and advancement and self-actualizing work. 
Perhaps even in trade jobs (as distinguished 
from assembly-line work), the reward of (sal- 
ary) level-attained for blue-collar workers is 
influenced sufficiently by seniority so that 
differences in motivation between high performers
and low performers are obscured. If
the blue-collar worker perceives the reward 
system in this way, as suggested by Davis 
(1946) and Centers (1948), his motivations 
toward intrinsic work and recognition as 
sources of satisfaction might readily decline 
with time. With one minor exception, the 
hierarchy of importance for blue-collar work- 
ers is (a) the social environment of work, 
(b) intrinsic work, (c) recognition through
advancement—regardless of high performance 
or low performance, regardless of age, and regardless
of tenure with the organization. For
the blue-collar worker, the rewards of task 
involvement and recognition never did hold 
primary importance as sources of satisfaction. 
For this group, the prework culture evidently 
forms a motivational system which remains 
intact throughout, and is probably reinforced 
by, the work environment. It is possible that 
the cultural norms of blue-collar workers are 
sufficiently different from those of white- 
collar workers that each group behaves in 
accordance with its individual principles, and 
generalizations concerning the motivation- 
performance relationship cannot be made 
from one cultural group to the other.


OPTIMUM CUTTING SCORES TO DISCRIMINATE
GROUPS OF UNEQUAL SIZE AND VARIANCE ¹


LEONARD G. RORER, PAUL J. HOFFMAN, GAIL E. LaFORGE,²
AND KUO-CHENG HSIEH


Oregon Research Institute 


The accuracy with which a test classifies people, objects, or events as belonging 
to 1 of 2 groups depends upon: the distance between the means, the relative 
variability, and the relative size of the 2 groups. An analytical method is 
presented for determining the optimal cutting score when estimates of these 
parameters are available and when it can be assumed that the test scores are 
normally distributed for each of the 2 groups. In order to assess a test’s in- 
cremental contribution to accuracy, the proportion of erroneous decisions to 
be expected on the basis of optimum cutting scores must be compared with the 
proportion of erroneous decisions to be expected on the basis of the base rates 
alone. It is shown that many situations exist in which “valid” tests cannot 
improve upon base-rate predictions. Tables are provided for a rapid determina- 
tion of the optimal cutting score for a given condition; these tables also indicate 
the conditions under which base-rate predictions should be made and the 
proportion of erroneous decisions to be expected when the optimum strategy
is used.


Decision making, either rational or irra- 
tional, is a necessary part of every psycholo- 
gist’s daily activity. Any rational decision- 
making process may conceptually be divided 
into two stages: (a) the accumulation and 
synthesis of as much relevant information as 
possible, and (b) the selection of the decision
strategy to be used once this synthesis is 
completed. This paper focuses on the second 
stage of the decision process. 

However many variables may be relevant 
to a decision, there exists at any given time 
some optimal way of combining them so as 
to produce a composite score on the basis 
of which the decision may be made. In many 
cases this composite may be represented by a 
continuous variable. If the decision to be 
made is dichotomous, the underlying con- 
tinuous variable may be related to the 
decision in a fairly simple way. A typical 
example is the use of a cutting score on a 
test: all cases above the cutting score are 
assigned to one category, and all cases below 
the cutting score to the other. 

Published tests are frequently accompanied 
by a manual which contains suggested cutting
scores for various decisions. In almost all
cases the suggested scores are those which
optimally separated two equal-sized groups
in a validation study. But, while validation 
studies generally employ equal-sized groups, 
such groups are rarely encountered in prac- 
tice, and the best score to differentiate two 
groups of equal size usually will not be the 
best one to differentiate two groups of un- 
equal size (e.g., Meehl & Rosen, 1955). As 
the relative size of the two groups changes, 
a change also takes place in the probability 
that an individual receiving a particular score 
belongs to one group rather than the other.
In general when a fallible test of any kind 
is used to assign a probability value to a 
statement concerning a person, object, or 
event, the probability value to be associated 
with any given test outcome will vary with 
the antecedent probability that the statement 
is true of the person, object, or event in 
question for the population under considera- 
tion. Therefore, if the maximum proportion 
of correct classifications is to be made, the 
cutting score that is used to assign individuals 
to one group or the other must change as the 
relative size of the groups changes. That such 
adjustments are not routinely made is hard 
to understand. A partial explanation may lie 
in the fact that previous writers (e.g., Blumberg,
1957; Cureton, 1957; Dawes, 1962;
Meehl & Rosen, 1955; Rimm, 1963) have
provided only an empirical solution to the
problem for each situation; that is, they have
suggested that each investigator must collect
samples of cases from his files, plot the score
distributions for each of the groups, and
thereby determine the appropriate cutting
score for his situation, a tedious and
time-consuming undertaking. It is the purpose of
this paper to provide an analytical method
and the necessary tables so that published
cutting scores may be appropriately adjusted
for use in any situation.

Fig. 1. Situation in which size and variance are both
greater in one group than in another. (Horizontal axis: score.)


THE MODEL


In the interest of ease of exposition, the 
model will be presented in terms of a specific 
illustrative situation involving a choice be- 
tween two alternative treatments: the decision 
to hospitalize or not to hospitalize a psychi- 
atric patient. For convenience, patients who 
would profit from hospitalization will be 
called sick; those who would not so profit 
will be called well. The parameters of the 
distributions of test scores for the two groups 
will be designated by the subscripts s and w, 
respectively. It is assumed that the following 
are known: the mean and standard deviation 
of the test scores for each group, and the 
relative size of the two groups. It is also 
assumed that the test scores are normally
distributed for each group, and that both
Type I and Type II errors are equally
costly, that is, that it is as bad to hospitalize
a well person as to fail to hospitalize a sick
person.

Fig. 2. Situation in which valid discriminations
are possible even though the groups differ only in
variability.

If the distributions specified by the indi- 
cated parameters are drawn on the same co- 
ordinate system so that their areas are pro- 
portional to the sizes of the groups, then the 
points of intersection yield optimal cutting 
scores (Cureton, 1957). An example is shown 
in Figure 1, which illustrates a situation in 
which both the frequency and the variability 
of the W group are greater than the corresponding
parameters of the S group.
Clearly, for any given score, the optimum
decision is that which assigns the individual
to the group whose curve is highest at that
point; thus, for the situation depicted in
Figure 1, all individuals whose scores were
less than C_1 or greater than C_2 would be
classified as W, and all individuals whose
scores fell between C_1 and C_2 would be
classified as S.

Fig. 3. Situation in which differences in relative
group sizes render a “valid” test unusable.

So little attention has been given to the 
possibility of unequal variances or unequal- 
sized groups that the occurrence of two cut- 
ting scores may seem strange. However, the 
possibilities are of more than academic inter- 
est and can be shown to have applicability to 
situations previously considered “paradoxi- 
cal.” For example, clinical lore has it that 
on the MMPI Schizophrenia Scale the poorest 
prognosis is associated with T scores in the
range from 70 to 90. If the standard deviation 
of the Sc scores of schizophrenic patients 
with poor prognosis is less than the standard 
deviation of Sc scores for the general popula- 
tion, and if there are fewer schizophrenics 
than normals, then the distributions of the 
two groups might be related as in Figure 1,
with C_1 = 70 and C_2 = 90. The S group in
this case would be “schizophrenics with poor 
prognosis,” and the W group would be “nor- 
mals.” If such is the case, then the practicing 
clinician is correct in assigning a different di- 
agnosis to respondents with scores either less 
than 70 or greater than 90. It may be noted 
that in the case of a continuous prediction 
function this situation results in a curvilinear 
relationship between the variables for the 
two groups. 

Traditionally, when a test has been used 
to predict a dichotomous criterion, it has 
been considered valid if the mean scores of 
the groups to be discriminated were signifi- 
cantly different from each other. However, 
if the groups differ in either size or variabil- 
ity, such a definition of validity no longer 
suffices. When considered from the traditional 
viewpoint, apparent contradictions occur (a)
when different variances make possible suc- 
cessful decisions on the basis of an “invalid” 
test (see Figure 2), and (b) when extreme 
differences in the sizes of the two groups make 
discrimination impossible even with the use 
of a “valid” test (see Figure 3). 

The fact that differences in both means 
and variances should be considered when dis- 
criminating two groups of equal size has pre- 
viously been pointed out by Penrose (1947). 
Figure 2 shows that Penrose’s formulation 
may be extended, and that discrimination is 
possible even when the means of the two 
groups are identical. As an illustration of 
Figure 2, consider a common example from 
the industrial area. If it were felt that a job 
would be too hard for people low on an 
intelligence scale and too boring for those 
high on it, applicants would be accepted only 
if their scores fell in the middle ranges of 
the scale. In this illustration the S group is 
composed of “successful workers” and the W 
group is composed of “unsuccessful workers.” 
Though the mean intelligence score is the 
same for both groups, the standard deviation 
of the successful group is smaller than that 
of the unsuccessful group, and valid discrimination
is therefore possible.

At the other extreme, it is quite possible
that clinicians attempting to identify poten- 
tial suicides from among members of the 
general population face a situation such as 
that depicted in Figure 3 (see Rosen, 1954). 
In this case the S group is composed of indi- 
viduals who would attempt to commit suicide, 
and the W group is composed of those who 
would not. These illustrations make it clear 
that a general solution for the problem of 
determining cutting scores must allow for 
cases in which zero, one, or two valid cutting 
scores exist, and must simultaneously take 
into account differences in the means, vari- 
ances, and sizes of the two groups. No pre- 
vious publication has approached the problem 
from this comprehensive perspective. 


A Formal Solution 


Since the optimal cutting score lies at the 
point of intersection of the two distributions, 
it may be obtained by setting the two func- 
tions equal to each other and solving for the 
desired values. The Gaussian distribution 
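functions may be written in the form

$$Y_w = \frac{N_w}{\sigma_w \sqrt{2\pi}} \exp\left[-\frac{(X_w - \mu_w)^2}{2\sigma_w^2}\right], \qquad Y_s = \frac{N_s}{\sigma_s \sqrt{2\pi}} \exp\left[-\frac{(X_s - \mu_s)^2}{2\sigma_s^2}\right],$$

where w and s designate the well and the sick
groups, respectively, and the other symbols
have their usual definitions. At the point of
intersection the ordinates are equal, that is,
Y_w = Y_s, and X_w = X_s = C, the cutting score.
In that event

$$\frac{N_w}{\sigma_w} \exp\left[-\frac{(C - \mu_w)^2}{2\sigma_w^2}\right] = \frac{N_s}{\sigma_s} \exp\left[-\frac{(C - \mu_s)^2}{2\sigma_s^2}\right]. \qquad [1]$$

If the predictor is scaled so as to yield a
standard score distribution of the form μ_w
= 0, σ_w = 1 for the well group, then [1] can be
simplified:

$$(1 - \sigma_s^2)\,C^2 - 2\mu_s C + \mu_s^2 + 2\sigma_s^2 \ln\frac{\sigma_s N_w}{N_s} = 0. \qquad [2]$$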
Equation [2] is a quadratic expression, the 
roots of which establish the optimal cutting 
scores illustrated by Figures 1, 2, and 3. 
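
The roots of this quadratic may be obtained numerically. The following Python function is offered only as a minimal sketch: it assumes that, with μ_w = 0 and σ_w = 1, the quadratic takes the form (1 − σ_s²)C² − 2μ_sC + μ_s² + 2σ_s² ln(σ_s P_w/P_s) = 0, which follows from equating two normal curves whose areas are proportional to the group sizes. The function and argument names are illustrative and do not come from the original paper.

```python
import math

def cutting_scores(mu_s, sd_ratio, p_s):
    """Return the optimal cutting scores in well-group standard-score units.

    mu_s     : mean of the S group (W group scaled to mean 0, SD 1)
    sd_ratio : sigma_s / sigma_w
    p_s      : proportion of the population belonging to the S group
    Returns a tuple holding zero, one, or two cutting scores.
    """
    p_w = 1.0 - p_s
    log_term = math.log(sd_ratio * p_w / p_s)
    if math.isclose(sd_ratio, 1.0):
        if mu_s == 0:
            return ()  # identical curve shapes: no score separates the groups
        # Equal variances: the squared term vanishes, leaving one root.
        return (mu_s / 2.0 + log_term / mu_s,)
    a = 1.0 - sd_ratio ** 2
    b = -2.0 * mu_s
    c = mu_s ** 2 + 2.0 * sd_ratio ** 2 * log_term
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return ()  # the curves never cross: classify by base rate alone
    root = math.sqrt(disc)
    return tuple(sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))))
```

An empty result corresponds to the situation of Figure 3, a single value to the equal-variance case, and a pair of values to the situations of Figures 1 and 2.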

In order to illustrate the effects of changes 
in the values of the various parameters, nomo- 
graphs of equation [2] have been prepared 
and are included here as Figures 4 and 5. 
Figure 4 illustrates, for a fixed standard deviation
ratio (σ_s/σ_w = .6), the effect of simultaneous
changes in the separation of the means
and the relative sizes of the groups. Figure 5, 
on the other hand, illustrates, for a fixed dis- 
tance between the means, the effect of simul- 
taneous changes in the ratio of the standard 
deviations and the relative size of the groups. 

Some examples may help to clarify these
nomographs. Consider a test on which the
mean score for the sick group is one standard
score greater than that of the well group.
Further, assume that the standard deviation
of the scores of the sick group is only .6 as
large as the standard deviation of the well
group. From Figure 4 it can be seen that the
“usual” validation study, that is, one carried
out with equal-sized groups, would result in
cutting scores at +.36 and +2.77 (standard
score units). Individuals with scores in this
range would be called sick; all others would
be called well. If 80% of the population were
assumed sick, then the cutting scores would
be approximately −.17 and +3.30.

Fig. 4. Nomograph showing appropriate cutting
scores for all possible ratios of group size for selected
mean differences when the ratio of the group standard
deviations is 0.6 (i.e., σ_s/σ_w = .6). (Axis label: cutting score.)

Fig. 5. Nomograph showing appropriate cutting
score for all possible ratios of group size for selected
σ_s/σ_w ratios when the mean difference between
groups is one standard deviation (μ_s = μ_w + σ_w). (Axis label: cutting score.)

As the separation of the means for the two 
groups increases, the magnitude of the change 
in the optimum cutting score for groups of a 
given relative size decreases; but for extreme 
ratios of group size, score changes are sub- 
stantial. Note that if no more than 20% of 
the cases tested were expected to be sick, it 
would not be advisable to use this hypotheti- 
cal test at all. To do so would result in more 
errors than if all individuals were called well. 
This latter situation was illustrated in Figure 
3. Of further interest, note the function 
μ_s = 0 in Figure 4. With the given inequality
of variance, and when 38% or more of the
cases are expected to be sick, it is possible to
make valid use of this test even if the means 
of the sick and well groups are identical, that 
is, even though the biserial validity coefficient 
is zero. This is the situation previously illus- 
trated in Figure 2. 
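
The limiting base rates mentioned above can be checked with a short Python sketch. It rests on an assumption not stated in this form in the text: that the test is usable exactly when the two size-weighted normal curves intersect, that is, when the quadratic in C has real roots. Setting the discriminant to zero and solving for P_s gives a closed form. The function name is hypothetical.

```python
import math

def base_rate_limit(mu_s, sd_ratio):
    """Base rate P_s at which the two weighted normal curves just touch.

    Assumes mu_w = 0 and sigma_w = 1. For sd_ratio < 1 the result is the
    minimum P_s for which cutting scores exist; for sd_ratio > 1 it is the
    maximum. With equal variances (and mu_s != 0) there is no such limit,
    so None is returned.
    """
    if math.isclose(sd_ratio, 1.0):
        return None
    # Odds P_w / P_s at tangency, obtained from a zero discriminant.
    odds_w = math.exp(mu_s ** 2 / (2.0 * (1.0 - sd_ratio ** 2))) / sd_ratio
    return 1.0 / (1.0 + odds_w)
```

For a mean difference of one standard score and a standard deviation ratio of .6, the function gives a limit of about .22; for identical means it gives .375. Both values are in line with the 20% and 38% figures read from Figure 4.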

The illustrative cutting scores given for 
the hypothetical test in Figure 4 may also be 
located in Figure 5. Here it can be seen that 
unequal variances may have a profound ef- 
fect on the use which should be made of a 
test. If the standard deviation of the scores 
of the sick group is equal to the standard 
deviation of the scores of the well group (i.e., 
the “traditional” case), then there is but one 
cutting score. Furthermore, it can be seen 
that only in the special case where both 
groups have the same variance is a test useful 
no matter how discrepant the size of the 
groups. If the variances are equal, and pro- 
viding the range of reliable scores can be 
indefinitely large, then valid use could be 
made of the test even though the sick group 
might be indefinitely small as compared to 
the well group. This situation may exist only 
in theory, however. 


BASE RATES AND ERROR REDUCTION


The “base rate” or “incidence” of an 
attribute in a population is defined as the 
percentage of cases in that population which 
have the attribute in question. For example, 
the base rate of “sickness” is given by
P_s = N_s/(N_s + N_w). In Bayesian terms, the base rate
of an attribute is the a priori probability that
a randomly selected member of the popula- 
tion will have that attribute. When one group 
is very large or very small relative to the 
other, the base rate is said to be “extreme.” 
The literature has provided numerous ex- 
amples showing that failure to consider base 
rates can result in decisions less accurate 
than those which would have been made with- 
out the use of a test (Dawes, 1962; Meehl 
& Rosen, 1955). These cases usually involve 
base rates differing greatly from a 50-50 
split. For these extreme base-rate populations,
large changes in cutting scores are required
for optimal discrimination, if, indeed,
discrimination is possible at all. In any given
situation, the importance of the base rate as
a factor in the decision function may be
evaluated in two ways: (a) the extent to
which the cutting score would shift if the
base rate were taken into account, and (b)
the change in the proportion of erroneous
decisions which would result from the prospective
change in cutting score. Figures 4 and
5 have shown how cutting scores change as
a function of base rates. Figure 6 illustrates
the effect of a change in cutting score on
the number of errors. In Figure 6 the percentage
of decisions which will, in the long
run, be in error is related to the cutting
score for an illustrative case in which the
standard deviations are equal and μ_s = μ_w
+ σ_w = 1. For a given base rate, the optimum
cutting score is that at the lowest point
of the curve. For a population in which 30%
are sick, this would be about 1.35σ. Roughly
25% of the decisions based on this procedure
would be in error. If, however, 90% of the
population were sick, then the use of this
same score would result in 58% of the decisions
being in error. In this latter case, use
of a cutting score of −2σ_w would result in
less than 10% of the decisions being in error.
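
The error percentages quoted for Figure 6 can be reproduced approximately with the Python sketch below. It assumes the equal-variance illustrative case (well group normal with mean 0, sick group normal with mean 1, common standard deviation 1) and a single cutting score above which cases are called sick. The function name and defaults are illustrative only.

```python
from statistics import NormalDist

def error_rate(cut, p_s, mu_s=1.0, sigma=1.0):
    """Expected proportion of wrong decisions for a single cutting score.

    Cases scoring above `cut` are called sick. The well group is assumed
    to be N(0, sigma) and the sick group N(mu_s, sigma).
    """
    well = NormalDist(0.0, sigma)
    sick = NormalDist(mu_s, sigma)
    false_alarms = (1.0 - p_s) * (1.0 - well.cdf(cut))  # well called sick
    misses = p_s * sick.cdf(cut)                        # sick called well
    return false_alarms + misses
```

Evaluating `error_rate(1.35, 0.30)` and `error_rate(1.35, 0.90)` gives roughly .25 and .58, respectively, and `error_rate(-2.0, 0.90)` falls just under .10.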

Small departures from the optimum cutting 
scores result in negligible increases in the 
proportion of erroneous decisions, but in some 
regions the proportion of errors increases rap- 
idly in relation to the magnitude of the score 
change. Note that, for populations in which
30% and 90% of all individuals are sick, the
diagnostician could be right 70% and 90%
of the time, respectively, without the use of
any test at all. The improvement on the
accuracy that could be achieved through
optimum use of this “valid” test is slight
for the 30% sick population (4.5%) and
negligible for the 90% sick population
(0.13%).

Fig. 6. Illustrative graph of error as a function of
cutting score when μ_s = μ_w + σ_w = 1 and σ_s = σ_w.
Lines are for populations in which 30% and 90%
of the population are presumed to be sick. (Legend: cutting score for equal-sized groups.)

Fig. 7. Relationships that must hold among basic
test parameters if use of the test is to result in greater
accuracy than could be achieved on the basis of
base-rate information alone. (Axis labels: maximum and minimum per cent of cases sick.)

As has been pointed out earlier, if the 
variances of the two groups are unequal, then 
even negligible improvement over base-rate 
predictions may be impossible. Figure 7 shows 
the relationships that must exist among the 
basic parameters before improvement on base- 
rate predictions can be achieved with opti- 
mum test usage. The curves in the lower part 
of the figure indicate, for tests in which 
the sick group is less variable than the well 
group, the minimum percentage of cases that 
must be sick in order for the test to con- 
tribute to accurate prediction. For example, 
the hypothetical test with a mean difference
between groups of one standard score and
a σ_s/σ_w ratio of .6 would be worthless when
fewer than 21% of the cases in the population
were sick. This value may be confirmed by
consulting either Figure 4 or Figure 5. The
curves in the upper part of the figure indicate,
for tests in which the sick cases are
more variable than the well cases, the maximum
percentage of sick cases for which the
test could contribute to accurate prediction.
Thus, a test with a mean difference between
groups of one standard score and a σ_s/σ_w
ratio of 1.4 could be used only in populations
in which fewer than 68% were sick. A statement
about this kind of limitation on the usefulness
of a test would appear to be a valuable
part of any test manual.


TABLES 


While the exposition so far has been in 
terms of a sick group and a well group, it 
should be clear that the results hold for the 
differentiation of any two groups, which may, 
for convenience, be labeled simply S and W. 

Solutions to Equation 2 have been calculated
for all parameter values likely to arise in practice
(Rorer, Hoffman, & Hsieh, 1964),³ and
examples from these data are reported here in
Tables 1, 2, and 3. Each of these tables relates
to a single difference (μ_s − μ_w) between
the means of the two groups. All entries assume
a transformation of the scores such that
μ_w = 0 and σ_w = 1, and all values of μ_s − μ_w
are positive. However, the distributions are
symmetric about μ_s = μ_w, so that the tables
are equally useful whether the mean difference
is positive or negative. When the difference,
μ_s − μ_w, is negative, the sign of the value in
the table must be reversed. If μ_s − μ_w is negative
and σ_s = σ_w, then the decision rule must
also be reversed.

In each table, the horizontal axis contains 
a distribution of ratios of the standard devia- 
tion of the S group to the standard deviation 
of the W group, and the vertical axis represents
various proportions of the population
(N_s + N_w) which are thought to belong to
the S group. Each cell in a table contains five
values: C_1 and C_2, the cutting scores; α,
the proportion of Ws that will erroneously be
called Ss; β, the proportion of Ss that will
erroneously be called Ws; and γ, the proportion
of the total population, both Ws and Ss,
that will be misclassified.

TABLE 1
μ_s − μ_w = 0.50
[Table body not recoverable from the source.]

Note. Decision rules: if σ_s/σ_w < 1, classify all cases between C_1 and C_2 as S; if σ_s/σ_w > 1, classify all cases between C_1 and C_2 as W; if σ_s/σ_w = 1, classify all cases greater than C as S. If μ_s − μ_w < 0, change the signs of the cutting scores. If μ_s − μ_w < 0 and σ_s = σ_w, change the sign of the cutting score and reverse the decision rule.
Definition of symbols (several entries are only partly legible in the source): (b) No cutting score exists because P_s is too small. Classify all cases as W. (c) No cutting score exists because P_w is too small. Classify all cases as S. (d) Cutting score less than −9.995. (e) Cutting score greater than … When σ_s/σ_w = 1 there is only one cutting score. Remaining legible fragments: “.995 < FE < 1.00”; “< .005”; “be disregarded for all practical purposes.”

Note (Table 2). See the note under Table 1.
Note (Table 3). See the note under Table 2.

³ The complete set of tables, which are accurate
to four decimal places, may be obtained from Oregon
Research Institute, P. O. Box 5173, Eugene, Oregon,
97403 for $1.00.

From these values, the proportion of false
positives (FP), cases which are called S but
are really W, and the proportion of false
negatives (FN), cases which are called W but
are really S, may be found by means of the
following formulae:

$$FP = \frac{\alpha P_w}{\alpha P_w + P_s - \beta P_s} \qquad [3]$$

and

$$FN = \frac{\beta P_s}{\beta P_s + P_w - \alpha P_w}, \qquad [4]$$

where

$$P_w = \frac{N_w}{N_w + N_s}, \qquad P_s = \frac{N_s}{N_w + N_s}.$$
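
As a hedged illustration of how the five error quantities fit together, the Python sketch below computes α, β, γ, FP, and FN for a pair of cutting scores. It assumes the case σ_s/σ_w < 1 (scores between the two cuts are called S), with μ_w = 0 and σ_w = 1, and takes FP = αP_w/(αP_w + P_s − βP_s) and FN = βP_s/(βP_s + P_w − αP_w). The function name and the dictionary keys are illustrative.

```python
from statistics import NormalDist

def error_summary(c1, c2, mu_s, sd_ratio, p_s):
    """Error proportions when scores between c1 and c2 are called S.

    Assumes sd_ratio = sigma_s / sigma_w < 1, mu_w = 0, and sigma_w = 1.
    """
    p_w = 1.0 - p_s
    well = NormalDist(0.0, 1.0)
    sick = NormalDist(mu_s, sd_ratio)
    alpha = well.cdf(c2) - well.cdf(c1)          # Ws erroneously called S
    beta = 1.0 - (sick.cdf(c2) - sick.cdf(c1))   # Ss erroneously called W
    gamma = alpha * p_w + beta * p_s             # all misclassified cases
    fp = alpha * p_w / (alpha * p_w + p_s - beta * p_s)
    fn = beta * p_s / (beta * p_s + p_w - alpha * p_w)
    return {"alpha": alpha, "beta": beta, "gamma": gamma, "FP": fp, "FN": fn}
```

Because FN is a ratio of two small quantities, it is sensitive to rounding of α and β; values computed from unrounded inputs may therefore differ somewhat from a hand calculation based on two-decimal table entries.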


An investigator wishing to adopt the “best”
rule for assigning individuals to one of two
groups on the basis of scores of a predictor
would carry out the following steps: First,
arbitrarily designate one of the groups as the
S group, and the other as the W group. Second,
conduct a study or use some other basis
to estimate μ_s, μ_w, σ_s, and σ_w. Third, compute
R = (μ_s − μ_w)/σ_w and find the table (page)
appropriate to this value. Fourth, estimate
the ratio, σ_s/σ_w, and the base rate,
P_s = N_s/(N_s + N_w),
and find the column and row, respectively,
that correspond to the obtained values. Fifth,
from the cell common to the column and row
selected, obtain the cutting score values. If
σ_s < σ_w, individuals between these scores are
called S; if σ_s > σ_w, individuals between these
scores are called W. If σ_s = σ_w, there is but
one cutting score, and individuals with scores
greater than C are called S. Sixth, decide
which type of error, α, β, or γ, is most germane
to the situation, and note this value in
the cell. Appropriate probability values can
then be associated with the classificatory
statements concerning individuals in the
population. Seventh, compare the errors to
be expected when the test is used with those
that would result on the basis of actuarial
prediction alone. A value judgment must then
be made concerning the use of the test: Is the
reduction in the percentage of expected errors
great enough to justify the use of the instrument
(procedure)?

The following example, which may also be
located on the nomographs, illustrates the
procedure. First, the group with the larger
mean is arbitrarily designated the S group.
Second, μ_s − μ_w is found to equal 1.00, indicating
that Table 2 should be consulted;
σ_s/σ_w is found to be .6, which indicates that
the third column from the left is appropriate.
The population is thought to be 95% W and
5% S, which indicates that the fourth block
down from the top is appropriate. C_1 and C_2
refer to tabular footnote b, which indicates
that the test is of no use in this situation. All
cases should be called W. This means that all
of the Ws will be correctly classified (α =
0.00), while all of the Ss will be misclassified
(β = 1.00). Overall, 5% of the population
will be misclassified (γ = .05). Five percent
of those called W are really S (FN = .05),
and since no cases are called S, there are no
false positives.

Now consider the same test with the assumption
that P_w = .10 and P_s = .90. The
ninth block of numbers from the top in the
third column from the left indicates that C_1 =
−.42 and C_2 = 3.54. Sixty-six percent of the
Ws and 1% of the Ss will be misclassified.
Overall, 7% of the classifications will be
erroneous. By formulae [3] and [4],

$$FP = \frac{.66 \times .10}{(.66 \times .10) + .90 - (.01 \times .90)} = .07,$$

$$FN = \frac{.01 \times .90}{(.01 \times .90) + .10 - (.66 \times .10)} = .21.$$

That is, of those cases called S, 7% will really
be W, and of those cases called W, 21% will
really be S. (The apparent contradiction arising
from the fact that γ and FP are both .07,
while FN > .07, is due to rounding. The γ is
actually slightly greater than .07, while FP is
slightly less.) By contrast, if the test were not
used at all, all cases would be classified S,
with the result that all of the Ws would be
misclassified (α = 1.00), none of the Ss would
be misclassified (β = 0.00), FP would equal
.10, there would be no false negatives, and
90% of all decisions would be correct (γ =
.10). The test increases β and FN while decreasing
α, γ, and FP.


DISCUSSION


Of the many issues that might be discussed 
in relation to the proposed procedures, five 
have been selected for brief mention. 




Unknown parameters. It has been shown 
that optimum cutting scores are a function of 
the means and variances of the groups to be 
discriminated. Yet, for many of the tests in 
general use today, estimates of these pa- 
rameters are reported in neither the test 
manual nor the literature. Little can be done 
about this situation other than to deplore it, 
and to point out that it is hard to provide a 
rationale for the use of tests for which this 
information is lacking. 

It has also been shown that cutting scores 
are a function of the relative size of the 
groups to be discriminated. Yet, even though 
many situations are such that information 
about the base rates could be the most accu- 
rate information going into the decision func- 
tion, base-rate information is often unavail- 
able. In most clinical installations it would be 
a relatively simple matter to tabulate case 
records. Even in cases where base rates can- 
not be known exactly, some boundaries can 
be specified. In many cases, the specification 
of such limits would be sufficient to set a cut- 
ting score that would be much more accurate 
than the one derived for optimum discrimina- 
tion of equal-sized groups. The appropriate 
procedure would be to use the extreme pa- 
rameter estimates to find a range within 
which the appropriate cutting score must lie. 
This range may be small, in which case a com- 
promise score could be used. If the range in 
which one of the cutting scores might lie is 
large, a more appropriate procedure would be 
to use a three-category decision procedure, 
calling those cases which fall within the cut- 
ting-score range “indeterminate.” Such a 
situation is illustrated in Figure 8. In this
example, it has been estimated that $.30 < P_1 < .50$,
and the cutting scores for these values
are indicated in the figure. Scores between .36
and .86 and between 2.27 and 2.77 are classified
indeterminate. As an alternative,

$$P_1 = \frac{.30 + .50}{2} = .40$$

could be adopted as a “best” estimate, and
the cutting scores appropriate to this value
($C_1$ = [illegible], $C_2 = 2.56$) used.
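
The bracketing procedure just described can be sketched numerically. The Python sketch below is illustrative only. It assumes normal score distributions and equal error costs, and it takes the cutting scores to be the points at which the base-rate-weighted densities of the two groups are equal. The means and standard deviations are hypothetical, since the parameters underlying Figure 8 are not given in the text, and the function names are assumptions introduced for the example.

```python
import math


def cutting_scores(p1, mu1, sd1, mu2, sd2):
    """Return the scores x at which p1*f1(x) == (1 - p1)*f2(x).

    f1 and f2 are normal densities; p1 is the base rate of group 1.
    Equal error costs are assumed.
    """
    p2 = 1.0 - p1
    # log(p1*f1(x)) - log(p2*f2(x)) is the quadratic a*x**2 + b*x + c.
    a = 1.0 / (2 * sd2 ** 2) - 1.0 / (2 * sd1 ** 2)
    b = mu1 / sd1 ** 2 - mu2 / sd2 ** 2
    c = (mu2 ** 2 / (2 * sd2 ** 2) - mu1 ** 2 / (2 * sd1 ** 2)
         + math.log(p1 / p2) + math.log(sd2 / sd1))
    if abs(a) < 1e-12:            # equal variances: at most one cut
        return [] if abs(b) < 1e-12 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:                  # weighted densities never cross
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def indeterminate_zones(p_low, p_high, mu1, sd1, mu2, sd2):
    """Pair the cuts found at the two extreme base rates into score ranges."""
    low = cutting_scores(p_low, mu1, sd1, mu2, sd2)
    high = cutting_scores(p_high, mu1, sd1, mu2, sd2)
    return [tuple(sorted(pair)) for pair in zip(low, high)]


def classify(score, zones, p_mid, mu1, sd1, mu2, sd2):
    """Three-category decision: 'indeterminate' inside any zone."""
    if any(lo <= score <= hi for lo, hi in zones):
        return "indeterminate"
    f1 = math.exp(-((score - mu1) / sd1) ** 2 / 2) / sd1
    f2 = math.exp(-((score - mu2) / sd2) ** 2 / 2) / sd2
    return "group 1" if p_mid * f1 >= (1 - p_mid) * f2 else "group 2"


# Hypothetical parameters (not those of Figure 8).
MU1, SD1, MU2, SD2 = 0.0, 1.0, 2.0, 2.0
zones = indeterminate_zones(0.30, 0.50, MU1, SD1, MU2, SD2)
```

Scores falling inside either range in `zones` would be called indeterminate; scores outside both ranges receive the same classification at every base rate within the estimated limits.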

Fig. 8. Cutting score procedure when unknown
parameters prevent the specification of exact cuts.

Losses. There are admittedly many cases in
which the assumption that α and β errors are
equally costly is untenable. Lest the reader
feel that practical application of the optimal 
cutting score procedures developed here is 
invalidated by this fact, it should be pointed 
out that such is not the case. For it should be 
noted that the clinician or investigator who 
uses the usual test manual cutting score with- 
out regard for base rates or variances is also 
tacitly accepting the assumption that the two 
kinds of error are equally costly. 

Furthermore, the point is academic. Indi- 
viduals who feel that different outcomes have 
different utilities ought to be able to approxi- 
mate such subjective utilities by assigning 
numerical values to them. In a subsequent 
paper (Rorer, Hoffman, & Hsieh, 1966) it is 
shown that the present cutting score tables 
may be applied, once these values are ar- 
ranged into the appropriate four-fold utility 
table. 
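
One way to see how a four-fold utility table can be reduced to a base-rate problem is sketched below. This is a standard decision-theoretic identity offered as an illustration under stated assumptions; it is not a reproduction of the procedure in the cited paper. The layout of the utility table and the function name are hypothetical.

```python
def effective_base_rate(p1, u11, u12, u21, u22):
    """Fold a four-fold utility table into an adjusted base rate.

    u_ij is the utility of deciding "group i" when the case truly belongs
    to group j (hypothetical layout). The gain of a correct over an
    incorrect decision acts as a multiplier on each group's base rate.
    """
    gain1 = u11 - u21   # stake when the case is truly in group 1
    gain2 = u22 - u12   # stake when the case is truly in group 2
    if gain1 <= 0 or gain2 <= 0:
        raise ValueError("correct decisions must be worth more than errors")
    w1 = p1 * gain1
    w2 = (1.0 - p1) * gain2
    return w1 / (w1 + w2)
```

Under these assumptions, the adjusted value would be entered in place of the raw base rate when a cutting score is looked up. With equal stakes in both groups the base rate is returned unchanged.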

Normality. The assumption that scores are 
normally distributed in both groups may not 
be met in some situations. Obviously, a nor- 
malizing transformation can be found for 
either one distribution or the other. However,
the correction applied to the one distribution
may fail to improve, or may even
worsen, the approximation to normality of the
other. It is necessary to consider both dis- 
tributions simultaneously, with the aim of 
adopting a correction that would produce 
close approximations to normality in both 
groups of scores. While no way of making 
exact specifications for approximation to the 
normal curve has been attempted here, it 
would seem that if the distributions are even 
approximately normal, then the scores indicated
by the procedures outlined would produce
fewer erroneous decisions than those giving
no consideration to the base rates. In extreme
cases, cutting scores based on graphical
representations of test-score distributions may 
provide the only satisfactory approach. 

Information transmission versus decisions. 
Psychologists, particularly those in clinical 
settings, sometimes argue that tests are used, 
not to make decisions, but rather to provide 
information which may be combined with 
other information in order to “get a complete
picture of the patient.” When presented
in its most extreme forms, this line of argu- 
ment would lead one to believe that the en- 
tire goal of clinical activity is the accumula- 
tion of information, and that no decisions are 
ever made concerning the treatment of pa- 
tients. Such is obviously not the case. But 
even if it were, it would make no difference. 
In order validly to infer from a test or a se- 
ries of tests and measures that an individual 
is “cheerful,” for example, it is necessary to 
know something about the distribution of 
“cheerful” and “noncheerful” individuals on
the test function being used, and to know the 
base rate of “cheerfulness” in the population 
of individuals to which the instrument is be- 
ing applied. There is no logical difference be- 
tween assigning an individual to a group 
which will be described as “cheerful” (i.e., 
describing him) and assigning him to a group 
which will be hospitalized (i.e., making a de- 
cision about him). 
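
The dependence of a descriptive inference on the base rate can be illustrated with Bayes' theorem. The sketch below is a minimal example under assumed conditions: scores are taken to be normally distributed in both groups, and the means, standard deviation, and base rates are hypothetical values chosen only for illustration.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and SD sd."""
    return math.exp(-((x - mu) / sd) ** 2 / 2) / (sd * math.sqrt(2 * math.pi))


def posterior_cheerful(score, base_rate, mu_c, sd_c, mu_n, sd_n):
    """P(cheerful | score) by Bayes' theorem."""
    joint_c = base_rate * normal_pdf(score, mu_c, sd_c)
    joint_n = (1 - base_rate) * normal_pdf(score, mu_n, sd_n)
    return joint_c / (joint_c + joint_n)


# Hypothetical scale: "cheerful" persons average 60, others 50, SD 10 in both.
rare = posterior_cheerful(60, 0.10, 60, 10, 50, 10)     # cheerfulness uncommon
common = posterior_cheerful(60, 0.50, 60, 10, 50, 10)   # equal-sized groups
```

The same score of 60 supports the description “cheerful” when the groups are of equal size but not when cheerful individuals make up only a tenth of the population.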

Time and money. Finally, one hears dedi- 
cated workers state they can afford neither 
the time nor the money for setting up their 
own decision procedures, that the press of 
everyday activities is so great that it is neces- 
sary to make do with whatever tests and tech- 
niques are readily available. Underlying such 
a statement is the widespread but erroneous 
assumption that there are tests and cutting 


RORER, HOFFMAN, LAFORGE, AND HSIEH


scores that have a kind of universal validity. 
But, current opinion to the contrary notwith- 
standing, there are many situations in which 
“valid” tests cannot improve upon base-rate 
predictions, and, conversely, there are some 
situations in which completely “invalid” tests 
can yield accurate decision classifications. 
Whatever the limitations inherent in the 
tables and procedures that have been pre- 
sented, the limitations involved in not using 
them are far greater. For any given decision 
situation, the few minutes required to con- 
sult the appropriate tables might well elimi- 
nate much wasteful testing and avoid numer- 
ous, needlessly incorrect, decisions. 
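
The claim that a “valid” test may fail to improve upon base-rate prediction can be checked with a short calculation. The sketch below assumes two normal groups with a common standard deviation and equal error costs; the base rate of .05 and the separation of one standard deviation are hypothetical. It compares the error rate of a cut placed midway between the means (the optimum for equal-sized groups) with a cut adjusted for the base rate and with the simple rule of always predicting the larger group.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def error_rate(cut, p1, mu1, mu2, sd):
    """Proportion misclassified when scores above `cut` are called group 1.

    Assumes mu1 > mu2 and a common standard deviation sd.
    """
    miss = p1 * phi((cut - mu1) / sd)                      # group 1 called group 2
    false_alarm = (1 - p1) * (1 - phi((cut - mu2) / sd))   # group 2 called group 1
    return miss + false_alarm


def optimal_cut(p1, mu1, mu2, sd):
    """Equal-variance cut at which p1*f1(x) == (1 - p1)*f2(x)."""
    return (mu1 + mu2) / 2 + sd ** 2 * math.log((1 - p1) / p1) / (mu1 - mu2)


P1, MU1, MU2, SD = 0.05, 1.0, 0.0, 1.0   # hypothetical: rare group, 1 SD apart
manual = error_rate((MU1 + MU2) / 2, P1, MU1, MU2, SD)
tuned = error_rate(optimal_cut(P1, MU1, MU2, SD), P1, MU1, MU2, SD)
base_rate_only = min(P1, 1 - P1)         # always predict the larger group
```

Under these assumptions the midpoint cut misclassifies far more cases than the base-rate rule alone, while the adjusted cut does no worse than the base-rate rule.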


BEHAVIOR IN A NONEXPERIMENT:

This study was designed to determine whether a field research process itself
has an influence on employee productivity which might become confounded
with the influence of more legitimate independent variables. 73 male workers
in 1 department of a factory were studied. Field research operations, consisting
of observational and survey techniques, were employed during the
middle 2 weeks of the 6-week “before,” “during,” and “after” experimental
design. Research effects on the total department and the 8 work groups were
negligible. Hypothesized moderator influences for age, authoritarianism,
rural-urban background and union activity level, although small, were found,
thus demonstrating that research operations can affect different people in
different ways on a productivity criterion.


The problem of this study was to determine 
whether a social science research process 
itself, in a field study of an industrial 
organization, has an influence on employee 
productivity which might conceivably become 
confounded with the influence of other, more 
legitimate independent variables. The study 
includes consideration of research influences 
which might interact with personality or so- 
ciological variables so as to produce differen- 
tial effects depending upon the characteristics 
of the people involved. This is a critical question
because of the growing tendency in industrial
research to relate the effects of independent
variables to such moderating characteristics.
If the effects of research interact
with such moderator factors in a nonrandom
manner, then whatever data our research
studies generate somehow must be adjusted
to take this effect into account.

1 Parts of this manuscript were presented as a
paper to the 1964 meeting of the International Congress
of Applied Psychology in Ljubljana, Yugoslavia.
The authors are strongly indebted to Walter
Nord, now a graduate student at Washington University,
and Allan Schwartzbaum, a graduate student
at the New York State School of Industrial
and Labor Relations, for their able assistance in
planning and conducting the study. The Research
and Publications Division of the New York State
School of Industrial and Labor Relations provided
financial support. Professor Isadore Blumen provided
valuable statistical consultation. Finally, we are indebted
to the Plant Manager and union officials of
the organization studied for their cooperation.

2 Now at The University of Michigan.

The problem area under consideration is 
not new, although its dynamics are not fully 
understood. It often is referred to in social
research by the term, “Hawthorne Effect,” 
based on the classical industrial research con- 
ducted at the Western Electric Company in 
the 1930s (Roethlisberger & Dickson, 1946). 

Orne (1959, 1962) has published perhaps 
the most thorough analyses of the problem 
as it relates to experimental social psychology. 
He makes a strong case for the existence of 
“demand characteristics” in the social psy- 
chological experiment. According to Orne, the 
perceived cues in an experimental setting 
“demand” that the S act (consciously or 
unconsciously) in accordance with the over- 
all impression he forms regarding the experi- 
ment from these cues. 

There can be no doubt that many cues 
exist in all research settings that interact with 
Ss’ needs and personalities so as to produce 
hunches among Ss regarding the E’s purpose 
and hypothesis. Moreover, the evidence sug- 
gests that the researcher often may “get what 
he is looking for” as a result of perceived
cues. Examples from a wide variety of be- 
havioral research studies can be found in 
Viteles (1953), Dunnette and Heneman
(1956), Selltiz (1959), Rosenthal (1963),
Cook (1962), Cantor (1951), Hovland and
others (in Selltiz, 1959), and Remmers 
(1954). In some cases, however, the re- 
searcher may find that Ss sometimes act 
contrary to the perceived research purposes. 
Examples are described by Argyris (1952), 
Scott (1962) and Vidich and Bensman 
(1960), all on the basis of industrial experi- 
ences. Block and Block (1955) provide addi- 
tional support from an experiment on college 
student Ss. Their experiment, which closely 
parallels the field research to be described 
herein, also demonstrates the potential in- 
fluence of moderator variables on research 
effects. 

No evidence is reported in the literature 
on the ways environmental cues and individual
characteristics may interact to produce
contaminated research results with respect to 
a productivity criterion in formal organiza- 
tions. The present study was designed to 
clarify our knowledge in this regard, with 
special emphasis on moderating variables 
which may lead to individual differences in 
Ss’ reactions to research stimuli. 


METHODS 
Experimental Setting and Subjects 


This study was conducted in one of several furni- 
ture manufacturing plants owned and operated by 
a large manufacturing concern. The plant employs 
approximately 450 males and females. The experi- 
mental Ss are 73 male workers engaged in upholster- 
ing operations in one department of this unionized 
plant. The average length of service for the workers 
is 10 years; only 5 or 6 men have been with the 
company for less than 2 years. The workers can be 
considered at least semi-skilled as a function of 
the fairly lengthy time period required to achieve 
proficiency in this line of work. 

The workers upholster living room furniture on 
a straight line assembly line basis, each line being 
comprised of approximately 10 men including a 
first-line supervisor. The work is manually paced, 
that is, there are no mechanized conveyors. However, 
each worker’s pace is influenced to some extent by 
the pace of his “neighbors” on either side of him. 
An individual financial incentive system, MTM,3 is
applied to these workers despite the interdependent 
nature of the tasks performed. 


3 MTM is a standardized work measurement pro- 
cedure based upon time study procedures. It serves 
as the basis for establishing incentive rates on each 
operation that the workers perform. See Strauss and 
Sayles (1960) for fuller details. 

Experimental Design


The study was conducted over a 6-week period 
during the spring of 1963. The 6 weeks were divided 
into 3 periods of 2 weeks each. The first 2-week 
period is called the “before” period, the second 
2-week period is the “during” period, and the third 
2-week period is the “after” period. The research, 
the independent variable of this study, was con- 
ducted in the middle period, labeled during. The 
before and after periods were used to provide data 
we could compare with those obtained during the 
actual research period when research operations were 
conducted. 

The experimental design takes into account three 
distinctly different sampling units: (a) the overall 
73-man department treated as one unit, (b) the
8 separate production groups (lines), each treated as 
one unit, and (c) the 73 individuals, each treated 
as one unit. 

Thus, analyses were designed to test for research 
effects at all three levels: total department, work
groups, and individuals. The relative importance of 
the three units, of course, depends upon the purposes 
of a particular investigator. 


Variables 


Independent variable. The independent variable in 
this study is merely the presence or absence of 
behavioral research operations in this plant. The 
research operations included two preliminary meet- 
ings with top management and local union officers 
during which rather general descriptions of the pro- 
posed research were provided. No mention was made 
of our real intent except to the plant manager. All 
other parties (including the workers) were told that 
we wished to compare the attitudes and working 
conditions of furniture workers with those in other 
industries. The meetings were followed by letters, 
on University letterhead, to the workers’ homes. 
These letters described the proposed research, in- 
cluding mention that a thesis would grow out of 
the study. A few days later, two graduate students 
entered the department and spent 2 full work weeks 
(10 working days) observing and informally inter- 
viewing the upholsterers and their supervisors. Two 
“planned” interviews were conducted with each 
worker plus spontaneous interviews that were initi- 
ated by workers. The emphasis in most interviews 
during the first week was placed on technological
considerations. Notes were recorded openly. Both
researchers appeared on all lines and also ate lunch
and took coffee breaks with the workers. The nature
of the questions asked gradually shifted to the areas
of leadership and group preferences, the topics of
additional research conducted a year later. A questionnaire,
which included a number of attitudinal
and personal history items, was administered on
the ninth day during a 12 hour period of company
time. The workers knew that their responses were
not anonymous although assurances were given them
that no one in the company or union would ever
see their individual questionnaires.

4 The experimenters attempted to use the upholstering
department in one of the company’s other
plants, 800 miles away, for certain control purposes.
However, during the 6-week period in question we
found that the other “comparable” department was
operating on a reduced work week which had a
decided effect on productivity and team composition
in that plant. We therefore restricted ourselves to
a repeated measurements design, in one plant. No
other departments in this plant could have been used
for control purposes because of technological and
group differences (e.g., machines, sex mix and skill
levels).

Dependent variable. The primary dependent vari- 
able of this study was productivity as measured by 
percentage of base rate achieved by each worker 
during each of the three 2-week periods in question. 
Data were collected without the workers’ knowledge 
on a weekly basis by the researchers from the com- 
pany records and were treated statistically in several 
different ways. The range of percentage of base rate 
varied, on a weekly basis, anywhere from 60% to 
200% among the 73 production workers involved. 
Reliability studies of this criterion, conducted on 
the same workers during time periods prior to this 
experiment, demonstrated that this measure is highly 
reliable; Pearson correlation coefficients, for per- 
centage of base rate achieved, between 1- and 2-week 
contiguous time periods, consistently fell in the range 
of .88 to .95 on an individual worker basis.5 

Moderator variables. Four moderator variables 
were employed in this experiment.6 These were 
selected to facilitate tests on specific hypotheses to 
be described later. 

1. Authoritarianism—measured by 13 items se- 
lected, on the basis of high factor weights, from the 
California F Scale. The 13 items were buried deep 
in the questionnaire to reduce the possibility of 
distortion. 

2. Rural-urban background—This nominal vari- 
able, which may also measure acceptance of author- 
ity, was measured by means of personal history 
items in the questionnaire. 

3. Age—also obtained from self-reports. 

4. The union activity level of the workers—This 
variable was measured by summing each worker’s 
response to two items regarding the extent of his 
participation in both the business and social affairs 
of the local union. Five point ad hoc scales were 
employed with each question, using answer continua 
ranging from “always” to “never” for each item. 
This analysis is based only on the five most active 
unionists and the five least active. The “active” 
group said they always attend union business meetings
and social affairs while the “inactive” group
said they never attend either type of function. Only
two items in the questionnaire were related, on their
face, to this issue. Extreme people were used because
of the crudity of the measure. All the workers
in the sample are union members by virtue of a
union-shop contract clause. The active group, as
defined above, includes all the major local officers.

5 Data also were collected on quality as a possible
criterion measure. The amount of variance, however,
was extremely limited, thereby precluding useful
analyses.

6 Sociometric status also was tried as a moderator
variable on the hypothesis that isolates would show
a traditional Hawthorne effect. However, the sociometric
data were not analyzed because of certain
problems workers had with the instructions.


HYPOTHESES 


Several hypotheses were tested in this study 
and are explained below: 

1. The research operations will have no ef- 
fect on the overall average productivity for 
the 73 workers analyzed as a total group. 
This hypothesis is based on the premise that 
the effects of research will be demonstrated 
differentially among individuals and that, in 
a heterogeneous group, they will cancel out. 
The hypothesis was tested by applying re- 
peated measurements analysis of variance 
methods to differences among the three time 
periods, before, during, and after, using the 
total group of 73 men as the sampling unit. 

2. The research operations will have no dif- 
ferential effect on the eight production line 
groups in this study. The logic is identical 
to that suggested above regarding the entire 
department. This hypothesis was tested by 
computing correlations among the eight lines’ 
productivity means for the three time periods 
in question and for two halves of the pre- 
experimental reliability control period men- 
tioned earlier. We reasoned that if the hy- 
pothesis were true, these four correlation 
coefficients should be essentially the same; 
the reliability period coefficient was .91. 

3. There will be individual differences 
among the workers in reaction to the inde- 
pendent variable. This was studied by ex- 
amining the correlations among the workers’ 
productivity levels for the three time periods 
in the study and the reliability control period. 
(Again four correlation coefficients were com- 
puted.) The statistical reasoning was similar 
to that described under 2., above, except that
now individual rather than group measures 
become the focal point. However, for this 
analysis appropriate differences were expected 
among the correlations, rather than equality. 

4. Those workers characterized by high 
authoritarianism scores on the California F 
Scale will show increased productivity in the
face of the experimental variable while those
characterized by low scores will show a decrease.
This effect was hypothesized because
of observations and data reported by others
(e.g., Argyris, Block & Block, cited earlier)
indicating that researchers are perceived as
authority figures, at least by some Ss. We
expected the low F Scale scorers to behave
as many workers do when a time-study man
observes them. Authoritarians were expected
to respond differently.

TABLE 1

PRODUCTIVITY MEANS (% OF BASE RATE) AND STANDARD
DEVIATIONS FOR THE BEFORE, DURING AND AFTER
TIME PERIODS, FOR 73 WORKERS

                     Time period
   Before            During            After
  (2 weeks)         (2 weeks)         (2 weeks)
  Mean     s        Mean     s        Mean     s
  134.6   28.3      133.8   28.9      138.3   30.8


5. Workers raised on farms will show an 
increase in productivity relative to the other 
workers. This hypothesis was based on ob- 
servations of factory “rate-busters” made by 
Dalton (reported in Whyte, 1955), and more 
recent observations by Edith Lentz suggesting 
that farm-reared nurses make better adjust- 
ments to the authoritarian organizational 
practices of hospitals than their city-bred 
peers (See Strauss & Sayles, 1960). 

6. Older workers will show an increase in 
productivity relative to the younger workers. 
This hypothesis was based on insights gained 
from earlier interviewing done in this firm. 
At that time, older employees seemed to feel 
that they were “poor stepchildren” in this 
organization which has no systematic policy 
to transfer older workers into less physically 
demanding work. 

7. Workers reporting high activity levels in 
the union either will decrease their produc- 
tivity relative to those reporting low levels of 
union activity, or will remain the same while 
the others increase. The rationale for this hy- 
pothesis should be clear from its content. 

In general we expected a certain pattern of
productivity to occur for the three time periods
(Before, During, and After) in the high
F scale, farm-reared, older, and inactive unionist
groups: an increase followed by a decrease.
We expected a decrease followed by an increase
in the low F scale, nonfarm-reared,
younger and active unionist groups. The effects
were tested for statistical significance by
repeated measurements ANOV procedures for
a p × q experimental design (Winer, 1962).
The interaction effects were tested by the F
statistic computed as follows:

$$F = \frac{MS_{AB}}{MS_{B \times Ss\ \text{within groups}}}, \qquad df = q - 1 \ \text{and}\ p(m - 1)(q - 1).$$


The interaction test is on the statistical hy- 
pothesis that the profiles, over time, for the 
moderator groups (e.g., older versus younger 
workers) are not different. Simple main ef- 
fects also were tested, regardless of the inter- 
action test results, because they were hy- 
pothesized in advance of the experiment (see 
Winer, page 208). 
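
The interaction test described above can be sketched in a few lines of Python. This is a minimal illustration, not the computation actually carried out in the study: it assumes equal numbers of Ss in each moderator group, the scores are hypothetical, and the function name is introduced only for the example. The numerator degrees of freedom are computed as (p − 1)(q − 1), which reduces to the q − 1 given in the text when there are p = 2 moderator groups.

```python
def interaction_f(groups):
    """F ratio for the group x period interaction in a p x q design.

    `groups` is a list of p groups; each group is a list of subjects;
    each subject is a list of q repeated scores. Equal group sizes are
    assumed for simplicity.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    p = len(groups)
    n = len(groups[0])
    q = len(groups[0][0])

    # Group-by-period cell means and their marginal means.
    cell = [[mean([s[t] for s in g]) for t in range(q)] for g in groups]
    group_mean = [mean(row) for row in cell]
    period_mean = [mean([cell[i][t] for i in range(p)]) for t in range(q)]
    grand = mean(group_mean)

    # Interaction sum of squares (MS_AB numerator).
    ss_ab = n * sum(
        (cell[i][t] - group_mean[i] - period_mean[t] + grand) ** 2
        for i in range(p) for t in range(q))
    # Period x subjects-within-groups sum of squares (error term).
    ss_error = sum(
        (s[t] - mean(s) - cell[i][t] + group_mean[i]) ** 2
        for i, g in enumerate(groups) for s in g for t in range(q))

    df_ab = (p - 1) * (q - 1)
    df_error = p * (n - 1) * (q - 1)
    return (ss_ab / df_ab) / (ss_error / df_error), df_ab, df_error


# Hypothetical before, during, and after scores for two Ss per group.
older = [[10, 12, 11], [20, 23, 20]]      # rises during the research period
younger = [[15, 13, 16], [25, 22, 25]]    # falls during the research period
f_ratio, df1, df2 = interaction_f([older, younger])
```

When the profiles of the two groups are parallel over the three periods, the interaction sum of squares is zero and the F ratio vanishes; diverging profiles such as those above yield a large ratio.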

FINDINGS


Hypothesis I., department productivity
(N = 73). The hypothesis that there would
be no significant research effect on the total
upholstering department is not fully supported
by Table 1. An overall F test on the
3 means is significant beyond the .01 level
(F = 8.55). Application of the Newman-Keuls
procedure (described in Winer, 1962)
testing the difference between ordered pairs
of means reveals that the before and during
periods differ from one another just below
the .05 level. The during-after difference is
significant beyond the .05 level. Although
some of the differences in the table are statistically
significant, they are quite small in
a practical sense.

TABLE 2

PEARSON PRODUCT-MOMENT CORRELATIONS AMONG
EIGHT UPHOLSTERING LINES AND AMONG 73 WORKERS
ON PRODUCTIVITY (% OF BASE RATE) FOR THE
THREE EXPERIMENTAL PERIODS AND A
RELIABILITY PERIOD

                     8 Lines      73 Workers
  Time period          r(a)          r(b)
  Before-During        .98**         .95**
  Before-After         .97**         [illegible]
  During-After         .95**         .93**
  Reliability(c)       .91**         .94**

  a Based on pairs of group means for the eight lines in each
    period.
  b Based on 73 individual pairs of observations.
  c Based on an analysis of contiguous time periods prior to the
    “before” period in the experiment.
  ** p < .01.

Fig. 1. Productivity interaction profiles for authoritarianism, rural-urban
background, age and union activity level in relation to before (B),
during (D) and after (A) time periods.

Hypothesis II., group (line) productivity. 
The left-hand portion of Table 2 shows the 
Pearson product-moment correlations among 
the eight production lines for the various time 
periods in question and for the two halves of 
a pre-experimental reliability control period. 
All of the correlations are in the .90s and are 
statistically significant beyond the .01 level.
Thus, the hypothesis of differential research
effects on the eight intact employee groups is
not supported. The groups did not change 
their productivity levels relative to each other 
at any phase of the experiment. 

Hypothesis III., individual productivity. As
in the case of the group data reported above,
the between time period correlations for individuals,
in the right-hand portion of Table 2,
are almost uniformly high, ranging from .93
to .95. This seems to contradict our hypothe- 
sis that different individuals would react dif- 
ferently to the experimental treatment. Find- 
ings 4 through 7, however, suggest that this 
correlational analysis may hide these effects 
rather than reveal them. 
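
That high between-period correlations can coexist with systematic subgroup shifts is easily shown with constructed data. The figures below are hypothetical and are not the study's data; they merely span roughly the observed range of percentage of base rate, with alternate workers gaining or losing a few points during the research period.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: 70 workers spread from 62% to 200% of base rate.
before = [60 + 2 * i for i in range(1, 71)]
# Alternate workers gain or lose 4 points during the research period.
shift = [4 if i % 2 == 0 else -4 for i in range(70)]
during = [b + s for b, s in zip(before, shift)]

r = pearson_r(before, during)
gainers = [d - b for b, d, s in zip(before, during, shift) if s > 0]
losers = [d - b for b, d, s in zip(before, during, shift) if s < 0]
```

Because the shifts are small relative to the spread among workers, the correlation stays very high even though every worker in one subgroup rose and every worker in the other fell, and the overall mean is unchanged.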

Hypotheses IV-VII: Authoritarianism, rural-urban
background, age and union activity
level moderators. Figure 1 summarizes the
several moderator variables.7 The shapes of
the profiles for the low F Scale scorers, urban, 
younger and active union worker groups are 
all very similar and show the expected decline 
during the research period followed by an 
increase afterwards. The high F scale scorers, 
rural, older, and inactive union worker groups 
show the expected increase during the experi- 
mental period. However, these increases were 
followed by further increases, rather than de- 
creases afterward. 

Three of the moderator variable profiles 
(rural-urban background, age and union ac- 
tivity level, respectively) differ significantly 
(beyond .05 or .01 level) according to the 
results of interaction effect F tests (F = 3.57, 
5.51 and 10.29, respectively). The authori- 
tarianism interaction effect is significant only 
between the .25 and .10 levels (F = 1.55). 
Simple main-effect results, however, show that 
the three production means (before, during 
and after) for the high F scale scorers differ 
beyond the .01 level (F = 6.56) as do the 
three means for the low F scale workers (F 
= 5.34), thus supporting the moderator hy- 
pothesis on this variable. 

The other simple main effects tests provide 
further suggestive data. It seems that the 
major effects of the experimental treatment 
were felt by the rural (F = 5.78, p < .01),
younger (F = 12.19, p < .01) and active
union workers (F = 2.90, p < .10 but > .05).
Smaller effects occurred for the urban (F =
1.88, p < .25) and older workers (F = 2.44, p <
.10 but > .05). The simple main effect for
inactive unionists did not even approach sig- 
nificance (F = 1.37). 

Given the equivalence of the various comparison
groups (see Footnote 7), the data in
Figure 1 generally support the moderator variable
hypotheses. The significance levels in
some cases are marginal by customary standards,
however, and the practical magnitudes
appear to be inconsequential.

7 The low and high F scale groups contain equal
proportions of rural and urban-reared workers. They
also are equivalent on age. The rural and urban-reared
groups, in the second portion of the table,
are equivalent on F Scale means and age. The age
groups are equivalent on proportions of rural and
urban background workers and on F Scale means.
The active and inactive unionist groups do not differ
significantly on any of the three moderator variables.


SUMMARY 


The findings, in general, support most of 
the hypotheses of this study although the 
magnitudes of some of the effects were not 
large from a practical viewpoint. For example, 
the research effect on average productivity for 
the total group of 73 workers, while statisti- 
cally significant, was negligible. The effect on 
the 8 production lines’ relative positions also 
was negligible. Contrary to expectations, 
however, individual workers’ relative productivity
positions also were barely affected, as
indicated by the before, during, and after period
intercorrelations in the .90s. Nonetheless,
statistically significant moderator influences
were demonstrated for age, authoritarianism,
rural-urban background and union activity
level of the workers. These effects cannot
be considered practically serious in this
research setting, however, in view of the large 
correlations among time periods already 
mentioned. 

The moderator variable influences also sug- 
gest generalized patterns of either authority- 
dependent reactions or anti-authoritarianism 
on the part of the Ss. The quantitative data 
are supported strongly, in this regard, by the 
unsolicited comments made by many of the 
workers to the research men observing them. 
A commonly heard remark, for example, was, 
“This must be some kind of company-spon- 
sored deal because if the company can afford 
to pay us (the workers) while we fill out 
questionnaires, they must be getting some- 
thing out of it.” Nord (1963) describes the 
qualitative data quite extensively. In general, 
both the quantitative and qualitative data 
support observations by Argyris, Block and 
Block, Scott, and Vidich and Bensman, all 
cited earlier. 

A nagging question remains. That is, were 
there conditions in this study, which might 
not exist in other research settings, and 
which may have attenuated the findings 
herein? Some that are likely to have done so
are: (a) the use of graduate student researchers
rather than professors (although their youth may have led
to the age effect described earlier); (b) the
technological interdependence of the tasks 
performed by the Ss; (c) the financial incen- 
tive system; (d) the long tenure of the work- 
ers and high stability of their work groups’ 
membership; (e) the high pay and status of 
these workers relative to others in the plant; 
(f) the size of the community which caused 
range restriction on the rural-urban variable— 
very few came from large city backgrounds; 
and (g) the sex of the workers (all males). 
The authors believe that this problem area 
deserves further attention in different organi- 
zational settings. The results of this study 
strongly affected subsequent research strategy 
employed in this factory. 


CONCEPTIONS OF THE ARBITRATOR’S ROLE 1

A cluster analysis of questionnaire data obtained from 101 labor arbitrators,
management representatives, and union officials suggested 5 dimensions of the 
arbitrator’s role: adherence to precedent, prophylactic orientation, liberality 
of interpretation, elicitation of facts, and procedural formality. Objective role 
conflict was evidenced by significant differences between company and union 
role conceptions on 4 of the 5 dimensions. Arbitrators themselves were prone
to take a conservative view of their function, and adopted positions between 
the 2 parties on 3 of the 5 role dimensions. 


Among the most significant developments 
that have occurred in the area of union- 
management relations within the past two 
decades has been the rapidly increasing ac- 
ceptance by the parties of third-party inter- 
vention in the resolution of disputes. Although 
the negotiation of the provisions of new labor 
contracts is still based largely on relative 
power positions, companies and unions in the 
vast majority of bargaining relationships 
have come to rely upon arbitration as the 
means of resolving disputes arising under the 
terms of existing contracts. Whether or not 
the incorporation of a system of arbitration 
into the collective bargaining arena heralds a 
long-term diminution of the power dimension 
of labor relations remains to be seen. Regard- 
less of what one might wish to speculate, 
however, the present importance of arbitra- 
tion as a principal method of accommodating 
to industrial conflict certainly renders it worthy 
of investigation by psychologists as well as 
other behavioral scientists. 

Although the appearance of the arbitration 
process on the industrial scene is now quite 
commonplace, the role of the arbitrator in that 
process has been a persistent topic of contro- 
versy in the labor relations field. In part, this 
lack of agreement stems from the fact that 
the arbitrator has been functioning in an area 
which lacks well-defined rules of behavior or, 
as aptly described by Alexander (1958), “at 
the frontier of industrial society.” More im- 
portant, however, are the differences that exist 
between labor and management as to the 
proper scope of the arbitration process and 
the types of arbitrator behavior which are 
most compatible with each of these view- 
points. While empirical data supporting the 
existence of these differences are largely non- 
existent, references to their existence pervade 
the literature of arbitration (e.g., Cooper, 
1955; Ferguson, 1954; Fuller, 1962). 

1 This study is based on a dissertation submitted 
to the Graduate Division of Wayne State Univer- 
sity in partial fulfillment of the requirements for the 
PhD degree. The author would like to express his 
appreciation to Hjalmar Rosen, under whose direc- 
tion the research was conducted. 
2 Now with the Department of Labor. 

Moreover, it may not be only the parties 
who are engaged in controversy as to how 
the arbitrator’s role should be construed. It 
has been suggested (Davey, 1961) that con- 
siderable disagreement exists within the ranks 
of arbitrators themselves regarding the nature 
of their function. Although some arbitrators 
(Fuller, 1962; Taylor, 1957) have argued 
that they should take part in structuring their 
own role, to the extent internal agreement is 
lacking there would appear to be a severe 
limitation on the ability of the profession to 
evolve any generally accepted set of 
standards. 

The research reported in the present article 
had as its objective the gathering of data that 
might replace some of the individual opinion 
and speculation surrounding the arbitrator’s 
function and that might also serve as a foun- 
dation for future research on the arbitral role. 
More specifically, the study sought to answer 
two basic questions: (a) What are some of 
the major dimensions of the arbitrator role? 
(b) In terms of these dimensions, what dif- 
ferences exist between the role conceptions of 
management representatives, union officials, 
and arbitrators themselves? 
METHOD 
Subjects 


A total of 101 arbitrators, union representatives, 
and company industrial relations executives partici- 
pated in the study. 

The initial arbitrator sample consisted of the 34 
active arbitrators in the State of Michigan, as 
judged by membership in the National Academy of 
Arbitrators and/or in the opinion of the Detroit 
office of the American Arbitration Association. Of 
this group, 28 arbitrators volunteered their 
participation. 

The union participants were 40 members in leader- 
ship or quasi-leadership positions who were charged 
with the responsibility for selecting ad hoc arbitra- 
tors and conducting arbitration cases for their unions. 
With the exception of one local union president, they 
had the titles of “regional director,” “district direc- 
tor,” and “international representative.” The unions 
represented were those judged to have had the most 
frequent recourse to ad hoc arbitration in recent 
years. These unions were the Oil, Chemical, and 
Atomic Workers, the United Automobile Workers, 
United Mine Workers, United Papermakers and 
Paperworkers, and the United Steel Workers. 

The management participants were 33 industrial 
relations executives in Detroit-area companies which 
had had the greatest number of ad hoc arbitration 
cases with the above unions in recent years. Repre- 
sented in the industry group were automobile sup- 
pliers, steel producers, paper and container manufac- 
turers, and oil and chemical companies. 

All respondents volunteered after being given as- 
surance that their responses would remain anonymous 
and that individual organizations would not be 
identified. 


Questionnaire 


A set of 60 statements was constructed which were 
descriptive of the manner in which an arbitrator 
might approach and conduct an ad hoc grievance 
case. The statements were culled from a wide range 
of published materials in the field of labor arbitra- 
tion and were judged to reflect the most important 
facets of arbitrator behavior. The list was then re- 
viewed by a small number of Detroit-area arbi- 
trators with a view to eliminating items that were 
duplicates, ambiguous, or not too meaningful descrip- 
tions of behavior. On the basis of this review, the 
original set was pared to a list of 25 revised and 
edited statements. 

When placed in questionnaire form, each state- 
ment was accompanied by a Likert-type scale allow- 
ing an expression of attitude regarding the desirabil- 
ity of the behavior described. The seven-step scale 
ranged from the extreme role prescription “essen- 
tial” to the extreme proscription “intolerable.” 

Each arbitrator was requested to indicate “how 
desirable or undesirable you feel it would be for 
you, generally, to act in the manner described if you 
were free to structure the arbitration process with- 
out restriction by companies and unions.” Each 
company representative and union official was asked 
to respond in terms of “how desirable or undesirable 
you feel it is, generally, for an arbitrator to act in 
the manner described.” 


Statistical Analysis 


Responses were scored on a 1 to 7 scale (“essen- 
tial” to “intolerable”) and product-moment correla- 
tions were computed among the 25 items for the 3 
groups of respondents combined. 

A cluster analysis technique (Fruchter, 1954), 
involving the computation of Beta-coefficients, was 
then applied to the resulting matrix in order to de- 
termine whether the total set of items could be re- 
duced to a limited number of relatively homogeneous 
categories suggestive of the underlying dimensions 
of the arbitrator’s role. The general procedure fol- 
lowed consisted of putting together clusters of inter- 
related items of “best fit.” That is, the items chosen 
to constitute a given cluster had to have consistently 
higher intercorrelations within the subset than cor- 
relations with other items in the total set. 
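
The "best fit" criterion described above can be illustrated with a short sketch. It assumes the B-coefficient takes the form usually attributed to Holzinger and Harman, namely 100 times the ratio of the mean intercorrelation among the items within a cluster to the mean correlation of those items with the remaining items; the article itself does not state the formula. The correlation matrix and the function name `b_coefficient` are hypothetical and are not data or procedures from the study.

```python
from itertools import combinations


def b_coefficient(r, cluster):
    """Return 100 * (mean r within cluster) / (mean r between cluster and rest).

    r is a symmetric correlation matrix (list of lists); cluster is a list
    of item indices. The formula is an assumption, not taken from the article.
    """
    rest = [j for j in range(len(r)) if j not in cluster]
    within = [r[i][j] for i, j in combinations(cluster, 2)]
    between = [r[i][j] for i in cluster for j in rest]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return 100.0 * mean_within / mean_between


# Hypothetical 5-item correlation matrix (illustration only).
R = [
    [1.00, 0.60, 0.50, 0.10, 0.20],
    [0.60, 1.00, 0.70, 0.20, 0.10],
    [0.50, 0.70, 1.00, 0.10, 0.20],
    [0.10, 0.20, 0.10, 1.00, 0.40],
    [0.20, 0.10, 0.20, 0.40, 1.00],
]

b_first_three = b_coefficient(R, [0, 1, 2])
print(round(b_first_three))
```

Under this assumed definition, a value well above 100 indicates that the items correlate more highly with one another than with the items outside the cluster, which is the condition the text requires of a cluster.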

The naming of clusters followed a process of in- 
ferring the factor or dimension being measured by 
the items in each cluster that was absent from or 
diminished in the items not in that cluster. Because 
the naming process was one of logical inference, the 
results could be considered somewhat as hypotheses 
subject to verification by further investigation. 

Intergroup comparisons were made on the basis of 
cluster scores, consisting of the mean response to the 
items constituting a given cluster. In order to deter- 
mine the significance of the difference between the 
mean cluster scores of two groups, t tests were used. 
Where the variances of the two groups were not 
equal, approximate values of t significant at the .05 
and .01 levels were computed in the manner sug- 
gested by Cochran and Cox (1950). 
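
The two kinds of comparison just described can be sketched from summary statistics alone. The sketch below is an illustration and not the author's computation: it assumes the ordinary pooled-variance t for equal variances and, for unequal variances, the usual statement of the Cochran and Cox approximation, in which the critical value is a weighted average of the tabled critical values for each group. The function names are hypothetical, and the tabled critical values are assumed from a standard t table. The input figures are the company and union means and standard deviations reported for Cluster IV in Table 5.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t for two independent means, assuming equal variances."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def separate_variance_t(m1, s1, n1, m2, s2, n2):
    """t' computed from separate variance estimates (unequal variances)."""
    return (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)


def cochran_cox_critical(s1, n1, t1, s2, n2, t2):
    """Weighted critical value for t': (w1*t1 + w2*t2) / (w1 + w2), w = s^2 / n.

    t1 and t2 are tabled critical t values for n1 - 1 and n2 - 1 df.
    """
    w1, w2 = s1**2 / n1, s2**2 / n2
    return (w1 * t1 + w2 * t2) / (w1 + w2)


# Companies (n = 33) versus unions (n = 40) on Cluster IV, from Table 5.
t_equal = pooled_t(4.31, 1.04, 33, 2.80, 1.08, 40)
t_separate = separate_variance_t(4.31, 1.04, 33, 2.80, 1.08, 40)
# Two-tailed .05 critical values for 32 and 39 df (assumed from a t table).
crit = cochran_cox_critical(1.04, 33, 2.037, 1.08, 40, 2.023)
print(round(t_equal, 2), round(t_separate, 2), round(crit, 3))
```

Both statistics come out close to, though not identical with, the value of 6.10 reported in Table 5; a small discrepancy of this kind would be expected from rounding in the published means and standard deviations.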


RESULTS AND DISCUSSION 
Role Dimensions 


The analysis of the intercorrelation matrix 
yielded five clusters of items (see Table 1). 
Two clusters contained five items each; two, 
four items each; and one was composed of 
three items. Four of the 25 items could not 
be placed in any cluster because their corre- 
lations with other items were neither consis- 
tently high nor uniform. 

The five items comprising Cluster I clearly 
suggest a dimension of “adherence to prece- 
dent.” The first two items deal with actions 
suggesting that published arbitration decisions 
are to be regarded generally as constituting 
a body of common law. While the third item 
concerns the need to maintain consistency in 
the same industry, the last two items pertain 
to precedent and consistency in the same 
company. 

Four of the five items in Cluster II reflect 
concern with the future state of the collective 
bargaining relationship. The underlying di- 
mension, which has been termed “prophylac- 
tic orientation,” is evidenced in such phrases 
as “general health of the bargaining relation- 
ship,” “preclude the necessity of a future ar- 
bitration case,” “considers the long-run im- 
pact,” and “avoid future disputes.” The last 
item listed, although a clear choice for the 
cluster statistically, has no obvious logical 
relationship to the other four items in the 
grouping. 

The items in Cluster III relate to the man- 
ner in which the contract is construed and 
appear to reflect a dimension of “liberality of 
interpretation.” As will be noted, two of the 
items present the view of the broad construc- 
tionist and two favor a narrow interpretation 
of the contract. (In obtaining cluster scores, 
the scoring of the last two items was 
reversed.) 


TABLE 1 
ITEMS SCORED FOR EACH CLUSTER 

I Adherence to Precedent 

Treat published arbitration decisions as consti- 
tuting more or less a body of common law. 

When giving a written opinion, cite published 
cases which give support to the decision reached. 

Make sure that the decision rendered is consist- 
ent with those made by other arbitrators in 
identical cases in the same industry. 

Regard the primary objective of arbitration to be 
the development of precedents and principles for 
the future guidance of the parties. 

Make sure that the decision rendered is consist- 
ent with those made by other arbitrators in the 
same company. 

II Prophylactic Orientation 

Reflect in a decision not only what the contract 
says, but also its effect on the general health of 
the bargaining relationship. 

Extend the limits of consideration to a closely 
related issue if such action would be likely to 
preclude the necessity of a future arbitration 
case. 

Render a decision which considers the long-run 
impact on the collective bargaining relationship 
as well as the resolution of an immediate problem. 

In rendering a decision, give the parties hints or 
suggestions as to how they might avoid future 
disputes of the same nature. 

Consider the protection of individual rights be- 
yond their representation by company and union. 

III Liberality of Interpretation 

Treat the labor contract as more of a moral or 
ethical code than an “ordinary” legal document. 

Regard the contract as a flexible working agree- 
ment rather than a fixed set of rights and obliga- 
tions. 

Apply the contract in terms of evidence formally 
submitted regardless of whether the effects on 
the parties are good or bad. 

Approach an arbitration case with the sole ob- 
jective of solving a single, immediate problem. 

IV Elicitation of Facts 

Take the initiative in seeking out details which 
are not presented in formal submission. 

Suggest to a party not adequately represented by 
counsel how it might best present the facts of its 
case. 

Act as an investigator as well as a judge in order 
to insure a complete presentation of the facts of 
a case. 

Conduct a hearing in a manner that gives the 
parties ample opportunity to get things off their 
chests and release any tension generated by the 
dispute. 

V Procedural Formality 

Observe fairly strict rules of courtroom procedure 
in conducting a hearing. 

Request both pre- and posthearing briefs from 
the parties. 

Require strict adherence to rules of evidence and 
formality in the examination of witnesses. 

Cluster IV, which appears to reflect an ac- 
tivity dimension, has been called “elicitation 
of facts.” The item content here concerns the 
initiative that may be exercised by the arbi- 
trator in bringing to light the facts under- 
lying a dispute during the progress of the 
hearing. Although the last item in this cluster 
deals more with catharsis than the presenta- 
tion of facts, it takes but a slight stretch of 
the imagination to interpret the phrase “get 
things off their chests” to mean the clarifica- 
tion of a party’s position as well as the ex- 
pression of emotion tied up with that position. 

The last item grouping, contained in Clus- 
ter V, deals with the degree of formality or 
judiciality exercised by the arbitrator in his 
conduct of the hearing. The underlying dimen- 
sion has been called simply “procedural 
formality.” 

In interpreting the results of the cluster 
analysis, it is important to keep in mind that 
the role dimensions suggested by the data are 
necessarily a function of the original selection 
of items. Although the effort was made to rep- 
resent all important facets of arbitrator be- 
havior, it is conceivable that a somewhat dif- 
ferent questionnaire content might have 
yielded different clusters and suggested dif- 
ferent or additional dimensions. It is also nec- 
essary to consider the possible restrictions on 
the data arising from the nature of the re- 
spondent groups. Again, it is conceivable that 
a different dimensional structure might have 
emerged with larger samples of arbitrators, 
company representatives, and union officials. 
Consequently, the assumptions that the ob- 
tained dimensions are the basic ones and that 
they are stable should be tested in future 
research. 


Differences in Role Conceptions 


Having specified what appear to be the 
major dimensions of the arbitrator’s role, at- 
tention can now be directed to the second of 
the two questions posed. That is, to what ex- 
tent do the groups differ in terms of their pre- 
ferred role conceptions? Here the concern is 
with delineating the amount of objective role 
conflict to which arbitrators are exposed and 
examining the self-expectations of arbitrators, 
both as a matter of intrinsic interest and in 
relation to the preferences of the parties. 

Since the comparisons presented in the fol- 
lowing tables are in terms of cluster scores, 
each of which consists of the mean response 
to the items in a given cluster, the interpreta- 
tion remains the same as that made of a re- 
sponse to a single item. That is, the numerical 
value of 1 represents the “essential” point on 
the scale and the value of 7 continues to be 
“intolerable.” The neutral point, where the 
behavior described is considered neither de- 
sirable nor undesirable, is represented by a 
value of 4. 
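
A cluster score of this kind can be sketched as follows. The example is a minimal illustration under stated assumptions: reversal on a 1 to 7 scale is taken to mean rescoring a response x as 8 - x, which the article does not spell out, and the responses and the function name `cluster_score` are hypothetical. The reversal of the last two items mirrors what the article reports for Cluster III.

```python
def cluster_score(responses, reversed_items=()):
    """Mean of 1-7 item responses; items listed in reversed_items are
    rescored as 8 - x before averaging (assumed reversal rule)."""
    scored = [8 - x if i in reversed_items else x
              for i, x in enumerate(responses)]
    return sum(scored) / len(scored)


# Hypothetical responses of one respondent to the four Cluster III items;
# the last two items (indices 2 and 3) are reverse scored.
responses = [5, 6, 2, 3]
score = cluster_score(responses, reversed_items={2, 3})
print(score)
```

Because the score is a mean of item responses, it stays on the original scale, so a score above 4 falls on the proscriptive side and a score below 4 on the prescriptive side.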

Table 2 presents the role expectations of 
the parties and the arbitrators with respect 
to the “adherence to precedent” dimension 
reflected in the items of Cluster I. It is ap- 
parent that both company representatives and 
union officials tended to prescribe, although 
not strongly, a common-law view of past arbi- 
tration decisions with its resultant consistency 
in decision-making. Although union officials 
were slightly more inclined to subscribe to 
precedent than were company respondents, 
the difference between the groups was not sta- 
tistically significant. Arbitrators, on the other 
hand, were significantly less favorable to the 
common-law view than were either of the 
parties, although the mean response of the 
arbitrator group was only slightly on the 
negative side of the scale’s neutral point. 

One interesting result of the analysis was 
the position of the union group. It might logi- 
cally be argued that unions would favor a 
flexible approach to arbitration unimpeded 
by adherence to precedent. The failure of 
union respondents to adopt this position may 
reflect a preference for the predictability en- 
gendered by the common-law approach as 
well as the uniformity in intercompany and 
interindustry practices which precedent tends 
to establish. Company spokesmen may also 
have favored adherence to precedent as a 
means of increasing predictability and stabil- 
ity in labor relations. 


TABLE 2 
MEAN SCORES ON CLUSTER I: ADHERENCE TO PRECEDENT 

                       M      SD            Comparison              t 
Companies (n = 33)     3.62   1.16          Companies-Unions        [illegible] 
Unions (n = 40)        3.30   1.08          Arbitrators-Companies   [illegible] 
Arbitrators (n = 28)   4.15   [illegible]   Arbitrators-Unions      [illegible] 


TABLE 3 
MEAN SCORES ON CLUSTER II: PROPHYLACTIC ORIENTATION 

                       M      SD            Comparison              t 
Companies (n = 33)     4.78   1.16          Companies-Unions        6.76** 
Unions (n = 40)        2.93   [illegible]   Arbitrators-Companies   [illegible] 
Arbitrators (n = 28)   4.06   .80           Arbitrators-Unions      4.12** 

** p < .01. 


It also would have been quite reasonable 
to expect arbitrators to be more inclined to 
favor the development of a common law as a 
means of simplifying their tasks and bringing 
greater order to the adjudication of industrial 
disputes. That they did not adopt this posi- 
tion may in part be interpreted as a reaction 
to the charge of “creeping legalism” that has 
been leveled against the arbitration profession 
(The Arbitration Journal, 1958). Moreover, 
arbitrators may also prefer the latitude for 
independence in decision-making permitted 
in the absence of restrictive precedent. 

Table 3 presents the mean scores of the 
groups on Cluster II, containing items sug- 
gesting a dimension of prophylaxis or concern 
for the future of the bargaining relationship. 
The role expectations of the parties on this 
dimension were on opposite sides of the scale, 
with company representatives expressing op- 
position to the arbitrator extending considera- 
tion beyond the limits of the immediate case 
and union officials favoring the broader or 
more long-range view of the arbitrator’s func- 
tion. The position of the arbitrator group 
differed significantly from those of both par- 
ties, falling midway between the restrictive 
view of company representatives and the ex- 
pansive outlook of union officials. 

The conflicting expectations of the parties 
on this dimension conformed quite closely to 
statements of opinion typically found in the 
literature of labor relations. The union posi- 
tion suggests a view of the arbitrator as a 
labor relations physician, while the company 
position equates the role of the arbitrator with 
that of the courtroom judge. It would have 
been surprising if arbitrators had not adopted 
a neutral stance on this dimension as a com- 
promise on an issue which so clearly sepa- 
rated the parties. 

A significant difference between the com- 
pany and union groups was also obtained 
on Cluster III, containing items related to 
breadth of construction of the collective bar- 
gaining agreement (Table 4). The company 
position was one of strong opposition to the 
arbitrator making a liberal interpretation of 
the contract. Union officials, although also on 
the negative side of the scale, were signifi- 
cantly less opposed to broad construction 
than were company representatives. On this 
dimension arbitrators adopted a position 
quite close to that of union respondents and 
significantly less negative than that of 
company respondents. 

TABLE 4 
MEAN SCORES ON CLUSTER III: LIBERALITY OF INTERPRETATION 

                       M      SD            Comparison              t 
Companies (n = 33)     5.64   [illegible]   Companies-Unions        4.91** 
Unions (n = 40)        4.22   1.33          Arbitrators-Companies   [illegible] 
Arbitrators (n = 28)   4.55   [illegible]   Arbitrators-Unions      1.34 

** p < .01. 


TABLE 5 
MEAN SCORES ON CLUSTER IV: ELICITATION OF FACTS 

                       M      SD            Comparison              t 
Companies (n = 33)     4.31   1.04          Companies-Unions        6.10** 
Unions (n = 40)        2.80   1.08          Arbitrators-Companies   [illegible] 
Arbitrators (n = 28)   3.78   .90           Arbitrators-Unions      3.90** 

* p < .05. 
** p < .01. 


Genuinely surprising was the failure of the 
union group to express favor with a liberal 
interpretation of the contract. One gains 
from the literature the distinct impression 
that unions prefer a flexible and perhaps 
moralistic approach to contract interpreta- 
tion. Yet, the present data give little support 
to this contention. By way of interpretation, 
it is suggested that the union position repre- 
sents not so much a rejection of moral con- 
siderations as a requirement for a firm state- 
ment of member rights which cannot be 
altered and weakened by the vagaries of indi- 
vidual arbitrators. On this dimension, also, 
arbitrators appeared to take a compromise 
position. 

Table 5 compares the expectations of the 
three groups on Cluster IV, pertaining to the 
initiative exercised by the arbitrator in elicit- 
ing the facts of a given case and clarifying 
the positions of the parties. While union 
officials were fairly strong advocates of an 
active, fact-finding arbitrator, their manage- 
ment counterparts were prone to favor a more 
restrictive view of the arbitrator’s role. 
Arbitrators again took the middle position, 
although, like union officials, they were on 
the prescriptive side of the scale. 

These data suggest that company represen- 
tatives, who frequently have training in labor 
law or have ready access to members of the 
legal profession, tend to feel relatively com- 
petent in presenting their side of the dispute 
and do not look with favor on the potential 
benefits to the union of an arbitrator who 
at times departs from a strictly judicial role. 
Union officials, on the other hand, perhaps 
less knowledgeable about the fine points of 
labor law and financially restricted from the 
free use of legal aid, are quite receptive to 
the assistance that might be provided to 
them by the arbitrator in bringing out the 
full merits of their cases. 

Cluster V, reflecting the dimension of pro- 
cedural formality, further separated the com- 
pany and union conceptions of the arbitra- 
tor’s role. As illustrated in Table 6, company 
representatives were inclined to favor formal- 
ity in the conduct of arbitration hearings, 
while union officials were prone to reject this 
form of legalism. On this dimension, arbitra- 
tors joined the union group in expressing 
opposition to formally conducted arbitration 
cases. 

In considering the meaning of these data, 
per se and in conjunction with the preceding 
analysis, it would seem that company repre- 
sentatives feel a bit more comfortable in an 
arbitration setting which is judicial in 
character, while their opponents conceive of 
the arbitration process as a problem-solving 
one. The self-expectations of arbitrators on 
this dimension, viewed in light of their nega- 
tive stance on the matter of adherence to 
precedent, might be a further reflection of 
sensitivity to the legalism charge. 

TABLE 6 
MEAN SCORES ON CLUSTER V: PROCEDURAL FORMALITY 

                       M      SD            Comparison              t 
Companies (n = 33)     3.66   .90           Companies-Unions        [illegible] 
Unions (n = 40)        4.47   1.07          Arbitrators-Companies   4.92** 
Arbitrators (n = 28)   4.69   [illegible]   Arbitrators-Unions      1.02 

** p < .01. 


A final point of interest concerns the re- 
sponse variability of the three groups. In view 
of the limited variance of the cluster scores 
of the arbitrators and their proximity to the 
mean of the scale, it would be reasonable to 
conclude that arbitrators demonstrated little 
willingness to prescribe or proscribe for them- 
selves definitive courses of action. By contrast, 
company and union representatives were far 
more apt to express definite views about how 
arbitrators should play their role—views 
which in large measure were conflicting. 
Consequently, it might well be expected that 
arbitrators, perceptive of these conflicting 
expectations, would find it desirable to take 
a conservative, middle-of-the-road view of 
their role. Since the tenure of an arbitrator 
is contingent upon his acceptability to both 
of the parties, alignment with one on any 
issue dividing them is necessarily fraught 
with danger. 

It is also possible, if not likely, that the 
greater variability of the company and union 
respondents was in part due to the hetero- 
geneity of these groups. As noted earlier, 
five different industrial groupings were 
sampled. Should there be interindustry and 
interunion differences, as well as differences 
between the parties, they would be manifested 
in response variance. However, this ques- 
tion remains to be answered by subsequent 
research. 
PROGRAMED INSTRUCTION AS A TECHNIQUE FOR 
IMPROVING SPATIAL VISUALIZATION 1 


ERWIN H. BRINKMANN 


Southern Illinois University, Edwardsville 


This study investigated the feasibility of using a specially designed self- 
instructional program to teach the visualization of space relations. A 505-item 
program, using selected concepts of geometry to help condition the classes 
of behaviors specified as components of the visual-spatial functions, was 
administered to a group of 27 8th-grade pupils; a carefully matched control 
group, receiving only the pre- and posttests, continued with its regularly 
scheduled mathematical classwork presented in the conventional manner. 
Results indicated that the Ss receiving the program scored significantly (p 
< .001) higher than the control group. It was also indicated that the attitudes 
of the learner may be an important factor in the effectiveness of programed 
instruction. 


Programed instruction, based on principles 
derived from carefully controlled laboratory 
studies, provides the potential for a more sci- 
entific approach than is customarily found in 
educational pedagogy. It is not surprising 
then that this technique, when properly uti- 
lized, can help students learn materials al- 
ready being taught through other media. 
What may be more revealing about the ef- 
fectiveness, or power, of this particular tech- 
nique is to utilize it in areas which have not 
been effectively taught with any degree of 
consistency. One such area, in which the effec- 
tiveness of educational development through 
normal curriculum offerings has been ques- 
tioned, is the so-called aptitude variously 
referred to as “spatial visualization” or “visual 
space relations.” 

1 This article is based on a PhD thesis presented 
to the Horace H. Rackham School of Graduate 
Studies, University of Michigan, May 1963. The re- 
search was conducted as part of Cooperative Re- 
search Project No. 1474, supported through the 
Cooperative Research Program of the Office of Edu- 
cation, United States Department of Health, Educa- 
tion, and Welfare. Appreciation is expressed to Fin- 
ley Carpenter, Project Director, for his contributions 
to the study. 

Essentially, this article describes an investi- 
gation of the effect of training by means of a 
self-instructional program designed to teach 
spatial visualization on subsequent perform- 
ance on a space relations measure. While 
recognizing the controversy over the innate 
versus acquired nature of perceptual ability, 
the thesis adopted for this study is that, while 
certain components of perceptual organization 
may be influenced by genetic endowment, 
other factors in visual perception (e.g., dis- 
crimination, judgment, etc.) can be influenced 
by learning. Given normal physiological equip- 
ment, a child probably will learn to perceive 
gross relationships fairly accurately because 
his very survival requires reactions to the ob- 
jects in his environment. However, in the ab- 
sence of guidance and the assistance of in- 
struction, there may be numerous deficiencies 
in this general function of perception and 
especially in the more complex process of 
detailed spatial visualization. Thus the need 
for serious attention to the deliberate cultiva- 
tion and training of this function becomes 
evident. 

Since programed instruction has shown 
promise of being a powerful technique for 
producing behavioral changes, its effective- 
ness as an instructional technique for increas- 
ing the skill of an individual in spatial visuali- 
zation should be carefully assessed. Involved 
in such an undertaking are the specification 
of behaviors composing the criterion task, the 
development of an instructional program to 
shape the specified behaviors, and the ulti- 
mate evaluation of the effectiveness of such a 
program in shaping the desired behaviors. 


PROCEDURE 


Program Development 


Creating a program in any one phase of ability 
such as spatial visualization means taking on the 
task of specifying in precise behavioral language 
what the perceiver does. For an area as poorly de- 
fined as spatial visualization, this is admittedly a 
difficult task. However, for operational purposes, the 
criterion measure may be assumed to constitute the 
goal specifications. To ascertain the nature of the 
behaviors involved in achieving the goal behavior, a 
number of subjects (Ss) were asked to “verbalize 
their thoughts” as they selected their responses. By 
contrasting the approach of the successful perform- 
ers with the less successful performers, several dis- 
crete behavioral components were indicated. Among 
the more clearly defined of such behaviors were the 
use of estimation and sensible approximations in the 
following: (a) differentiation or discrimination, (b) 
identification (recognition and labeling), (c) organi- 
zation or relationship, and (d) orientation. 

The content selected to develop the aforemen- 
tioned behaviors was a short course in elementary 
geometry. Among the topics included were the basic 
elements of point, set, line, line segment, ray, angle, 
simple plane figures, and simple solids. It must be 
realized that the usual approach to the teaching of 
geometry has not added significantly to performance 
on spatial visualization tests (Brown, 1954; Ra- 
nucci, 1952). However, when one realizes that the 
emphasis in the teaching of geometry is usually on 
development of formal proofs based on certain types 
of “givens,” the failure to add to the performance on 
spatial visualization measures is not surprising. The 
behaviors demanded are simply different. Hence, the 
approach taken in this program stressed problem 
solving employing the behaviors required in the task 
rather than manipulation of abstractions through a 
process of logical reasoning. 

To develop the learner’s skill in discrimination— 
in making more effective use of sensible approxima- 
tions—the program arrangement initially called for 
easily discernible discriminations and identifications 
and gradually proceeded to tasks requiring increas- 
ing precision in discrimination. To portray the con- 
ceptual framework of this strategy in a lucid man- 
ner, a series of graphical structures similar to the 
“lattice structure” (Woolman, 1962) was utilized as 
the organizational tool. Although these graphical 
structures were never seen by the students, they 
proved invaluable in developing the program itself. 
A total of 15 such graphical structures, or charts, 
were formulated during the program development 
stage. Each chart described in annotated visual form 
the strategy to be followed in writing a particular 
section or spiral of the program by indicating rela- 
tionships between concepts as well as the sequence 
for presenting the various concepts. Considered in 
this progression is the complexity of concepts in- 
volved, the behavior repertoire to be developed, and 
the density of stimulus support provided in successive
steps.

To establish a given abstraction or concept, ex- 
tensive discrimination training was provided. Vari- 
ous examples were generated so that having had 
a response called for in a variety of contexts would 
enable the learner to discriminate the concept or 
abstraction over as wide a range as possible. To 
provide greater opportunity for extensive discrimination
training, a series of panels in a separate
booklet and an object kit, to be used as directed in 
the verbal program, were provided for the learner. 
The panels consisted of drawings of various simple 
geometric designs, as well as cutout patterns of 
geometric solids. The object kit contained geometric 
solids in the form of cubes, rectangular solids, pyra- 
mids, cones, and variations, or combinations of 
these. Both the pattern-folding and solid-object
manipulation were designed to provide tactual- 
kinesthetic as well as visual feedback for the learner 
in his discrimination exercises. 

After preliminary tryouts and revisions, the final 
version of the program consisted of 505 frames. To 
facilitate the presentation of the self-instructional 
program, the materials were organized into 10 units 
varying from 50 to 53 frames each. Each of the 
units was compiled into a separate booklet of 10 or 
11 pages. The program was linear in design, required 
both constructed and multiple-choice responses, and 
utilized the horizontal format with five levels per 
page. 


Experimental Procedure 


The study was conducted in a nearby public school 
which provided a representative sample of students 
from low- to middle-income families and a coopera- 
tive teaching and administrative staff. Two eighth- 
grade classes, with approximately 30 students (Ss) 
each, were arbitrarily assigned to the experimental 
and control treatments. The former group received 
the instructional program in addition to the pre- 
and posttests, while the control group received only 
the tests. 

For each individual in the experimental group, a 
matching control S was designated in the control 
group. In the matching procedure, precedence was 
given to pretest scores on the Space Relations (Form 
A) of the Differential Aptitude Tests (Bennett, Sea- 
shore, & Wesman, 1959) as well as sex and grade 
level. Also given careful consideration in the match- 
ing procedure were intelligence and seventh grade 
mathematics grades. In this manner an experimental 
group and a control group consisting of 27 matched 
pairs (13 female and 14 male) were formed. 


Data Collection Procedures 


Efforts were made to collect data under condi- 
tions corresponding as closely as possible to the 
school’s regular program in order to minimize any 
possible Hawthorne effect (French, 1953). Both the 
pretest and posttest Space Relations were administered
to the group as a whole in a large auditorium
with one of the school’s counselors in charge and the 
teachers of the classes involved acting as proctors. 
All remaining data were collected in the normal 
classroom setting. To ascertain the background 
knowledge of program content already possessed by 
Ss, the teachers in both the experimental and con- 
trol groups administered a pretest consisting of 
sample items from the instructional program and 
TABLE 1
ERROR DATA FOR PROGRAM, UNITS I-X, FOR 27 EIGHTH-GRADE STUDENTS

                 Number of                Number of errors             Error rate (% of total possible)
Unit             response unitsᵃ     Range     M             Mdn        M             Mdn
I                     50             0-12      4.25          3.5        8.5           7.0
II                    55             0-16      4.57          4.0        8.3           7.3
III                   50             0-10      3.68          3.5        7.4           7.0
IV                    50             0-11      [illegible]   3.0        7.4           6.0
V                     50             0-18      5.78          4.5        11.6          9.0
VI                    50             0-18      [illegible]   5.0        10.6          10.0
VII                   50             0-16      [illegible]   4.5        [illegible]   9.0
VIII                  53             0-20      4.89          5.0        9.2           9.4
IX                    50             0-17      4.74          4.0        9.5           8.0
X                     52             1-17      6.11          5.5        11.8          10.6
All 10 units
(27 Ss)              510             8-117     48.07         42.5       9.4           8.3

ᵃ Each frame was considered a response unit except in Unit II where several review items, placed into a single frame, were
considered as separate entities.



referred to as the Geometric Inventory. Hereafter, 
the control class continued with the regular mathe- 
matics course. The teacher in the experimental group 
introduced the class to programed instruction and 
explained the practices to be followed in using the 
materials. Each S was supplied with the first programed
unit and an answer sheet. In addition to
recording his responses, he was instructed to record 
his starting and completion time, plus interruption 
and resumption time if only partially finished at the 
end of a class period, on the answer sheet for each 
unit. Students proceeded to work individually 
through the program at their own pace. All remain- 
ing units plus supplementary materials were placed 
on a table at the front of the classroom. Here the 
student could select his lesson materials as they 
were needed. A few minutes before the end of each 
class period the teacher alerted Ss to indicate the 
time on the answer sheet and to place their lesson 
materials into their individual folders in a filing 
cabinet. This procedure was already established, 
having been followed in their mathematics class. 
In this way S could resume where he had left off
without difficulty; also, all of the materials remained 


within the classroom. The experimental procedure 
was restricted to the class periods normally devoted 
to mathematics over a period of approximately 3 
weeks. It was administered throughout by the teacher 
who was normally responsible for teaching the class. 
Having the experimental condition extend over a 
longer period of time was expected to depreciate 
any novelty effect which might have existed after 
only brief exposure to a new instructional technique. 

After the Ss had completed the programed in- 
structional materials, the posttest Space Relations 
was administered to the whole group in the manner 
described earlier. Following completion of the princi- 
pal criterion test, the posttest Geometric Inventory 
was administered by the classroom teacher during 
the next class session to provide a basis for judging 
the effectiveness of content learning attributable to 
the program. The final data-gathering device, presented
to the experimental group only, was a questionnaire
designed to reflect the attitude of the
individual toward the programed instruction just 
experienced. This was administered after completion 
of the instructional program and before results of 
any posttest measures were revealed to check the 


TABLE 2
PRETEST AND POSTTEST RANGE, MEAN, AND STANDARD DEVIATION OF
SCORES ON THE GEOMETRIC INVENTORY

                     Number of            Pretest                    Posttest
                     responses
Group         Nᵃ     possible      Range    M       s         Range    M       s
Experiment    25       65          6-30     19.56   6.05      31-63    50.84   8.83
Control       25       65          6-31     21.52   7.08      3-39     24.32   7.82

ᵃ Two absentees in control group reduced pairs to 25.
TABLE 3
COMPARISON OF THE EXPERIMENTAL AND CONTROL GROUPS ON PRETEST-POSTTEST DIFFERENCES
ON GEOMETRIC INVENTORY SCORES

Group         N     M Diff.    s              SE diff.ᵃ    t       df    Level Sig.
Experiment    25    31.3       [illegible]
Control       25    2.8        [illegible]    1.27         22.4    24    .001

ᵃ Standard error of difference for matched pairs.
possibility that a relationship might exist between the
effectiveness of this particular technique of learning
and the S’s attitude.


RESULTS


This investigation sought information on
two primary questions: (a) did the program
succeed in conditioning the specific terminal
behaviors (content learning); and (b) did
the program have an effect on the larger
repertory of discriminative behaviors (visual
space relations) of which the specific terminal
behaviors were considered a subset?


Content Learning


In addition to being a measure of the
program’s difficulty, the error rate may also
be one of the criteria by which the quality
of the program may be evaluated. A summary
of the error data is found in Table 1.
For all 10 units of the program combined, the
error range varied from 8 to 117 per individual
with a median of 42.5 errors or 8.3 percent
missed out of a possible 510 errors for the
entire program. On the separate units the
error rate ranged from a low of 7 percent on
Units I and III to a high of 10.6 percent on
Unit X, indicating a relatively consistent error
rate.

As a measure of content learning, the
Geometric Inventory showed the relative
gains of the two groups as indicated in
Table 2. While control group scores remained
relatively constant, experimental
group posttest scores showed no overlap with 
the pretest range of scores. 

A comparison of the pretest-posttest mean 
differences (Table 3) showed the experi- 
mental group performing significantly better 
than the control group. 


Space Relations Performance 


To examine the effect of the program on 
the larger repertory of discriminative be- 
haviors, the performance of the two groups on 
the posttest Space Relations was examined. 
To make such a comparative evaluation 
implies initial equivalence of the groups. 
Actually, the pretest scores on Space Rela- 
tions (Form A) were almost identical with 
mean scores of 36.07 and 36.04 for the con- 
trol and experimental groups, respectively. 
On the posttest the experimental group mean 
was 54.22 compared to a control group mean 
of 39.26. The difference in the gains is 
significant at the .001 level. 
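
The t ratio reported for the Space Relations gains in Table 4 can be checked from the summary figures alone. The sketch below assumes the conventional matched-pairs form: the difference between the two mean gains divided by the standard error of the difference, with degrees of freedom one less than the number of pairs. The function name is illustrative and does not come from the study.

```python
def matched_pairs_t(mean_gain_a, mean_gain_b, se_diff):
    """t ratio for matched pairs: difference in mean gains over its standard error."""
    return (mean_gain_a - mean_gain_b) / se_diff


# Figures as printed in Table 4: mean gains of 18.18 (experimental) and
# 3.18 (control), standard error of the difference 3.41, 27 matched pairs.
t_space = matched_pairs_t(18.18, 3.18, 3.41)
df_space = 27 - 1  # degrees of freedom for matched pairs

print(round(t_space, 1), df_space)
```

Rounded to one decimal place, the ratio agrees with the tabled t of 4.4 on 26 degrees of freedom.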

To examine the varying effects which an 
instructional program may have on learners at 
several points on the initial performance con- 
tinuum, the sample was divided into quarters 
on the basis of pretest Space Relations Test 
scores, and the resulting mean scores, both 
pretest and posttest, for each quarter were 


TABLE 4
COMPARISON OF THE EXPERIMENTAL AND CONTROL GROUPS ON PRETEST-POSTTEST DIFFERENCES
ON SPACE RELATIONS TEST

Group         N     M Diff.    s       SE diff.ᵃ    t      df    Level Sig.
Experiment    27    18.18      14.6
Control       27    3.18       10.8    3.41         4.4    26    .01

ᵃ Standard error of difference for matched pairs.
compared. For the upper quarter, both groups
showed almost identical gains. Both groups in 
the lower quarter showed improvement with 
the experimental group showing a slightly 
higher gain. For the middle half (second and 
third quarters), the control Ss showed the 
relative stability from pretest to posttest im- 
plied by the reliability coefficient. In contrast 
to the stability of the control-group perform- 
ance, the middle half of the experimental 
group showed marked gains from pretest to 
posttest (average gain of 23 raw-score points). 


Attitude Questionnaire 


Analysis of the attitude survey revealed 
two items receiving diverse reactions. Those 
Ss who felt that teachers could teach much 
better than a program had more consistently 
scored below the median on the posttest 
Geometric Inventory as is shown in Table 5. 

Similarly, the majority of those
Ss indicating a preference for only occasional 
or no utilization of programed learning were 
found to have scored below the median (see 
Table 6). 

Responses to the remaining items on the 
attitude survey were generally in agreement. 
Most of the Ss agreed that programed learn- 
ing was a good way to learn, that it was 
challenging, and that they could learn a 
great deal by using a program. Nearly all Ss 
disagreed with the statements that they did 
not have to think when learning with a pro- 
gram or that programed learning was a boring 
method of learning. It was interesting to note 
that not a single S considered the program 


TABLE 5
RELATIONSHIP BETWEEN PERFORMANCE ON THE POSTTEST
GEOMETRIC INVENTORY AND EXPRESSED
ATTITUDE TOWARD STATEMENT THAT TEACHERS
CAN TEACH MUCH BETTER THAN A PROGRAM

                                  Attitude
Score on posttest                 Uncertain
Geometric Inventory    Agree      or disagree    Total
Above median             2           12           14
Below median             9            5           14
Total                   11           17           28

Note. Chi-square = 5.40 (continuity correction used);
significant at .025 for 1 df.
TABLE 6
THE RELATIONSHIP BETWEEN PERFORMANCE ON THE
POSTTEST GEOMETRIC INVENTORY AND EXPRESSED
DESIRE FOR DEGREE OF USAGE OF PROGRAMED
INSTRUCTION AS AN INSTRUCTIONAL INPUT

                               Degree of usage
Score on posttest                         Little or
Geometric Inventory    Extended           none         Total
Above median              8                 6            14
Below median              2                12            14
Total                    10                18            28

Note. Chi-square = 3.88 (continuity correction used);
significant at .05 for 1 df.


too difficult and at the same time no one felt 
that the whole program was too easy. 
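
The chi-square values given in the notes to Tables 5 and 6 can be reproduced from the cell frequencies. The sketch below assumes the standard continuity-corrected (Yates) formula for a 2 x 2 table, which is consistent with the notes' reference to a continuity correction; the function name is illustrative only.

```python
def yates_chi_square(a, b, c, d):
    """Continuity-corrected chi-square for a 2 x 2 table with cells
    a, b (first row) and c, d (second row)."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Table 5: above median (2 agree, 12 uncertain or disagree),
#          below median (9 agree, 5 uncertain or disagree)
chi_table5 = yates_chi_square(2, 12, 9, 5)

# Table 6: above median (8 extended, 6 little or none),
#          below median (2 extended, 12 little or none)
chi_table6 = yates_chi_square(8, 6, 2, 12)

print(chi_table5, chi_table6)
```

Both results fall within about .01 of the printed values of 5.40 and 3.88, each on 1 df.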


Discussion 
Content Learning 


A further examination of some possible 
influencing factors on the learning activities 
is in order. A median error rate of 8.3 percent 
suggests that the program as a whole was not 
too difficult. That learning of the program 
content occurred is indicated by the absence 
of overlapping of the posttest and pretest 
score distribution for the experimental group. 
There was, however, no reduction in the range 
of scores on the posttest. By dividing the 
experimental group into quarters on the basis 
of pretest scores on the Geometric Inventory, 
it was found that the mean gain (posttest 
mean minus pretest mean) for each quarter 
was nearly identical. In other words, initial 
advantages were maintained but gains at- 
tributable to the program were relatively 
constant. One cannot infer from this finding 
that learning differences were eradicated. 
Surely quality of performance cannot be 
considered equivalent when one considers 
such factors as ceiling effect (variance was 
5.8 for the upper quarter compared to 80.5 
for the lower) and tendencies for regression 
toward the mean. Nevertheless, further study 
to explore this phenomenon is suggested. 


Space Relations Performance 


On the basis of raw scores, there were no 
appreciable differences attributable to sex on 
the posttest Space Relations, although girls
did score approximately two raw-score points 
higher than boys. This would suggest that 
girls can at least hold their own when pro- 
vided with the opportunity to learn something 
about a particular area in which they are 
often assumed to possess less ability. In view 
of the significantly greater gains exhibited by 
the experimental group, it appears reasonable 
to assume that the functional skill of indi- 
viduals in spatial visualization can be im- 
proved when appropriate training is provided. 
These results support the finding of Van 
Voorhis (1941), who, while not using pro- 
gramed instruction, reported that, by em- 
phasizing such behaviors as estimation and 
visualizing in the training process, significant 
improvement in performance on a spatial per- 
ception measure could be achieved. Both find- 
ings may be interpreted as suggesting the 
need for specificity in training, especially 
when results of these two studies are con- 
trasted with negligible effects reportedly re- 
sulting from routine course offerings such as 
geometry (Brown, 1954; Ranucci, 1952) or 
mechanical drawing (Mendicino, 1958). The 
feasibility of developing greater specificity in 
training techniques through a precise behav- 
ioral analysis of the task components and the 
utilization of programed instruction as a 
medium for carrying out such a carefully 
designed instructional prescription is indicated.


Attitudinal Influences 


A simple explanation for the lower scoring 
Ss’ expressed preference for a teacher might 
be that poorer Ss do have a greater need for 
teacher support and hence could be expected 
to feel that teachers could teach better than
a program. Over-reliance and dependence on 
a teacher, whose available time for each S is 
necessarily limited, also may have been con- 
tributing factors to their generally poorer 
academic record. 

A general reluctance to use programed in- 
struction for an extended period of time ap- 
peared to result from a dislike of references 
to sources aside from the programed booklet, 
for example, a panel or an object. These 
results suggest caution in developing long 
and complex programs for relatively immature 
learners. 
Data concerning various aspects of female clerical workers’ job satisfaction
and group productivity were gathered from the employees of 300 catalog order 
establishments. Measures were also obtained of the prosperity, unemployment, 
slums, productive farming, and decrepitude of the communities in which 
the catalog order establishments were located. Analysis of these data indicated: 
(a) average satisfaction scores and group productivity were unrelated in gen- 
eral, (b) satisfaction scores were negatively related to the prosperity of the 
community, and (c) pay satisfaction scores tended to be more negatively 
related to the prosperity of the community than did the other aspects of 
job satisfaction. An explanation of these findings in terms of frames of reference 
and alternatives available to the workers is offered. 


The influence of community characteristics 
on job satisfaction and job performance has 
been documented empirically by Cureton and 
Katzell (1962), Katzell, Barrett, and Parker 
(1961), and Kendall (1963), and has been 
discussed by Hulin (1963b) and Worthy 
(1950) in somewhat more speculative papers. 
Katzell et al. (1961) and Cureton and Kat- 
zell were interested in the possible use of 
community characteristics as moderator variables,
that is, variables which would serve to
moderate the direction and strength of the 
relationship between job satisfaction and cer- 
tain behavioral variables such as job perform- 
ance, absences, and turnover. Katzell et al. 
(1961) found, for example, that the average 
group satisfaction scores and the group pro- 
ductivity of warehouse workers were positively 
related to each other. They regarded this posi- 
tive correlation as dependent on the relation- 
ship of the two variables to certain situational 
characteristics. In their sample of warehouses 
both satisfaction and productivity were nega- 
tively related to size of work force, city size, 
and degree of unionization. These complex 
relationships appeared more clearly in an 
oblique re-rotation of the two centroid factors 
which they had extracted originally from their 
set of situational measures. Cureton and Katzell
(1962) concluded that the nonurban culture
pattern which had originally been discussed
by Katzell et al. may be best thought


1 The writer would like to express his appreciation
to Lauren Miller and Karen Sauln for their assistance
in the data analysis stages of this study.

of as being made up of two positively corre- 
lated aspects, one reflecting a small plant and 
small community syndrome and the other re- 
flecting a female employee syndrome. Both of 
these factors were related to the job satisfac- 
tion and job performance of the work groups. 
Their explanation for this finding was that in 
small-town cultures the needs and expecta- 
tions of the workers are such that the workers 
view high productivity as a means to the de- 
sirable end of high rewards. Katzell et al. also 
state that the nature of the retailing industry 
affords conditions for positively correlated 
performance and satisfaction varying with 
employee motivations. It is of interest to note 
that job performance (behavior in the job 
situation) was unrelated to turnover (behavior 
directed toward leaving the situation). 
Kendall (1963), however, was concerned 
with using community characteristics to index 
frames of reference of the workers and the 
alternatives available to them in the com- 
munity. With this as his theoretical frame- 
work, Kendall used canonical regression to 
analyze the data obtained from a nation-wide 
study of job and retirement satisfaction. His 
data were gathered from an initial sample of 
1,008 male workers drawn from 21 different 
plants, a replication sample of 1,002 male 
workers and a generalization sample of 642 
female workers from the same sample of 21 
plants. He found that measures of satisfaction 
with various aspects of the job bear no relationship
to measures of performance and
absences even under conditions designed to 
maximize such relationships (canonical correlations).
He found, however, that “satisfactoriness”
(high performance and low absence
rates) was related to personal background 
variables; high absence rates were related to 
unattractive community features; and high 
performance was related to personal back- 
ground and unattractive community features. 
More importantly (for the purposes of this 
study) he found that high general job satis- 
faction, high satisfaction with the pay re- 
ceived, and high satisfaction with the work 
done on the job were related to unattractive 
community features.²

Worthy (1950) discussed in a speculative 
and qualitative paper many of the relation- 
ships which Katzell et al. obtained in their 
sample of warehouse work groups. He was 
mainly concerned with the interrelationships 
between morale, performance, and situational 
characteristics for employees of a large retail- 
ing establishment. Still along the same lines, 
Hulin (1963b) presented a model which 
utilized both plant variables (i.e., size, union 
management relationships, wage rate, etc.) 
and community characteristics (urban-rural 
dimensions, unemployment, etc.) as mod- 
erator variables which should serve to mod- 
erate the relationship between job satisfac- 
tion and job performance. In this model it 
was hypothesized that community character- 
istics and the personal characteristics of the 
workers would exert their strongest effect on
the relationship between satisfaction and be- 
havior directed toward leaving the situation 
(turnover, absences, lateness) while plant 
characteristics would exert their strongest 
effect on the relationship between satisfaction
and behavior in the situation (job performance).

It has been stressed on numerous occasions 
that job satisfaction must be considered as a 
feeling which has arisen in the worker as a 
response to the total job situation. In addi- 
tion to being related to the present job situa- 
tion, this feeling is associated with perceived 
differences between what the worker expects 


2 Only those relationships which were significant
in all three samples are discussed. 
for his services and what he actually experiences
in relation to the alternatives available
to him. The Cornell Studies of Job Satisfac- 
tion, begun in 1959, represent one intensive 
program of research developed within this 
framework (See Hulin & Smith, 1965; Smith, 
Kendall, & Hulin, in press). These investi- 
gators have utilized variables such as age or 
tenure, and education to index changing ex- 
pectation-experience discrepancies and com- 
munity variables to index the alternatives 
open to the worker and to index the frame of 
reference established by the community. The 
results of these studies have generally been in 
the expected direction if the problem of non- 
equivalence of measures is considered. If these 
formulations of job satisfaction are correct, 
and if the results obtained have generality to 
other jobs, other workers, and other situations, 
then two predictions could be made regarding 
satisfaction and community characteristics. 

1. Measures of job satisfaction should be 
associated with community variables which 
reflect the prosperity, the extent of the slums, 
the amount of productive farming, and the 
amount of unemployment in the area. The di- 
rection of this association should be that more 
attractive community features lead to lower 
satisfaction values. This prediction stems di- 
rectly from the conviction that a worker’s 
feelings of satisfaction do not arise out of con- 
text. Rather, the worker evaluates his present 
position in the context of the alternatives 
open to him. If he lives in a slum, in a poor 
community, or in a community in which there 
is a great deal of unemployment, even if he 
has a relatively poor job, he is probably better 
off than any of his neighbors. Essentially, in a 
slum there are no alternatives that offer a 
better life. This same worker in a prosperous 
community would be relatively less well off. 
We would expect his satisfaction to be lower 
also. This is not to say that the residents of a 
slum or a poor community are more satisfied 
than they would be if they lived in a pros- 
perous community. We are concerned only 
with workers who live in slums in comparison 
to residents of prosperous communities who 
have similar jobs. 

2. Measures of a worker’s satisfaction with 
his pay should be more strongly associated 
with community characteristics than are the 
other aspects of job satisfaction. In a prosperous
community a worker’s pay level is made
very clear to him by the goods and services 
purchased by other members of the commun- 
ity as compared to what he is able to pur- 
chase. Feelings toward this aspect of a work- 
er’s job should be affected more strongly by 
the community characteristics since his pay 
level and his identification as “an employee of
Company X” are the only aspects of the job 
that the worker must take with him when he 
leaves the plant gate. 

The present research was designed to assess 
the validity of these two predictions when 
applied to groups of white collar (sales) per- 
sonnel employed by a large retail sales organi- 
zation. This research design is similar to the 
Katzell et al. study in that only employees 
from one company are studied and group 
satisfaction and performance measures were 
used. The present research has the added ad- 
vantage of controlling to a great extent the 
urban-rural dimension whose effects were discussed
by Katzell et al. This research should
then be considered as a replication and exten- 
sion of the work first reported by Kendall 
(1963) and represents a more detailed ex- 
ploration of specific relations suggested only 
tentatively by the previous work. Both of the 
hypotheses as well as the methods of indexing 
community characteristics can be traced to 
his dissertation. Thus, while there are differ- 
ences in approach, this research fits into the 
framework of the Cornell Studies on Job Sat- 
isfaction. 


METHOD 
Research Setting 


This study was conducted in a large merchandising 
and retail company. This company establishes a 
regular retail outlet store in communities which are 
large enough to support such an enterprise. The 
company has a large number of catalog order estab- 
lishments (COEs) in addition to these retail stores. 
These COEs are located in small communities which 
are not large enough to support the operation of a 
regular retail outlet store. The function of these 
offices is to provide outlets for the store’s merchandise
through sales catalogs; the employees’ main duties are
to provide assistance in completing catalog
orders and to perform the usual routine duties of 
sales personnel. The personnel of these COEs con- 
sists of a supervisor and a number of female sales 
persons, the size of the staff varying with the demand 
in the area. The average number of employees at 
these offices is about 6.5 with a standard deviation
of about 3.8. 

The data to be reported in this study were gath- 
ered from 300 of these COEs. This sample of 300 
COEs represented a geographically stratified random 
sample of the population of the COEs operated by 
the company. The COEs were stratified on the basis 
of the home store to which they were attached and 
30% of the COEs from each area were drawn. 
Within the company structure many of the day-to- 
day decisions are left to the discretion of the local 
managers. Thus, while the company policies under 
which the employees work are constant throughout 
the sample, the store level practices may be some- 
what different depending on the store manager. 
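
The sampling plan described above, in which the same fraction of COEs is drawn from every home-store area, can be sketched as follows. The population shown is hypothetical, and rounding each area's quota to the nearest whole office is an assumption, since the text does not say how fractional quotas were handled.

```python
import random
from collections import defaultdict


def stratified_sample(units, fraction, seed=0):
    """Draw the same fraction of units at random from every stratum.

    `units` is a list of (unit_id, stratum) pairs. Rounding each quota to
    the nearest whole unit is an assumption, not the study's stated rule.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit_id, stratum in units:
        by_stratum[stratum].append(unit_id)

    sample = {}
    for stratum, members in by_stratum.items():
        quota = round(fraction * len(members))
        sample[stratum] = sorted(rng.sample(members, quota))
    return sample


# Hypothetical population: three home stores with 10, 20, and 30 COEs.
population = [
    (f"{store}-{i}", store)
    for store, count in [("A", 10), ("B", 20), ("C", 30)]
    for i in range(count)
]

drawn = stratified_sample(population, 0.30)
sizes = {store: len(ids) for store, ids in drawn.items()}
print(sizes)
```

Drawing within strata in this way keeps each home-store area represented in proportion to its number of offices.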


Community Characteristics 


The number of community variables which could 
reasonably be measured in a study of this type is 
too large to be handled efficiently by any model 
relating satisfaction to productivity and/or situa- 
tional variables. A taxonomy of community vari- 
ables developed by Kendall (1963) from a principal 
component analysis of the intercorrelations of 55 
per capita census variables from 370 counties originally
presented by Johnson (1958) was employed.
As far as possible the variables suggested by Kendall 
to index the community dimensions were used. No 
claim is made that this taxonomy is the way to de- 
scribe population units. These variates were chosen 
because, for the purposes of this study, they seemed 
to be measuring those aspects of the community
which would be most salient to the workers living in
the area. Although there are always problems involved
in using the results of an analysis not designed
to answer the questions most crucial to your
problem, the savings in clerical labor, money, and 
time are sufficient to enable one to make the com- 
promise. The first three variates described below 
are intended to index the general economic situation 
of the community. The fourth variable, unemployment,
was included to index job opportunities in the
community. The last two variates were included for 
general interest value. 

Values for each of the community characteristics 
to be described below were taken from the publica- 
tions of the United States Bureau of the Census 
(1962) or United States Department of Commerce 
(1963). In the case of “degree days” the value for 
the adjacent county was taken if there was no 
weather station established in the county in question. 

Slums. This variate is indexed by “per cent non- 
white” and “per cent owner occupied housing” 
(reversed scoring). 

Prosperity. The general prosperity of the com- 
munity is indexed by “median income per family,” 
“per cent earning over $10,000,” “per cent sound
housing,” and “per capita retail sales.” 

Productive farming. This aspect of the commun- 
ity’s economic condition is indexed by “median rural 
income per family.” 

Unemployment. The amount of unemployment in 
the community is measured by the percentage of 
the population over 14 who were not “at work” but
were looking for work (in 1960). 

Decrepitude. This variate is indexed by “per cent 
sixty-five and over” and “per capita heart deaths.” 
Decrepitude was included in this analysis in an at- 
tempt to obtain a variable which would be likely 
to make a community a relatively unpleasant place 
in which to live and which should be independent 
of economic variables. 

Northern male work force. This aspect of the com- 
munity is measured by “per cent male workers” and 
“degree days.” This variate was included to give 
some indication of the industrially oriented northern 
communities as opposed to the southern communities 
with their more pastoral orientation.


Office Variables 


Several variables related to the immediate work 
environment in the COE were obtained. These vari- 
ables, which are described below, were included in 
order to assess the amount of business handled, the 
efficiency of the office, and the number of employees 
working at the office. In all cases the measures of 
these variables were taken from the company records. 

Gross demand (1962). The gross demand made on 
any office is obtained by assessing the worth of the 
orders handled by the COE during 1962 (the year 
during which the morale survey was made). While 
this quantity is an accurate reflection of the volume 
of business done by the office, it is not a reflection 
of the efficiency of the work force. 

Percentage of Increase in Gross Demand 1961- 
1962; Payroll and social security taxes/gross demand. 
Payroll and social security taxes divided by gross 
demand was included as an assessment of the rela- 
tive efficiency of the staff of the COE (group produc- 
tivity). Only by handling a greater volume of orders 
with the same staff or reducing the staff for a given 
volume of orders could the manager of the office 
change the value of this ratio. A low value of this 
index represents an efficient staff. 

Returns/gross demand. The dollar value of the 
returned merchandise as a function of gross demand 
would be a reflection in a COE setting of the quality 
of work turned out by the employees. Low values 
indicate high quality work. Since there are many 
variables affecting returns, this is likely a crude 
estimate at best. 
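
The two ratio indices described above can be sketched as follows. This is a minimal illustration only: the function name, the field names, and the dollar figures are hypothetical and are not taken from the company records.

```python
def office_indices(gross_demand, payroll_and_taxes, returns):
    """Return the two ratio indices for one office.

    All three arguments are dollar amounts for the same year.
    A lower value of either ratio is read as better performance.
    """
    if gross_demand <= 0:
        raise ValueError("gross demand must be positive")
    return {
        "payroll_ratio": payroll_and_taxes / gross_demand,
        "returns_ratio": returns / gross_demand,
    }


# Hypothetical office: the same staff cost spread over a larger
# order volume lowers the payroll ratio (a more efficient staff).
before = office_indices(gross_demand=500_000, payroll_and_taxes=40_000, returns=25_000)
after = office_indices(gross_demand=625_000, payroll_and_taxes=40_000, returns=25_000)
```

Because both indices are divided by gross demand, they are comparable across offices of different size, which is presumably why they were preferred to raw dollar amounts.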


Satisfaction Measures 


The satisfaction measures to be reported in this 
study are taken from a survey made by the com- 
pany during 1962. These surveys are made periodi- 
cally by the company and the employees have come 
to accept them as a matter of course. The question- 
naires are treated anonymously by the company 
and are intended purely as aids to better working 
relationships between management and the workers. 

The specific job satisfaction questionnaire employed 
was developed by the company for its own needs. 
The questions in this inventory are directed toward 
nine different content areas. These content areas 
(supervision, kind of work, amount of work, co-
workers, working conditions, pay, career and security,
company identification, and organizational effective- 
ness) are measured in two different ways. One set 
of questions is descriptively worded and asks the 
worker to describe different aspects of his work. A 
second set of questions over the content areas is 
evaluatively worded and asks for evaluations of these 
same aspects (see Yuzuk, 1961). The questionnaire 
generally displays adequate convergent validity but 
the discriminant validity tends to be less impressive. 
Nonetheless, it was felt that this instrument would 
yield a reasonable estimate of the overall job satis- 
faction of these work groups. The discrimination 
between the areas of satisfaction might not be as 
clear cut. 

These variables were intercorrelated using Pearson 
product-moment correlations in all cases. 
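
A minimal sketch of this kind of office-level analysis follows, assuming (as the Results state) that each office contributes one averaged value per variable. The office labels, scores, and income figures are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def office_means(scores_by_office):
    """Average the employee scores within each office."""
    return {office: sum(v) / len(v) for office, v in scores_by_office.items()}


# Hypothetical data: employee pay-satisfaction scores per office and
# the median family income of each office's community.
pay_satisfaction = {"A": [3, 4, 5], "B": [2, 3, 4], "C": [1, 2, 3]}
median_income = {"A": 4000, "B": 5000, "C": 6000}

means = office_means(pay_satisfaction)
offices = sorted(means)
r = pearson_r([means[o] for o in offices], [median_income[o] for o in offices])
```

The unit of analysis here is the office, so the resulting coefficient describes offices and communities, not individual employees.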


RESULTS 


A 34 X 34 intercorrelation matrix was the 
result of this analysis. The correlations in this 
matrix represent the relationships between the 
average values of variables associated with 
each of the offices and not the individual em- 
ployees of the offices.2 A general inspection 
of this matrix revealed several relationships 
of general interest. The variables chosen to 
index the extent of slums, prosperity, and 
productive farming of the community are 
more closely related than one would like 
under ideal conditions. The present sample 
of COEs does not represent a random sample 
of communities. None of the large metropol- 
itan centers of the United States is repre- 
sented in these data. This deliberate over- 
sampling of the rural areas and small towns 
is because the COEs are located in these 
areas. This amount of bias may well have 
affected these relationships. 

A second point of general interest is the 
submatrix of intercorrelations between the 
measures of job satisfaction. For the most 
part these different measures indicate ade- 
quate convergent validity. The average 
heteromethod-monotrait (Campbell & Fiske, 
1959) correlation is .58 (p < .001). This
would indicate that the two different 
“methods” of assessing the workers’ reaction 
to their jobs are tapping somewhat the same 
variables. However, the heterotrait-mono- 
method and heterotrait-heteromethod correla- 


3 Copies of the complete 34 X 34 intercorrelation 
matrix may be obtained by contacting Charles L. 
Hulin, Department of Psychology, University of 
Illinois, Urbana, Illinois, 
TABLE 1
AVERAGE CORRELATIONS BETWEEN COMMUNITY CHARACTERISTICS AND SATISFACTION VARIABLES

                                                  Community characteristics
                                                                            Productive   Unem-
                                    Slums             Prosperity            farming      ployment
                                 1a       2b       3      4      5      6       7           8
Supervision                      01      -08      -03     01    -02    -06     -04          05
Kind of work                     14*     -11      -18**  -15*   -20**  -14*    -19**        02
Amount of work                   12*     -11      -16**  -12*   -16**  -12**   [illegible]  00
Co-workers                       08      -06      -01    -07    -13*   -04     -05          04
Working conditions               20**    -08      -24**  -21**  -26**  -20**   -26**        06
Pay                              26**    -15*     -44**  -38**  -45**  -24**   -45**        10
Career and security              10      -06      -16**  -12*   -16**  -14*    -15*         02
Company identification           19**    -06      -34**  -30**  -30**  -26**   -31**        11
Organizational effectiveness     [illegible] [illegible]
                                                  -36**  -28**  -34**  -23**  -38**        03

a Variables in following order: (a) percentage of nonwhite, (b) percentage of owner occupied housing, (c) median income,
(d) percentage earning over $10,000, (e) percent sound housing, (f) per capita retail sales, (g) median rural income per family,
(h) percentage of unemployed workers.
b Reverse scoring to obtain same direction as Variable 1.
* p < .05.
** p < .01.


tions are larger than the Campbell-Fiske 
model indicates they should be. Several 
instances of low discriminant validity are 
evident. 
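
The Campbell-Fiske comparison referred to here can be sketched as a sorting of correlations by whether the two measures share a trait (content area) and a method (descriptive or evaluative wording). The sketch below is illustrative only: the function name is an assumption and the correlations are invented, not values from the 34 X 34 matrix.

```python
from statistics import mean


def mtmm_averages(corr):
    """Average correlations within each Campbell-Fiske category.

    `corr` maps a pair of measures, each written as (trait, method),
    to the correlation between them.
    """
    groups = {
        "monotrait-heteromethod": [],    # convergent validity
        "heterotrait-monomethod": [],    # should be lower than the above
        "heterotrait-heteromethod": [],  # should be lowest of all
    }
    for ((trait1, method1), (trait2, method2)), r in corr.items():
        same_trait = trait1 == trait2
        same_method = method1 == method2
        if same_trait and not same_method:
            groups["monotrait-heteromethod"].append(r)
        elif same_method and not same_trait:
            groups["heterotrait-monomethod"].append(r)
        elif not same_trait and not same_method:
            groups["heterotrait-heteromethod"].append(r)
        # Same trait and same method is a reliability value; it is skipped.
    return {name: mean(values) for name, values in groups.items() if values}


# Hypothetical correlations for two content areas and two wordings.
example = {
    (("pay", "descriptive"), ("pay", "evaluative")): 0.60,
    (("work", "descriptive"), ("work", "evaluative")): 0.56,
    (("pay", "descriptive"), ("work", "descriptive")): 0.40,
    (("pay", "evaluative"), ("work", "evaluative")): 0.44,
    (("pay", "descriptive"), ("work", "evaluative")): 0.30,
    (("pay", "evaluative"), ("work", "descriptive")): 0.34,
}
averages = mtmm_averages(example)
```

Discriminant validity is weak when the two heterotrait averages approach the monotrait-heteromethod average, which is the pattern the text describes.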

A third point of general interest is the 
generally near-zero correlations between the 
group-satisfaction measures and the group-
productivity measure (r = .04). This would
seem to indicate that, contrary to the prediction
of Katzell et al., the retailing industry does
not necessarily provide the type of situation 
conducive to positive correlations between job 
satisfaction and productivity which they ob- 
tained in a study of a group of warehouses. 

A final point of general interest is indicated 
in the near-zero (r = -.03) correlations be-
tween group size and satisfaction measures.
This is also contrary to expectations in this 
area. Previously reported findings would sug- 
gest a negative correlation between group size 
and satisfaction. 

The correlations relevant to the two pre- 
dictions made above are presented in Table 1. 
In this table the content area of the satis- 
faction questions is indicated on the left and 
the community variables are indicated at the 
top. If we look first at the correlations be- 
tween the satisfaction variables and the vari- 
ables used to index slums (percentage of non- 
white and percentage of owner occupied 
housing), prosperity (median income, per-
centage earning over $10,000, percentage
of sound housing, per capita retail sales), and
productive farming (median rural income per 
family) we find Prediction 1 strikingly con- 
firmed. Sixty-two of the 63 correlations are 
in the predicted direction and 44 of these 63 
correlations are significant at the .05 level or 
better. Thus, it appears that the less attrac- 
tive the community, in terms of slums, pros- 
perity, and productive farming, the more 
satisfied are the workers with their jobs. It is 
of some interest to note that of the 19 cor- 
relations that failed to reach significance at 
the .05 level, 13 were contributed by two 
satisfaction areas—supervision and co-work- 
ers. It appears that while satisfaction with 
most aspects of a worker’s job is affected by 
the community, reactions to other people on 
the job are relatively unaffected. 

The data testing the validity of Predic- 
tion 2 are more equivocal. Pay satisfaction 
has the highest correlation with the slum, 
prosperity, and productive farming variables 
only four times. Three times it has the second
highest correlation. The other variable dis- 
playing strong relationships with the eco- 
nomic community variables—organizational 
effectiveness—has the strongest correlation 
twice, is second three times, and third twice. 
While, on the average, pay satisfaction was 
the satisfaction variable most strongly affected 
by community variables, Prediction 2 must 
be regarded as only partially confirmed. 
(Average correlation between pay satisfaction
and the economic variables listed in Table 1
is .31. The average for the other aspects of
satisfaction is .14.)
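
The averaging method is not stated. One reading that reproduces the reported .31, offered here as an assumption, is the mean absolute value of the eight pay-satisfaction correlations in Table 1:

```python
# Pay-satisfaction row of Table 1, columns 1-8, with decimals omitted
# as in the table. The fourth entry is read as -38 from a damaged scan
# and should be treated as an assumption.
pay_row = [26, -15, -44, -38, -45, -24, -45, 10]

# Mean absolute correlation, converted back to a decimal coefficient.
average_magnitude = sum(abs(value) for value in pay_row) / len(pay_row) / 100
```

Signs are dropped because the columns are scored in different directions, so a signed mean would partly cancel.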

Further examination of Table 1 reveals 
that not only are most of the correlations sig- 
nificant and in the expected direction, but 
that the rank order of the correlations be- 
tween the nine satisfaction variables and the 
seven community items within the three eco- 
nomic variates appears to be consistent. A test 
of concordance on these results indicates a 
significant degree of consistency of the rela-
tive magnitudes of the correlations (W = .81,
p < .01).
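
A minimal sketch of Kendall's coefficient of concordance follows, assuming each community item is treated as a "judge" that ranks the satisfaction variables by the size of its correlations. The data are invented, and no correction for ties is applied, so this is not necessarily the exact computation behind the reported W.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kendalls_w(rows):
    """Kendall's W for m rows (judges), each scoring the same n objects."""
    m, n = len(rows), len(rows[0])
    if m < 2 or n < 2 or any(len(row) != n for row in rows):
        raise ValueError("need at least 2 rows of equal length (2 or more)")
    rank_rows = [average_ranks(row) for row in rows]
    totals = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((total - mean_total) ** 2 for total in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical correlation magnitudes: three community items (rows)
# by four satisfaction areas (columns), all in the same rank order.
example = [
    [3, 18, 16, 44],
    [1, 15, 12, 38],
    [2, 20, 16, 45],
]
w = kendalls_w(example)
```

W runs from 0 (no agreement among the rows) to 1 (identical rank orders), so a value of .81 indicates that the community items order the satisfaction variables in much the same way.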

The fourth variable which was used to 
assess the job opportunities of the community 
did not yield this degree of confirmation of 
the predictions. Eight of the nine correlations
between satisfaction and “percentage of un- 
employed workers” were positive as predicted 
but none reached the .05 level of significance. 
Thus, this aspect of the community does not 
appear to be related to job satisfaction to 
a significant degree. 

The remainder of the community variables 
for which no predictions had been made 
indicated no consistent and significant
relationships. 


DISCUSSION 


The results of this study indicate generally 
significant and often sizable correlations be- 
tween job satisfaction and economic commun- 
ity characteristics. Further, pay satisfaction 
appears to be somewhat more affected by 
these characteristics than are the other areas 
of job satisfaction. These predicted results 
appear to be more evidence indicating the 
validity of the conception of job satisfaction 
which stresses that job satisfaction is a 
product of the discrepancies between expecta- 
tions and experience, of the actual experience 
on the job, of the frame of reference of the 
worker, and of the alternatives open to the 
worker. 4 

The major question is why these results 
should occur. Assuming the conception of job 
satisfaction presented above is correct, the 
explanation for these results can be derived 
from it with a minimum of difficulty. Let us 
consider the general economic condition of the 
community as serving two major purposes.
The first would be the function of establishing a
frame of reference against which the workers 
evaluate their present position. If this is in- 
deed the case, then in a poor community a 
low-level white-collar worker ($1.25/hour— 
$1.70/hour) is relatively better off than he 
would be if he lived in a wealthy community. 
In a poor community he may well be one of 
the “middle class.” The same worker in a
wealthy community would be considered, and 
probably would consider himself, somewhat 
less well off. The relationship of this explana- 
tion to Helson’s adaptation level theory (1948) 
is obvious. It should be noted that these 
results were obtained on a sample of relatively 
low level clerical workers but if our concep- 
tion of job satisfaction is correct these same 
relationships should be obtained regardless 
of the occupational level of the worker. Five 
dollars or 1 dollar per hour should make you 
a richer person in Yazoo City, Mississippi, 
than the same wage does in Shaker Heights, 
Ohio. Thus even though these results were 
obtained on relatively low-level workers, they 
should hold true for workers from all strata. 
If the function of these community character- 
istics is indeed to provide the workers with a 
frame of reference, then we should expect 
these results. 

An alternative explanation for the function 
of the community variables would be that 
these variables are serving to index the 
alternatives open to the worker (see Smith, 
1963; Kendall, 1963). In communities with 
a great many slums and which have a low 
level of prosperity, the worker may already 
have the best of the available alternatives. 
Any change he makes will likely be for the 
worse. In a wealthy community there may be 
several alternatives open to him which are 
more attractive. We would expect that the 
worker in the poor community who sees no 
alternatives which are more attractive should 
be more satisfied with his job than his 
counterpart in a wealthy community who is 
surrounded by more attractive alternatives. 

If this latter explanation is correct, we 
would expect that satisfaction measures would 
also be related to the amount of unemploy- 
ment in the area since this variable is a very 
direct reflection of the alternatives open to 
“workers in general.” The present data do not
indicate the presence of such a relationship. 
There are, however, many problems connected 
with the rejection of this latter hypothesis on 
the basis of these data. Our measure of the 
percentage of unemployed workers in the 
area was the least satisfactory of the com- 
munity characteristics studied. We were 
forced to rely on the 1960 census figures re- 
garding this measure and the amount of un- 
employment may well have changed consider- 
ably during the 2-year span until the job 
satisfaction measures were obtained. Also, 
employed workers may not be aware of the 
amount of unemployment in their area. We 
would have to regard both explanations of 
the findings as tenable until more evidence 
is available regarding the relationship between 
satisfaction and unemployment. 

Equally intriguing are the satisfaction vari- 
ables which appear to be unaffected by com- 
munity characteristics. Nineteen of the 63 
correlations between satisfaction variables and 
community characteristics were nonsignificant. 
Thirteen of these 19 nonsignificant correla- 
tions were contributed by satisfaction with 
supervision and satisfaction with co-workers. 
These two variables are the only two satis- 
faction variables that are directly related to 
satisfaction with interpersonal relations. Pre- 
viously Hulin (1963a), Hulin and Smith 
(1965), and Kendall (1963) determined that 
these two areas of satisfaction did not behave 
in the same manner as the satisfaction vari- 
ables related to other aspects of the job. 
Hulin and Smith found that while work, pay, 
and promotions satisfactions were related to 
a worker’s age, tenure, salary, and job level; 
supervision and co-worker satisfaction were 
not affected by this set of independent vari- 
ables. Kendall found that these two areas of 
satisfaction were less frequently associated 
with the community characteristics than one 
would expect. The reasons for this apparent 
lack of correspondence between interpersonal 
satisfactions and satisfactions with other 
aspects of the job may simply be because 
there is very likely more agreement on what 
constitutes a good promotion policy or what 
constitutes an acceptable rate of pay than 
there is on what constitutes a good supervisor 
or work group. To be sure there will be a 
certain amount of agreement of acceptable
standards but these agreements will probably 
be outnumbered by the disagreements. When 
we attempt to study these aspects of job 
satisfaction we will be forced to include vari- 
ables which will predict a worker’s reaction 
to another person. We may be forced to widen 
the class of independent variables to include 
personality, personal background, or other 
similar types of measures. Kendall (1963) 
has demonstrated that the addition of such 
measures produces more general and replica- 
ble results in the investigation of satisfaction 
and behavior in industry. This will undoubt- 
edly lead us to a program of personality 
research applied to job satisfaction. 

The results of this study would seem to 
indicate that a conceptualization of job satis- 
faction which does not include recognition of 
the part played by frames of reference or 
alternatives available to the worker is going 
to be inadequate. At the same time, investi- 
gations of job satisfaction should include the 
community and plant or office characteristics 
if these are allowed to vary. These results 
also raise serious questions concerning the 
validity of the suggestion by Herzberg (Herz- 
berg, Mausner, and Snyderman, 1959) that 
the determinants of how a man reacts to his 
job are to be found in the intrinsic character- 
istics of the job, and not in the environmental 
characteristics surrounding the job. It is no 
longer enough to consider community and 
situational variables as moderator variables 
or nuisance variables. The direct effect of 
these variables on satisfaction must be con- 
sidered. These considerations may complicate 
the life of the researcher in this area but 
greater understanding would seem to be the 
inevitable result. 
INTRINSIC AND EXTRINSIC JOB MOTIVATIONS AMONG
DIFFERENT SEGMENTS OF THE WORKING POPULATION 


RICHARD CENTERS AND DAPHNE E. BUGENTAL


University of California, Los Angeles 


A selected cross-section of the working population (N = 692) was interviewed 
with respect to their job motivations. The extent to which extrinsic or intrinsic 
job components were valued was found to be related to occupational level. 
At higher occupational levels, intrinsic job components (opportunity for self- 
expression, interest-value of work, etc.) were more valued. At lower occupa- 
tional levels, extrinsic job components (pay, security, etc.) were more valued. 
No sex differences were found in the value placed on intrinsic or extrinsic 
factors in general. However, women placed a higher value on “good co-workers” 
than did men, while men placed a relatively higher value on the opportunity to
use their talent or skill.


It has often been proposed that job values 
or motivations can be classified as “intrinsic” 
or “extrinsic”: that there are some motives
which are related to the work activity itself 
and others which stem from external or con- 
textual factors. This division is not new. How- 
ever, there has been a resurgence of interest 
in this proposed dichotomy (e.g., Ewen, 1964; 
Friedlander, 1963; Harrison, 1960; Schwartz, 
Jenusaitis, & Stark, 1963) since Herzberg, 
Mausner, and Snyderman (1959) concluded 
that job satisfaction results primarily from in- 
trinsic job factors while job dissatisfaction 
results primarily from extrinsic job factors. 

Herzberg et al.’s study was concerned with 
the positive or negative motivational aspects 
of different job factors within a restricted oc- 
cupational sample. It was our intention in 
this paper to extend the available information 
on the role of intrinsic and extrinsic job fac- 
tors as motivators by studying the motiva- 
tional strength of intrinsic and extrinsic job 
factors in a sample of the entire working pop- 
ulation (men and women at all occupational 
levels). The use of a broad occupational sam- 
ple was necessary because job motivations 
have been shown to vary at different occupa- 
tional levels (Centers, 1948; Gurin, Veroff, & 
Feld, 1960; Jurgenson, 1947). Additionally,
the strength of job motivations must be con-
sidered. A man may derive a great deal of
satisfaction from the use of his skill on his 
job, for example, but his decision to stay on 
that job or take a different one may be influ- 
enced by other motives which have greater 
strength. He may decide to take a different 
job in order to obtain greater financial secur- 
ity (which acts more as a “dissatisfier” when 
absent than as a “satisfier” when present). 
It was our prediction in this study that in- 
dividuals at higher occupational levels would 
place a greater value on intrinsic job factors 
than would individuals at lower occupational 
levels. Individuals at lower occupational levels 
were expected to place a greater value on 
extrinsic job factors. These predictions essen- 
tially represent extensions and clarifications 
of earlier findings. In his survey of a cross- 
section of the male working population, Cen- 
ters (1948) found that security was the most 
important job motivation in lower-level occu- 
pations, whereas self-expression was of greater 
importance in higher-level occupations. The 
relatively greater importance of rewards as- 
sociated with the work itself in higher occu- 
pations has been supported by other investi- 
gators (Gurin et al., 1960; Jurgenson, 1947). 
In the present study we included the fol-
lowing as intrinsic sources of job satisfaction:
self-expression (a chance to use one’s skill or 
talent), interest-value of the work, and feeling 
of satisfaction derived from the work itself. 
Under extrinsic sources of job satisfaction we 
included: pay, security, and satisfying co- 
workers. In addition to determining the rela-
tionship between occupation and intrinsic
versus extrinsic job motivations, we also 
wished to measure any sex or occupational 
differences in the importance of specific job 
factors. Many studies of occupational motiva- 
tion have been limited to male respondents, 
for example, Centers (1948), Gurin et al. 
(1960), or have been restricted with respect 
to size or occupational diversity. Jurgenson 
(1947) presented the best comparison of job 
motivations of men and women, but his 
groups had systematic occupational as well as 
sex differences; hence his results were some- 
what indeterminate for sex differences per se. 
Therefore, it was possible in the present study 
to fill a gap in the information available on 
job motivations by comparing motivations of 
men and women within a number of occupa-
tional groupings.


METHOD


Our data were secured by person-to-person inter- 
views conducted with 692 employed adults. The sam- 
ple interviewed constituted a selected cross section of 
a major urban area (greater Los Angeles). The sev- 
eral interviewers were senior majors and graduate 
students in psychology and were at the time enrolled 
in a course in survey research methods conducted by 
the senior author. Each was assigned an age and sex
quota within specific subsampling areas stratified on 
the basis of socioeconomic status criteria, with the 
attempt being made in the overall sampling to rep- 
resent the same proportion of different socioeconomic 
levels as occur in the total working population of the 
Los Angeles area. The sampling, for our purposes, 
was limited to persons who were not self-employed.1
We did not consider it essential that we achieve 
highly precise representation in terms of the total 
employed population, for our main interest and aim 
was comparing the responses of major groups rather 
than depicting the response of the general population 
of employed persons. Respondents were in every case 
interviewed in their own homes, with a standard 
interview schedule being followed. This latter was 
the result of trial and error pretesting by the authors. 


1 The sample excluded self-employed individuals in
order to allow questions about the respondent’s 
supervisor; this second aspect of the survey is pre- 
sented in a separate paper. 
Job motivations were measured by a procedure
adapted from Centers’ original study of job motiva- 
tion and occupational stratification (1948). In that 
study, the respondent was asked to name his first, 
second, and third choices from a list of 10 different 
kinds of jobs, for example, “a very interesting job,” 
“a job where you could be boss.” 2 In the present 
study, the central focus was on the intrinsic versus 
extrinsic nature of job motivations. Hence we se- 
lected items from the original list which could be 
categorized as representing intrinsic or extrinsic job 
components. One of the original items (“a job where
you could help other people”) was broken down 
into two probable components: (a) “good co-work- 
ers” (an extrinsic factor relating to social needs), and 
(b) “the work gives you a feeling of satisfaction” 
(a broad intrinsic category which could relate to the 
altruistic or achievement rewards provided by the 
work itself). Respondents were asked, “Which of 
these things is most important in keeping you on 
your present job?” They were given a card contain- 
ing the following items in the order here shown. 


1. The pay. (Extrinsic)
2. Good co-workers. (Extrinsic)
3. The work is interesting. (Intrinsic)
4. The work allows you to use your skill or talent. (Intrinsic)
5. You can be sure of always having the job. (Extrinsic)
6. The work gives you a feeling of satisfaction. (Intrinsic)


The respondents were asked to choose which of these 
variables was first, second, and third in importance 
to them. 
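
The tallying implied by this procedure (first, second, and third choices combined, so that percentages sum to 300%) can be sketched as below. The factor labels are shortened and the four responses are invented; only the counting rule follows the text.

```python
from collections import Counter

FACTORS = [
    "pay", "co-workers", "interesting work",
    "self-expression", "security", "satisfaction",
]


def percent_choosing(responses):
    """Percentage of respondents naming each factor among their three choices.

    `responses` is a list of 3-tuples of distinct factor names, in order
    of importance. Order is ignored because the choices are combined.
    """
    counts = Counter()
    for choices in responses:
        if len(choices) != 3 or len(set(choices)) != 3:
            raise ValueError("each respondent must give three distinct choices")
        counts.update(choices)
    n = len(responses)
    return {factor: 100 * counts[factor] / n for factor in FACTORS}


# Hypothetical responses from four respondents.
responses = [
    ("pay", "security", "co-workers"),
    ("interesting work", "self-expression", "satisfaction"),
    ("pay", "interesting work", "satisfaction"),
    ("pay", "security", "self-expression"),
]
table = percent_choosing(responses)
```

Because every respondent contributes exactly three choices, the six percentages always total 300, which serves as a quick check on a tabulation of this kind.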

On the questionnaire used in Centers’ original 
study, financial job motives were measured by the 
choice “a very highly paid job.” At that time, this 
job component was selected much less often than 
would be expected by other studies of job motiva-
tion. On the present questionnaire the wording was
altered in order to decrease the stress on a large 
amount of pay, and simply measure the importance 
of the amount of pay currently being received. 


RESULTS 


Job motivations were found to have the ex- 
pected relationship to occupational level. Table 
1 presents data on job motivations subdi- 
vided according to occupational level. It can 
be seen that all three intrinsic job components 


2 The complete list was as follows: a job where
you could be a leader; a very interesting job; a job 
where you could be looked upon very highly by your 
fellow men; a job where you could be boss; a job 
which you were absolutely sure of keeping; a job 
where you could express your feelings, ideas, talent, 
or skill; a very highly paid job; a job where you 
could make a name for yourself or become famous; 
a job where you could help other people; a job 
where you could work more or less on your own. 
TABLE 1

PERCENTAGE OF SUBJECTS CHOOSING A JOB FACTOR AS IMPORTANT: COMBINED FIRST, SECOND,
AND THIRD CHOICES

                                        Intrinsic job factors              Extrinsic job factors
                                    Interesting  Self-        Satis-
Occupation                     N    work         expression   faction    Pay   Security     Co-workers
Professional & managerial     217      68           64          68        59      16            25
Clerical & sales              183      62           48          46        66      31            46
Total white-collar            400      65           57          58        62      23            35
Skilled                        98      61           51          46        70   [illegible]      40
Semi-skilled & unskilled      135      50           35          39        74      49            52
Total blue-collar             233      55           42          42        73      42            46
Difference a: white-collar
  minus blue-collar                    10*          15*         16*      -11*    -19*          -11*

Note. Percentages add to 300%. Each person made three choices.
a The significance of the difference between percentages was tested by a CR.
* p < .01.


were more valued among white-collar groups 
than among blue-collar groups. Correspond- 
ingly, all three extrinsic job components were 
more valued in blue-collar groups than in 
white-collar groups. 
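
The authors do not give the formula for their critical ratio (CR). A common form, assumed here, divides the difference between two proportions by a standard error based on the pooled proportion. The sketch applies it to the rounded "security" percentages from Table 1, so the result need not match the authors' own figure.

```python
from math import sqrt


def critical_ratio(pct1, n1, pct2, n2):
    """Critical ratio for the difference between two independent percentages.

    Uses the pooled-proportion standard error (an assumption; the
    source does not state which formula was used).
    """
    p1, p2 = pct1 / 100, pct2 / 100
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Security: 23% of 400 white-collar vs. 42% of 233 blue-collar respondents.
cr_security = critical_ratio(23, 400, 42, 233)
```

The ratio is referred to the normal curve, so an absolute value above about 2.58 corresponds to the two-tailed .01 level.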

Occupational divisions in this study follow 
the occupational groupings used in the Dic- 
tionary of Occupational Titles (1949). Semi- 
skilled occupations were pooled with unskilled 
occupations because of the small number of 
unskilled respondents. All Ss (N = 59) whose
occupations did not fit in any of the cate- 
gories listed in Table 1 were excluded from 
any comparison of occupational groups but 
were retained for sex comparisons. 

A comparison of the job values of male and 
female workers is given in Table 2. It can 
readily be seen that there were no consistent
sex differences in the overall value placed on
intrinsic versus extrinsic job components. The 
two single components on which sex differ-
ences occurred were those of “co-workers”
and “self-expression.” Half of the female re-
spondents mentioned “good co-workers” as
important to them, whereas only about a third 
of the male respondents mentioned it. Men 
were more prone to value the chance to use 
their skill or talent in a job than were women. 
Men and women did not differ significantly in 
overall occupational level.3 However we can-
not be sure that the observed differences are
not a simple artifact of more subtle and quali-
tative sex differences in occupational level,
for our necessary employment of broad occu-
pational categories obscures these. As noted
later, however, our findings are in agreement
with those of other researchers.


3 On a 7-point occupational rating scale, adapted
from Warner, Meeker, and Eells (1949, p. 140), the
men in our sample received a mean rating of 3.97
(“1” is the highest occupational level), and the
women received a mean rating of 4.19. This difference
yielded a nonsignificant CR of 1.69.


TABLE 2

PERCENTAGE OF MEN AND WOMEN CHOOSING A JOB FACTOR AS IMPORTANT: COMBINED FIRST, SECOND,
AND THIRD CHOICES

                          Intrinsic job factors              Extrinsic job factors
                      Interesting  Self-        Satis-
Sex              N    work         expression   faction    Pay   Security   Co-workers
Men             471      60           51          52        68      32          36
Women           221      66           43          52        62      27          50
Difference a            -06           08*         00        06      05         -14**

Note. Percentages add to 300%. Each person made three choices.
a The significance of the difference between percentages was tested by a CR.
* p = .05.
** p = .01.


DISCUSSION 


In general, our expectations with respect to 
job motivation within different segments of 
the working population were confirmed. 
White-collar workers consistently placed a 
greater value on intrinsic sources of job satis- 
faction than did blue-collar workers. Cor- 
respondingly, blue-collar workers consistently 
placed a greater value on extrinsic sources of 
job satisfaction. The job motivations of work- 
ers at higher occupational levels stem from 
the work itself: the skill required, the interest-
value of the work, etc. A person in a white-
collar job is more likely to select a job or stay
on a particular job because of these intrinsic 
considerations rather than because of the pay 
he receives, the financial security, etc. At 
lower occupational levels, job motivations are 
centered in factors which are external to the 
work itself. In choosing a job or staying on a 
job, a person in a lower-level occupation is 
more influenced by financial and social con-
siderations.

Clerical or sales workers and skilled work- 
ers were found to be very similar in job moti- 
vations. The biggest shift in values occurred 
between “professional-managerial” and “cleri- 
cal-sales” and between “skilled” and “semi- 
skilled and unskilled.” This is in line with the 
finding of other studies (Centers, 1948; Gurin 
et al., 1960; Super, 1939) that intrinsic job 
satisfaction decreases from higher to lower- 
level white-collar jobs, stays about the same 
for low-level white-collar jobs and high-level 
blue-collar jobs, and then decreases again in 
lower-level blue-collar jobs. 

As in Centers’ original study, security 
proved to be the job component which varied 
the most in importance between occupational 
levels—moving from a position of very low 
importance in professional-managerial occu- 
pations to a position of relatively high impor- 
tance among semiskilled and unskilled work- 
ers. “Pay” was the most important job factor 
at all occupational levels except “professional-
managerial.” This contrasts sharply with Cen-
ters’ earlier study in which financial profit
was shown to be a very weak job motivation. 
This discrepancy is probably due to the differ- 
ence between asking respondents the value of 
a great deal of money (earlier study) and ask- 
ing the value of the amount of pay they were 
currently earning (present study). The value 
placed on “interesting work” and “self-expres- 
sion” followed the same pattern with respect 
to occupational level as was shown in Centers’ 
earlier study, that is, there was a higher value 
placed on these job factors at higher occupa- 
tional levels. 

Men and women were not found to differ 
in the extent to which they valued intrinsic 
or extrinsic job satisfactions in general. How- 
ever, sex differences were observed in the 
value placed on self-expression (opportunity 
to use skill or talent) and good co-workers. 
Men placed a slightly higher value than did 
women on self-expression in their work. 
Kuhlen (1963) found that occupation is psy- 
chologically more central to men than women. 
Perhaps the greater value our male respond- 
ents placed on the opportunity to use their 
skill reflects the greater pride they take in 
their work. There was a marked difference 
between men and women in the value placed 
on good co-workers. Women were much more 
likely than men to value good co-workers. 
This provides confirmation for the findings 
of Hardin, Reif, and Heneman (1951) and 
Jurgenson (1947), with less representative 
groups, that women place more emphasis on 
social factors on the job than do men. 

This study has demonstrated that there are 
differences in job motivations between occu- 
pational levels and between men and women. 
The magnitude of these differences should 
serve as a warning against drawing generaliza- 
tions about job motivations on the basis of a 
sample which is too limited or too selective. 
Numerous practical implications can also be 
drawn from our results, for example, the re- 
ward-value of different types of job incentives 
can be expected to differ for different parts of 
the working population; different types of 
supervision should be effective for men and 
women, or for white-collar as opposed to blue- 
collar workers. 

Although our findings indicate occupational
differences in the job motivations which are 
actually operating, they do not necessarily 
show that there is a basic or unalterable dif- 
ference in values between occupational levels. 
It may be that the difference is merely cir- 
cumstantial. Interpreting our results in terms 
of Maslow’s (1943) need-hierarchy, it could 
be said that individuals in lower-level occupa- 
tions are more likely to be motivated by lower- 
order needs (pay, security, etc.) because these 
are not sufficiently gratified to allow higher- 
order needs (the self-fulfillment possible in 
the job itself) to become prepotent. 
RELATIVE CONTRIBUTIONS OF MOTIVATOR AND
HYGIENE FACTORS TO OVERALL
JOB SATISFACTION 1


GERALD HALPERN 


Educational Testing Service, Princeton, New Jersey 


Ratings of 4 motivator job aspects, 4 hygiene job aspects, and overall job 
satisfaction were obtained from 93 male Ss who were equally satisfied with 
both the motivator and the hygiene aspects of their jobs. 2 of the job aspects 
(work itself and opportunity for achievement), both motivators, were sufficient 
to account for the variance in overall satisfaction. 


The motivator-hygiene theory states that 
job satisfaction and job dissatisfaction are 
reactions to different kinds of job aspects and 
denies that a given job aspect can be instru- 
mental, to any appreciable degree, in provid- 
ing both job satisfaction and job dissatisfac- 
tion. The theory is summarized by Herzberg, 
Mausner, and Snyderman (1959) as follows: 


the three factors of work itself, responsibility, and 
advancement stand out strongly as the major fac- 
tors involved in producing high job attitudes. Their 
role in producing poor job attitudes is by contrast 
extremely small. Contrariwise, company policy and 
administration, supervision ..., and working con- 
ditions represent the major job dissatisfiers with 
little potency to affect job attitudes in a positive 
direction. . . . The job satisfiers deal with the factors 
involved in doing the job, whereas the job dissatis- 
fiers deal with the factors that define the job con- 
text [pp. 81-82]. 


The job aspects that form the content of 
the job are labeled motivators to draw atten- 
tion to their ability to satisfy the individual’s 
need for self-actualization in his work. Those 
job aspects that relate to the job context are 
labeled hygiene to symbolize the preventive 
role that they play in regard to job dissatis- 
faction. 

There are at least two aspects of this theory 
which tend to be misunderstood. To claim 
that the motivator factors, when present, con- 
tribute to satisfaction but not to dissatisfac- 
tion does not deny the reality of hygiene 
needs. The motivator-hygiene theory of job 
satisfaction clearly recognizes that “both kinds
of factors meet the needs of the employee”
(Herzberg et al., p. 114), but stresses that
only the presence of motivators can lead to
satisfaction.


1 A draft of this paper was presented at the meet-
ings of the American Psychological Association, Chi-
cago, Illinois, September 1965.

It should also be noted that the motivator- 
hygiene theory does not predict level of satis- 
faction with any single factor whether it be 
hygiene or motivator. Although it is only the 
motivators that lead to overall job satisfac- 
tion, there is no assertion that employees 
cannot be equally satisfied with all aspects of 
their jobs. The theory simply says that these 
two factors have very different consequences 
for overall job satisfaction. 


PROCEDURE 


The basic hypothesis of the motivator-hygiene 
theory was tested as part of a larger study which 
examined several of the assumptions inherent in 
counseling. As part of that study, ratings of satis- 
faction with both the motivator and hygiene aspects 
of work, as well as with overall satisfaction, were 
gathered. 

The sample was obtained from the files of a uni- 
versity counseling service. These files, entered for 
the years 1948-52, were searched for folders con- 
taining ACE Psychological Examination scores and 
Strong Vocational Interest Blank profiles of males 
aged 17-24 at time of counseling. A sample of 101 
men was located through the city telephone direc- 
tory, and each was contacted by telephone. The 
study was briefly described to them as an investi- 
gation of the work patterns of former counselees. In 
every instance, an initial offer to cooperate was 
obtained and, in many instances, the subjects (Ss) 
voiced personal interest in the research. Of the 101 
Ss, 93 returned completed questionnaires. The av-
erage age of Ss at the time of the study was 32.5
years (σ = 2.7), and they had worked an average of
9.5 years (σ = 4.1), had held an average of 3.9 jobs
(σ = 2.0), and had worked an average of 3 years
(σ = 1.7) on each job.

TABLE 1
INTERCORRELATION MATRIX AND MEANS AND STANDARD DEVIATIONS OF MOTIVATOR AND HYGIENE
FACTOR RATINGS AND THEIR CORRELATIONS WITH OVERALL SATISFACTION

                         Intercorrelation matrix a     Correlation with          Ratings
Factor                   1 2 3 4 5 6 7 8               overall satisfaction      X̄  σ
Motivator
1. Achievement CORPES Ole LeeCOMNt4o eon 29 16 5 Sa eele/
2. Work itself 60 ANZ 5 50863 29" 28 16 (sy le
3. Responsibility 56 §640 je 22 D7) 238 M17 sai Oa” #1ed
4. Advancement Site oee el. 7) Sou 4OneO2 asa 46 i eel)
Hygiene
5. Company policy SOM GOMEZ 2 Et DS 47 03 41 46 Sales
6. Supervision 43 33 27 46 47 SiioS 47 Green
7. Interpersonal relationships Gg eee 5 O22 (03 GS 09 35 6:0 eles,
8. Working conditions DD Dei i BA I ICS aS 29 5.0 1.6

Note.—N = 93.
a Decimal points omitted.


Part of the questionnaire that Ss completed asked
them to rate various aspects of their best-liked 2
job using 7-point graphic rating scales. Scale values
went from one, very dissatisfied, through four, neu-
tral, to seven, very satisfied.

The four motivator aspects rated were: 

1. Opportunity for achievement—opportunities to 
achieve something you consider worthwhile, oppor- 
tunities for successful accomplishments. 

2. Work itself—the actual work performed. 

3. Task responsibility—the amount of personal re- 
sponsibility you were given for your own work. 

4. Advancement—the opportunities available for 
getting ahead, for being promoted. 

The four hygiene aspects were: 

5. Company policies—the procedures used by the 
company in conducting its business, as well as the 
company’s attitude toward employees. 

6. Supervision—the type of interpersonal relation- 
ships between yourself and your immediate super- 
visor. 

7. Interpersonal relationships—the social atmo- 
sphere of your work group, the kinds of feelings that 
existed between yourself and your fellow-workers. 

8. Working conditions—such things as the amount
of work space available, lighting, temperature, 
equipment, and so forth. 

These eight job aspects and their classification as 
hygiene or motivator were taken from Herzberg et 
al. (1959). In addition, each S rated his overall
satisfaction with the job. Overall satisfaction was
defined to S as his feelings about the job as a whole,
taking into account both the favorable and un-
favorable aspects of the total job.


2 Restriction of the ratings to the present job
would have confounded responses with many other
factors such as recent but temporally limited events
and, perhaps most important, cognitive dissonance.
Ratings were obtained only for the best-liked job
since the larger study focused upon the correlates of
satisfied people. Prepotent demands upon subject
time precluded the gathering of data on the least-
liked job as well.


RESULTS AND DISCUSSION 


Table 1 presents the intercorrelation matrix 
and the means and standard deviations of 
each of the job aspects rated and their cor- 
relations with overall satisfaction. The aver- 
age level of satisfaction across the four moti- 
vator job aspects was 5.9, and the average 
level of satisfaction across the four hygiene 
job aspects was 5.8 (ns). The average corre-
lation between the four motivator job aspects
and overall satisfaction was .65, and the 
average correlation between the four hygiene 
job aspects and overall satisfaction was .40 
(p < .01 that these are the same). A Wherry- 
Doolittle maximum shrunken multiple corre- 
lation (Wherry, 1940) was computed for the 
eight job aspects against the criterion of over- 
all satisfaction. Two of the job aspects (work 
itself and opportunity for achievement), 
both motivators, accounted for 74% of the 
variance in ratings of overall satisfaction (R 
= .86). Both had equal beta weights. 
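
The arithmetic behind these figures can be sketched briefly. The first line below only restates the reported result (R = .86 corresponds to about 74% of the variance). The function uses the standard formula for the multiple correlation of a criterion on two predictors; the three correlations passed to it are hypothetical values chosen for illustration, not entries read from Table 1, and the function name is likewise an assumption of this sketch. The Wherry-Doolittle shrinkage procedure itself is not reproduced here.

```python
import math


def multiple_r_two_predictors(r1y: float, r2y: float, r12: float) -> float:
    """Multiple correlation of a criterion y on two predictors.

    r1y, r2y: correlations of each predictor with the criterion.
    r12: correlation between the two predictors.
    """
    r_squared = (r1y ** 2 + r2y ** 2 - 2.0 * r1y * r2y * r12) / (1.0 - r12 ** 2)
    return math.sqrt(r_squared)


# Reported result: R = .86, so the proportion of variance is R squared.
variance_accounted_for = round(0.86 ** 2, 2)

# Hypothetical inputs (illustration only): two predictors that each
# correlate .77 with the criterion and .60 with each other. Equal
# criterion correlations imply equal beta weights.
illustrative_r = multiple_r_two_predictors(0.77, 0.77, 0.60)

print(variance_accounted_for)
print(round(illustrative_r, 2))
```

With equal criterion correlations the two beta weights are necessarily equal, which is the pattern reported above for work itself and opportunity for achievement.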

Two findings are readily apparent. First, 
Ss were equally well satisfied with both the 
motivator and the hygiene aspects of their 
jobs. There was no difference in their ratings 
of satisfaction with either the motivator or 
the hygiene factors. Second, as predicted by
the motivator-hygiene theory, the motivator 
factors contributed significantly more to over- 
all satisfaction than did the hygiene factors. 
The average correlation between motivator 
job aspects and overall satisfaction was sig- 
nificantly higher than the average correlation 
between the hygiene job aspects and overall 
satisfaction. When the intercorrelations be- 
tween the job aspects were taken into ac- 
count, only two of the motivator aspects were 
sufficient to account for the variance in over- 
all satisfaction. 

These findings support the basic thesis of 
the motivator-hygiene theory of job satisfac-
tion. In spite of the fact that Ss were equally
satisfied with both aspects of their jobs, it is 
the motivators—the factors related to per- 
sonal success in work and individual growth— 
that are primarily related to job satisfaction. 
INFLUENCE OF TIME SHARING AND CONTROL
LOADING ON TRANSFER OF TRAINING 1


GEORGE E. BRIGGS AND EARL L. WIENER 2


Ohio State University 


The hypothesis was confirmed that in a tracking task low fidelity of control-
device loading during training would result in near-100% transfer when time-
sharing requirements are at a relatively low level, but it would result in
significantly less than 100% transfer when such requirements are at a relatively
high level.


In a study by Briggs, Fitts, and Bahrick 
(1957), four groups of subjects (Ss) tracked 
during training under various levels of force-
and amplitude-feedback cues from a two-di-
mensional control device. The force cues were
manipulated by using two different spring 
stiffnesses on the control stick, while ampli- 
tude cues were determined by two different 
sensitivities of the tracking task dynamics. 
One of the groups trained with “optimal” con- 
trol loading and the three experimental groups 
transferred to this condition. All three ex- 
perimental groups attained near-100% trans- 
fer. 

The transfer results might be interpreted 
to mean that training task fidelity, in terms of 
control-device characteristics (control load- 
ing), is not a necessary condition for ade- 
quate transfer performance in tracking-type 
tasks. Anecdotal evidence indicates that this 
is not a valid conclusion: the first few hours 
of driving an auto with “power steering” re- 
quire very close attention to the immediate 
steering task as one has to adjust to a very 
different set of proprioceptive cues available 
in the task and cannot safely time share his 
attention between steering and other activi- 
ties. This anecdotal evidence suggests the 
hypothesis that S depends upon proprio- 
ceptive cues from a primary control task to 
permit time sharing between that task and 
secondary activities. 


1 This research was supported by the United
States Naval Training Device Center, Port Wash-
ington, New York, under Contract N61339-508.
Reproduction of this publication in whole or in part
is permitted for any purpose of the United States
Government.

2 Now at the University of Miami.


This hypothesis was tested in a transfer-of- 
training paradigm with two experimental and 
two control groups defined as in Table 1. Fm 
and Fo designate “minimal” and “optimal” 
spring stiffness, respectively, on the control 
device of the primary tracking task. These 
were selected from the Briggs et al. (1957) 
study. Lo and Hi represent a two- and a 
three-dimensional tracking complex, the lat- 
ter generating a higher time-sharing require- 
ment than the former. The predictions were 
as follows: Group 1 should achieve near- 
100% transfer while Group 3 should attain 
significantly less than 100% transfer. 


METHOD


Apparatus. The primary tracking task was the
same as that utilized by Briggs et al. (1957): a
two-dimensional compensatory display, a spring-
centered, two-dimensional control device, and an
analog computer which inserted aircraft dynamics
between the control device and the display. The in-
put signal to both tracking dimensions was a com-
plex sinusoid (4 + 8 + 12 cpm) with amplitudes
inversely proportional to the constituent frequencies.
Thus, S was required to generate a complex but
coordinated and predictable pattern of control-device
deflections. During training Groups 1 and 3 uti-
lized the control device with a spring stiffness of
only 0.25 pound per degree of deflection, while
Groups 2 and 4 experienced a spring stiffness of 1.0
pound per degree throughout; during transfer Groups
1 and 3 experienced the latter, more optimal con-
trol loading.


TABLE 1

TRAINING AND TRANSFER COMBINATIONS OF
TIME-SHARING LEVEL AND CONTROL LOADING

Group              Training   Transfer
1 (Experimental)   Fm-Lo      Fo-Lo
2 (Control)        Fo-Lo      Fo-Lo
3 (Experimental)   Fm-Hi      Fo-Hi
4 (Control)        Fo-Hi      Fo-Hi

Groups 3 and 4 both worked at a higher level of 
time sharing than did Groups 1 and 2. The latter 
two groups experienced only the above two-dimen- 
sional tracking task, while Groups 3 and 4 in addi- 
tion had to control a one-dimensional tracking task 
(SETA). This secondary task has been described by 
Gain and Fitts (1959): it employs a compensatory 
display, a simple rotary knob control device, and 
positional dynamics between control and display. 
The input was a single sinusoid of 0.6 cpm. 
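
The two input signals described under Apparatus can be sketched as follows. The frequencies (4, 8, and 12 cpm for the primary task; 0.6 cpm for the secondary task) and the inverse proportionality of amplitude to frequency come from the text. The zero phases, the unit amplitude scale, and the 10-Hz sampling rate are assumptions made only so that the sketch runs, since the original signals were generated on an analog computer.

```python
import math

PRIMARY_CPM = (4.0, 8.0, 12.0)  # constituent frequencies, cycles per minute
SECONDARY_CPM = 0.6             # single sinusoid for the secondary (SETA) task


def primary_input(t_seconds: float, base_amplitude: float = 1.0) -> float:
    """Complex sinusoid with amplitudes inversely proportional to frequency."""
    total = 0.0
    for cpm in PRIMARY_CPM:
        # Scale so the slowest component has amplitude base_amplitude.
        amplitude = base_amplitude * PRIMARY_CPM[0] / cpm
        total += amplitude * math.sin(2.0 * math.pi * (cpm / 60.0) * t_seconds)
    return total


def secondary_input(t_seconds: float, amplitude: float = 1.0) -> float:
    """Single sinusoid of 0.6 cpm."""
    return amplitude * math.sin(2.0 * math.pi * (SECONDARY_CPM / 60.0) * t_seconds)


# One 40-second trial sampled at an assumed 10 Hz.
SAMPLE_RATE_HZ = 10
trial_samples = [primary_input(i / SAMPLE_RATE_HZ) for i in range(40 * SAMPLE_RATE_HZ)]
```

Because 8 and 12 cpm are integer multiples of 4 cpm, the composite repeats every 15 seconds, which is consistent with the description of the required control pattern as complex but predictable.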

Subjects and procedure. Forty-eight undergradu- 
ates served in this study. Each had served previously 
in an experiment involving SETA, and thus all were 
familiar with the general requirements of a tracking 
task, but none had experienced the more complex 
task used here. There were 12 Ss per group and 
assignment was made on a chance basis with the 
restriction of equal-size groups. 

There were three sessions in the study, each sepa- 
rated by 24 hours. During the first session S 
received instructions and then experienced two 
blocks of four 40-second tracking trials each. During 
the second session there were four blocks of trials, 
and two blocks were administered on the third 
(transfer) session. There were 20-second rest periods 
between trials within a block and a 1-minute rest 
interval occurred between blocks within a session. 

Due to the initial transients in tracking behavior, 
performance was scored over the final 30 seconds 
of each trial. Integrated error squared in each of 
the two dimensions of the primary tracking task 
served as the basis for performance measurement: 
the two mean square error scores were combined to 
yield a score analogous to the standard error metric. 
Since all error scores existed as voltages in the analog 
computer, they were transformed to units of dis- 
tance (inches on the tracking display scale). Thus, 
the performance metric may be regarded as being
analogous to the standard error of S’s amplitude
distributions of tracking error expressed in units of
linear extent on the tracking display.


FIG. 1. Training (Blocks 1-6) and transfer (Blocks
7-8) performance levels for all groups. Ordinate: standard error (in inches); abscissa: four-trial blocks; legend: training condition.


TABLE 2

AVERAGE PERFORMANCE LEVELS FOR ALL GROUPS
OVER TRAINING BLOCKS 5 AND 6

                 Control loading
Time sharing     Fo     Fm     Row average
Hi               .41    .59    .50
Lo               .37    .43    .40
Column average   .39    .51
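
The scoring procedure described above can be sketched under stated assumptions. The text specifies that only the final 30 seconds of each 40-second trial were scored and that squared error was integrated separately in each of the two dimensions. It does not say exactly how the two mean square error scores were combined, so the square root of their sum is used here purely as one plausible reading. The sampling rate, the function names, and the example error values are hypothetical, and the errors are taken to be already expressed in inches.

```python
import math


def mean_square_error(error_samples):
    """Mean of the squared error samples for one tracking dimension."""
    return sum(e * e for e in error_samples) / len(error_samples)


def trial_score(errors_x, errors_y, sample_rate_hz, scored_seconds=30.0):
    """Standard-error-like score over the final scored_seconds of a trial.

    Assumption: the two mean square errors are summed before the square
    root is taken; the source does not state the combination rule.
    """
    n_scored = int(round(scored_seconds * sample_rate_hz))
    tail_x = errors_x[-n_scored:]  # drop the initial transient
    tail_y = errors_y[-n_scored:]
    return math.sqrt(mean_square_error(tail_x) + mean_square_error(tail_y))


# Hypothetical 40-second trial at 10 Hz: a large transient during the
# first 10 seconds, then constant errors of 0.3 and 0.4 inch.
SAMPLE_RATE_HZ = 10
transient = [5.0] * (10 * SAMPLE_RATE_HZ)
example_x = transient + [0.3] * (30 * SAMPLE_RATE_HZ)
example_y = transient + [0.4] * (30 * SAMPLE_RATE_HZ)
example_score = trial_score(example_x, example_y, SAMPLE_RATE_HZ)
```

In this example the transient has no effect on the score, which is the purpose of restricting scoring to the final 30 seconds.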

The primary tracking display was located directly 
in front of S at a distance of approximately 30 
inches. The two-dimensional control device was lo- 
cated in front of S also and was manipulated by 
his right hand. The secondary tracking display and 
control device were located 30 inches to the left of 
the primary display; thus, Groups 3 and 4 were re- 
quired to turn away occasionally from the primary, 
two-dimensional display in order to track the sec- 
ondary task, that is, it was impossible to use periph- 
eral vision for the secondary tracking task. 

All Ss were instructed to minimize displayed error 
on the primary task. Further, Groups 3 and 4 were 
told to attempt this for both tracking tasks as both 
were held to be “equally important.” 


RESULTS 


Figure 1 provides a summary of the track- 
ing proficiency levels for all groups during 
both training (Blocks 1-6) and transfer
(Blocks 7-8). Each point is an average over 
12 Ss and four 30-second scoring periods, a 
total behavioral sample of 24 minutes. 

Training. Since all four groups appeared to 
be approaching asymptotic performance levels 
by Blocks 5 and 6 (see Figure 1), an analysis 
of variance was performed on data averaged 
over these last two blocks of training. The 
results indicated statistical significance for 
both independent variables: time-sharing 
level, F (1, 44) = 4.32, p < .05, and control 
loading, F (1, 44) = 6.48, p < .05. Table 2 
provides the averages which may be used to 
interpret these results. There it is apparent 
that, as one would expect, superior perform- 
ance (lower standard error scores) was ob- 
tained under a low level of time sharing (0.40 
inch versus 0.50 inch) and under “optimum” 
control loading (0.39 versus 0.51 inch). 
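
The marginal averages quoted here follow directly from the four cell means. The sketch below takes the cell means as they appear to read in Table 2 (they are consistent with the row and column averages given in the text) and recomputes the averages; the dictionary layout and the function name are assumptions of the sketch.

```python
# Cell means (standard error, in inches) over training Blocks 5 and 6,
# keyed by (time-sharing level, control loading).
cell_means = {
    ("Hi", "Fo"): 0.41,
    ("Hi", "Fm"): 0.59,
    ("Lo", "Fo"): 0.37,
    ("Lo", "Fm"): 0.43,
}


def marginal_mean(level: str, position: int) -> float:
    """Average the cells whose key matches level at the given position.

    position 0 selects on time sharing, position 1 on control loading.
    """
    values = [mean for key, mean in cell_means.items() if key[position] == level]
    return sum(values) / len(values)


time_sharing_means = {lvl: round(marginal_mean(lvl, 0), 2) for lvl in ("Hi", "Lo")}
control_loading_means = {lvl: round(marginal_mean(lvl, 1), 2) for lvl in ("Fo", "Fm")}

print(time_sharing_means)
print(control_loading_means)
```

Equal weighting of cells is appropriate here because each group contained 12 Ss.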

Transfer. Under the experimental hypothe- 
sis, stated above, it was predicted that Group 
1 (Fm-Lo) would obtain near-100% transfer,
while Group 3 (Fm-Hi) would exhibit sig-
nificantly less than 100% transfer. In order
to test this prediction, a percentage transfer
index was calculated for each experimental
group S using the formula $[(C_1 - E_T)/(C_1 - C_T)] \times 100$,
where $E_T$ is the average per-
formance level of an individual S in Groups
1 and 3 on Blocks 7 and 8, $C_T$ represents the
average performance of all Ss in the appro-
priate control group on Blocks 7 and 8, and
$C_1$ is the average performance of all Ss in the
appropriate control group on Blocks 1 and 2.
The average transfer index for Group 1 was 
86% while that for Group 3 was only 70%. 
The 86% transfer index for Group 1 agrees 
quite well with the indices noted earlier by 
Briggs et al. (1957) where values of 83%, 
88%, and 90% transfer were obtained for the 
experimental groups, all of which worked un- 
der a “low” time-sharing condition as did 
Group 1 of the present study. 
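
A minimal sketch of the percentage transfer index may make the computation concrete. It takes the three quantities defined in the text: an experimental S's average score on Blocks 7 and 8, the control group's average on Blocks 7 and 8, and the control group's average on Blocks 1 and 2. Scores are error scores, so lower is better. The numerical values below are hypothetical and were chosen only so that the index comes out near the 86% reported for Group 1.

```python
def percent_transfer(e_t: float, c_t: float, c_1: float) -> float:
    """Percentage transfer index for one experimental subject.

    e_t: the subject's mean error on transfer Blocks 7 and 8.
    c_t: control-group mean error on Blocks 7 and 8.
    c_1: control-group mean error on Blocks 1 and 2.
    """
    return (c_1 - e_t) / (c_1 - c_t) * 100.0


# Hypothetical values (inches of standard error), for illustration only.
example_index = percent_transfer(e_t=0.47, c_t=0.40, c_1=0.90)
print(round(example_index, 1))
```

An index of 100 means the subject performed at the level of the fully trained control group, and an index of 0 means performance no better than that of untrained controls.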

The 12 transfer indices for Group 1 then 
were included in a t test to determine the pos-
sible significance of the difference between 100% and the
86% transfer actually found for this group.
A comparable test was performed with the 12 
indices from Group 3. As predicted, the 70% 
average transfer for Group 3 did differ from 
100% transfer at p < .05, and the 86% av- 
erage transfer for Group 1 did not so differ 
at the same level of significance. It may be 
concluded, therefore, that the data support 
the hypothesis under test. 
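
The tests described above compare a group's mean transfer index with the fixed value of 100%. A one-sample t statistic of that kind can be sketched as follows. The individual indices are not given in the report, so the two lists below are hypothetical values constructed to have means of 86 and 70. The use of a two-tailed criterion is also an assumption, since the report states only the .05 level.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, hypothesized_mean):
    """t statistic for the difference between a sample mean and a fixed value."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    return (mean(values) - hypothesized_mean) / standard_error


# Critical t for df = 11 at the .05 level, two-tailed (assumed criterion).
T_CRITICAL_DF11 = 2.201

# Hypothetical transfer indices for 12 Ss per group (illustration only).
group1_indices = [40, 55, 65, 75, 80, 85, 90, 95, 100, 105, 110, 132]  # mean 86
group3_indices = [40, 50, 55, 60, 65, 70, 70, 75, 80, 85, 90, 100]     # mean 70

t_group1 = one_sample_t(group1_indices, 100)
t_group3 = one_sample_t(group3_indices, 100)

print(abs(t_group1) > T_CRITICAL_DF11)
print(abs(t_group3) > T_CRITICAL_DF11)
```

Whether a given mean differs reliably from 100% depends on the variability among the 12 indices as well as on the mean itself, which is why the test was run on the individual indices rather than on the group averages.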


DISCUSSION 


These results indicate that high fidelity of 
“control feel” during training is important 
when the transfer task requires a relatively 
high level of time sharing but that with a 
relatively low level of such a requirement, it 
is not necessary to employ high fidelity in 
control loading during training. This, then, 
clarifies the discrepancy, noted earlier in this 
report, between the results of Briggs et al. 
(1957) and anecdotal evidence on the im- 
portance of proprioceptive feedback from a 
control device in tasks which require rela- 
tively high levels of time sharing. 

Actually, if the hypothesis tested here had
not been supported, the results would have 
been in conflict not only with anecdotal evi- 
dence but with laboratory research as well: 
Fleishman (1956) has reported that perform- 
ance on a task requiring kinesthetic control is 
more highly correlated with performance on 
the Complex Coordination Task (a three- 
dimensional tracking task) later in training 
on the latter, criterion task than is the case 
for earlier stages of training. Thus, during 
training S begins to rely more and more on 
proprioceptive feedback cues. 

Therefore, if we combine the results of 
Fleishman’s earlier research and those of the 
present study, one can draw the following 
conclusions regarding control-loading fidelity 
in aircraft and similar tracking-type training 
devices: 

1. A simulator to be employed for training 
in rudimentary flight control need not utilize 
a high fidelity of control loading in that (a) 
the time-sharing requirements are relatively 
low and (b) the level of skill attainable in
such a device probably would not require that 
S employ proprioceptive cues to a signifi- 
cant extent. 

2. However, control-loading fidelity be- 
comes very important in simulators which 
(a) are used to train for skills requiring time 
sharing among a variety of displays and con- 
trol devices, (b) are employed to provide 
extensive training, and/or (c) are utilized to 
maintain high levels of proficiency. 
A NOTE ON THE JUDGMENT OF SPEAKER EFFECTIVENESS


PHILIP ASH 
Inland Steel Company, Chicago, Illinois 


Ratings of the effectiveness of the speaker at each of 14 meetings (attended 
by 445 participants, including some who attended more than 1 meeting) on 
an 8-item rating scale showed that a simple rating procedure can yield useful 
discrimination as to the excellence of public-speaker addressed meetings. How- 
ever, this discrimination is primarily on a general or overall factor, with little 
evidence of differentiation among such elements as the speaker’s qualifications 
or ability, topic coverage, personal gain from meeting, or satisfaction of 
expectations. Ratings of “timeliness of topic” alone tended to be somewhat 
independent of the evaluation of the meeting itself. Ratings of the effectiveness 
of the speaker were not significantly correlated with attendance at the meetings: 
good speakers do not necessarily get large audiences, and vice versa. 


Many business and professional associa- 
tions have, as their principal activity, pro- 
grams of meetings that include a social hour, 
a dinner, an invited speaker or panel discus- 
sion, and a short question-and-answer period. 

This pattern is followed by the Industrial 
Relations Association of Chicago. Member- 
ship in the Association is by company. At 
each meeting there is a changing contingent 
of representatives from the member compa- 
nies, although there tends to be a hard core 
of faithful regulars. 

Not all meetings are equally well attended, 
however, and not all seem to satisfy the mem- 
bership equally. In an attempt to identify 
attendance attitudes toward individual meet- 
ings, therefore, and to discover possible ways 
of improving the meetings, a small meeting 
evaluation project was undertaken. 

A brief questionnaire was distributed in 
self-addressed stamped envelopes at the din- 
ner table before each meeting. Some question- 
naires were collected at the end of the meet- 
ing; most of the returns came by mail during 
the following week or so. 

The questionnaires asked for ratings on 
eight dimensions of the meeting: (a) timeli-
ness of the topic, (b) coverage of the subject
matter, (c) qualifications of the speaker, 
(d) speaker’s ability to hold the interest of 
the group, (e) satisfactoriness of the discus- 
sion period, (f) how much the listener got 
out of the meeting, (g) how closely the meet-
ing fulfilled expectations about it, (h) over-
all rating (poor to excellent).

For each question a score was assigned to 
the four possible responses, ranging from
0 (poor or equivalent) to 3 (excellent or 
equivalent). Survey summaries were prepared 
on each meeting that was evaluated, and an 
overall analysis was made of the whole rating 
procedure. That analysis is the subject of the 
present paper. The analysis is based upon the 
data from 14 meetings, at which a total of 
445 ratings were collected. 

If the total rating score (sum of the eight- 
item scores) is used as an index of the extent 
to which the meetings were judged to differ 
in excellence, it seems clear that the raters 
did discriminate, at least over a narrow range, 
as between better and poorer meetings. With 
a theoretical range of 0 (all ratings poor), to 
24 (all excellent), the total ratings were distributed
as follows:


Total rating No. of meetings 


11.0-12.9 2 
13.0-14.9 4 
15.0-16.9 3 
17.0-18.9 4 
19.0-20.9 1 


The range of mean total ratings for the 
14 meetings was from 11.6 to 20.5; the mean 
rating for all meetings was 15.9, with a 
standard deviation of 2.4. The pattern of 
the eight component ratings, however, sug- 
gested that the raters might not be discrimi- 
nating very much among the rating scales. 

To explore this point, two correlation 
analyses were made: first, the individual par- 
ticipant ratings on the eight scales were 
intercorrelated without respect to meeting; 
and second, the average ratings for the 14
meetings were intercorrelated, together with
the number in attendance at each meet-
ing. These two intercorrelation matrices are
reported in Table 1. The correlations above
the diagonal are based on the 445 individuals;
those based on the 14 meeting averages are
below the diagonal.


TABLE 1
CORRELATIONS AMONG MEAN SCORES FOR THE FOURTEEN SECTIONS (BELOW DIAGONAL) AND
AMONG THE 445 INDIVIDUAL SCORES (ABOVE DIAGONAL)

Dimension a b c d e f g h Total
a. Timeliness of topic — 22 16 12 12 27 31 29 38
b. Coverage of subject matter 05 — 61 62 45 64 70 81 86
c. Speaker qualifications 15 77 — 61 41 Sil 59 65 75
d. Ability to hold the group -02 79 83 — 38 55 62 72 78
e. Discussion period Mh = SO) 55 36 — 41 47 51 64
f. Gained from meeting Si 84 a, 75 49 — 65 73 78
g. Expectations from meeting 14 93 68 77 47 89 — 82 88
h. Overall rating 05 96 84 86 57 86 91 — 93
Total rating score 20 94 88 87 65 90 92 98 —
Attendance -14 38 DS 13 -01 06 27 27 23

Note.—Decimal points omitted.

The correlations on both sides of the 
diagonal reveal a fairly clear-cut pattern, as 
follows: (a) “Timeliness of subject matter” is
positively related, but only to a minor extent,
to judgments of the excellence of the meeting
itself. (b) The correlations between “satis-
factoriness of discussion” and the ratings of
the remaining dimensions are slightly lower
than those among the remaining ratings
themselves. (c) All the other scale ratings are 
so highly intercorrelated with each other and 
with the total rating score as to reveal a 
large halo effect. They all seem to be measures 
of about the same thing, the general excel- 
lence of the meeting. The “overall rating” 
correlated so highly with the Total Rating as 
to be a reasonably satisfactory substitute for 
it. (d) The correlations between attendance 
and the mean ratings were all low, although 
generally positive, and not significantly dif- 
ferent from zero. The most effective speakers 
did not necessarily get the largest audiences, 
nor the least effective the smallest. 
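
The statement that the attendance correlations do not differ significantly from zero can be checked with the usual t test for a correlation coefficient. The sketch below applies it to the largest legible attendance correlation in Table 1 (.38, with coverage of subject matter), based on the 14 meetings. The two-tailed .05 criterion is an assumption, as the report does not state which test was used.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N_MEETINGS = 14
T_CRITICAL_DF12 = 2.179  # df = 12, .05 level, two-tailed (assumed criterion)

largest_attendance_r = 0.38
t_value = t_for_correlation(largest_attendance_r, N_MEETINGS)

# Smallest absolute r that would reach significance with 14 meetings.
critical_r = T_CRITICAL_DF12 / math.sqrt(T_CRITICAL_DF12 ** 2 + (N_MEETINGS - 2))

print(round(t_value, 2), round(critical_r, 2))
```

With only 14 meetings a correlation must exceed roughly .53 in absolute value to reach the .05 level, so even the largest attendance correlation falls well short.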
A CROSS-CULTURAL STUDY OF INDUSTRIAL
RESEARCH PERFORMANCE 1


FREDERICK B. CHANEY 


Autonetics, a Division of North American Aviation, Inc., Anaheim, California 


To determine the cultural generality of selected United States findings, ques- 
tionnaire data covering working environment, job-behavior, and personal 
history variables were obtained from 220 British scientists. Responses for 120 
Ss were analyzed against 3 criteria of research performance: (a) supervisory 
ratings of research creativity, (b) number of patent specifications, and (c) 
number of technical publications. The significant variables for each criterion 
were used to develop scoring keys which were cross-validated on the remaining 
100 scientists. The resulting tetrachoric correlations were .64, .73, and .60 
for rated creativity, patents, and publications, respectively. Examples of 
significant items are presented and the findings are related to prior United
States studies.


Psychologists have been investigating the 
factors which influence the productivity and 
creativity of United States scientists for a 
number of years (Taylor & Barron, 1963). 
This study was conducted with a sample of 
British scientists to obtain some preliminary 
data on the cultural generality of selected 
United States findings. 

One of the most useful techniques in 
predicting industrial research performance 
has been the life-history inventory (Taylor, 
1964). This device is particularly effective in 
predicting the complex criteria of scientific 
creativity because it provides information on 
a wide variety of motivational and person- 
ality traits. Perhaps the most extensive ap- 
plication of the scored life history is the work 
on petroleum research scientists reported by 
Smith, Albright, Glennon, and Owens (1961). 

In contrast to the life-history approach, 
which has been primarily concerned with the 
individual and his background, Pelz and 
Andrews (1962) have investigated the moti- 
vational and organizational variables which 
influence a scientist’s behavior in his current 
job situation. The present study attempted to 
combine elements of both approaches. 


1 This study was conducted under a NATO Post- 
doctoral Fellowship in Science at the Institute of 
Psychiatry, University of London. Computer facili- 
ties for the analysis were provided by IBM United 
Kingdom, Limited. The author would like to express 
his appreciation for the assistance given him by 
A. E. Hendrickson of the Institute of Psychiatry. 


METHOD 


The general method involved the item analysis of 
questionnaire data against three criteria of research 
performance and the cross-validation of several 
combinations of significant items. The 225-item 
questionnaire used in this study contained life- 
history, motivation, job-behavior, and working 
environment variables that had been employed in 
previous United States studies. 

Subjects. To define the sample, basic personnel 
data were obtained for all the graduate level em- 
ployees in three British research laboratories which 
agreed to participate in the study. These data were 
used to eliminate technicians and people spending 
over half their time on managerial or administrative 
duties. In addition, individuals not spending at least 
50% of their time on a combination of research 
and development, as opposed to technical service 
activities, were excluded from the sample. A total 
of 310 scientists qualified for the study and the 
overall response rate was approximately 71%. In 
terms of their scientific field, 48% of the respondents 
were in chemistry with the remainder specializing in 
mathematics, physics, engineering, and the life sci- 
ences. For the analysis these Ss were randomly 
divided into two groups. The item analysis was 
based on 120 scientists and the remaining 100 
Ss were employed in the cross-validation study. 

Criteria. The criteria of research performance 
were: (a) a 7-step rating of research creativity, 
(b) number of patent specifications published during 
the previous 5 years, (c) number of technical articles 
accepted during the same 5-year period. The defini- 
tion of creativity for rating purposes was para- 
phrased from Stein (1963) as follows: “Research 
creativity is that process which results in an original 
product or idea which is accepted as useful or 
satisfying at some point in time.” 

The existence of this process was inferred by the 
raters from a subjective evaluation of each S’s scien- 
tific products. In making this judgment, supervisors 

FIG. 1. Distribution of creativity ratings for returns (N = 220, M = 4.43) and nonreturns (N = 90, M = 4.01, SD = 1.44). Horizontal axis: creativity rating; vertical axis: frequency. 



and other senior members of each laboratory were 
asked to consider the magnitude, originality, and 
usefulness of each person’s work. An average of 5.4 
ratings was obtained for each of the 310 qualifying 
scientists. 

The distributions of creativity ratings for the 220 
returns and 90 nonreturns are shown in Figure 1. 
While the mean is significantly lower for the scien- 
tists who did not participate, the high degree of 
overlap between the two distributions argues against 
the conclusion that these individuals are in general 
less creative. However, the figure does suggest that 
we were successful in obtaining questionnaire data 
from individuals with a wide range of rated 
creativity. 

In addition to this subjective measure, numbers 
of patents and technical publications were also used 
as objective indicators of scientific performance. 
However, these data were only obtained from the 
220 respondents. The number of patents ranged from 
0 to 5, whereas the range for publications was 0 to 20. 
Both distributions were positively skewed with 
medians of .17 and .78 for patents and publications, 
respectively. 

Analysis. Because of the highly skewed distribu- 
tions for the two objective measures, the tetrachoric 
correlation (r_tet) was selected as the best single 
measure of the latent relationships among the cri- 
teria (Carroll, 1961). In computing these correla- 
tions and throughout the analysis, each variable was 
dichotomized as close as possible to the median 
without splitting a class interval. The resulting inter- 
correlations are shown in Table 1. The relatively 
high correlation between the rating and publication 
criteria is in agreement with Harmon’s (1963) find- 
ing that number of publications is a major determi- 
nant of research performance ratings. In addition, 
the low correlation between number of patents and 
rated creativity is consistent with data reported by 
Smith et al. (1961). 
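
The median dichotomization and tetrachoric estimate described above can be sketched as follows. This is a minimal illustration only: it assumes that the "cosine method" named in the table notes is the familiar cosine-pi approximation, r_tet ≈ cos(π / (1 + sqrt(ad/bc))), and the function names, the rule that ties at the median fall in the low group, and the cell counts are hypothetical rather than taken from the study.

```python
import math
from statistics import median


def dichotomize(values):
    """Return 1 for values above the median, else 0 (ties stay in the low group)."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


def fourfold(x, y):
    """Count the cells (a, b, c, d) of the 2 x 2 table for two 0/1 lists."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def tetrachoric_cosine(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a and d are the concordant cells (high-high, low-low);
    b and c are the discordant cells (high-low, low-high).
    """
    if b * c == 0:
        # An empty discordant cell makes the ratio infinite; the
        # approximation tends to +1 (undefined if a * d is also 0).
        return 1.0 if a * d > 0 else float("nan")
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


# Hypothetical cell counts, not data from the study.
print(round(tetrachoric_cosine(40, 10, 10, 40), 3))
```

The approximation needs only the four cell counts, which suits criteria as skewed as the patent and publication counts, where most of the information lies in whether a scientist falls above or below the cut.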
For the item analysis, each continuous variable 
was dichotomized and a chi-square was computed 
with each criterion. To analyze noncontinuous life-history 
items, each response option was treated separately; 
a t test was used to determine the significance 
of the difference in response percentages for 
the high- and low-criterion groups. 

Items significant at the .05 level were combined 
using simple +1 weights to form two scoring keys 
for each criterion. The first set was termed “selection 
keys” because it was restricted to life-history and 
preferred job-behavior data which could logically 
be obtained before employment. All significant vari- 
ables were used in the second set of scoring keys; 
these contained items on strength of motivation, 
methods of work, and the working environment 
which were specific to the current job assignment. 
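
A rough sketch of how such an empirically keyed score might be assembled is given here. It is not the author's program: it assumes every item has been reduced to a 0/1 response, applies a fourfold chi-square to all items (the study used a t test on response percentages for noncontinuous items), uses 3.841 as the .05 critical value for one degree of freedom, and gives each significant item a unit weight whose sign follows the direction of association. The names `build_key` and `score` are hypothetical.

```python
CHI2_CRIT_05 = 3.841  # chi-square critical value for df = 1 at the .05 level


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a fourfold table with cells a, b / c, d."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def build_key(items, criterion):
    """Return {item index: +1 or -1} for items significantly related to the criterion.

    items: one list of 0/1 responses per scientist.
    criterion: one 0/1 value per scientist (high or low criterion group).
    """
    key = {}
    for j in range(len(items[0])):
        a = sum(1 for row, y in zip(items, criterion) if row[j] == 1 and y == 1)
        b = sum(1 for row, y in zip(items, criterion) if row[j] == 1 and y == 0)
        c = sum(1 for row, y in zip(items, criterion) if row[j] == 0 and y == 1)
        d = sum(1 for row, y in zip(items, criterion) if row[j] == 0 and y == 0)
        if chi_square_2x2(a, b, c, d) >= CHI2_CRIT_05:
            # The sign of the unit weight follows the direction of association.
            key[j] = 1 if a * d > b * c else -1
    return key


def score(row, key):
    """Unit-weighted key score for one respondent."""
    return sum(weight * row[j] for j, weight in key.items())
```

In a design like the one described, `build_key` would be run on the item-analysis group only, and `score` would then be applied to the held-out group so that the resulting validity coefficient is not inflated by chance selection of items.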


RESULTS 


The number of significant items and the 
resulting cross-validation coefficients are 
shown in Table 2. With the three selection 
keys, correlations of .62, .46, and .54 were 
obtained for rated creativity, patents, and 
publications, respectively. When all the sig- 
nificant items were employed, only marginal 
increases in validity were obtained with the 
creativity and publication keys, but the cor- 
relation with the patent criterion was in- 
creased to .73. This increase was primarily 
due to the contributions of items covering 
methods of work and relations with col- 
leagues. When age, education, and experi- 
ence were correlated with the three criteria 
using the cross-validation sample, none of 
these variables was significantly related to 
patents and only level of education was re- 
lated to creativity (r = .35) and publications 
(r = .44). 

When the data for all 220 cases were used 
to obtain an overall best estimate of the item 
validities, 78 variables were found to be 
significantly correlated with one or more of 








TABLE 1 
INTERCORRELATIONS OF CRITERIA 
(N = 220) 

                 Ratings        Patents 
Patents          [illegible] 
Publications     .50            .05 

Note. The values in the table are tetrachoric correlations 
computed by the cosine method. 
TABLE 2 
CROSS-VALIDATION COEFFICIENTS 
(N = 100) 

                         Scoring keys 
                 Selection         All items 
Criteria         n     r_tet       n     r_tet 
Creativity       18    .62         34    .64 
Patents           9    .46         21    .73 
Publications     10    .54         28    .60 

Note. n = number of items in scoring key; r_tet = tetrachoric 
correlation computed by the cosine method. 


the criteria. These items are classified by 
content category in Table 3 and examples 
are shown in Table 4. Now let us compare 
the findings for several of these item types 
with related United States studies. 


Discussion 
Because of variations in sample composi- 
tion, criterion measures, and questionnaire 


TABLE 3 
CLASSIFICATION OF ITEMS 

                                 Total                     Criterion 
                                 number      Creativity    Patents       Publications 
Content category                 of items    n    p        n    p        n    p 
Life history                      88         16   .18      [illegible]   [illegible] 
Preferred job behavior            18         [illegible]   4    .22      4    .22 
Strength of motivation             5         [illegible]   2    .40      2    .40 
Methods of work                   12          1   .08      6    .50      4    .33 
Relations with colleagues 
  and supervisors                 47         [illegible]   [illegible]   [illegible] 
Perceived working environment     22          8   .36      4    .18      [illegible] 
Other items                       33 
Total                            225         39            32            37 

Note. n = number of significant items (p < .05); p = proportion 
of significant items. 
TABLE 4 

EXAMPLES OF SIGNIFICANT QUESTIONNAIRE ITEMS 

Criteria(a) 

Item CR PS AP 





Life history (self-perception) 


How do you usually behave in a 

group session with your associates? 
1. I feel free to express my views —* —* 
and I sway the group consider- 
ably. 


Life history (salary expectation) 


Approximately what annual salary 
do you think you will be earning 
ten years from now? 
4. Over 3,000 Pounds per year. a 


Preferred job behavior 


How closely does each statement 

below describe the approach to 

your work that you typically prefer 

to use? 
3. I prefer to map out broad fea= —* —*—— 
tures of important new areas. 


Strength of motivation 


In your opinion, how important is 
your technical work? ae 


Methods of work 


How closely does each of the follow- 

ing statements describe your work 

during the past five years? 
4. Generating detailed designs —_* — 
which a physical product or proc- 
ess can be produced or applied. 


Relations with colleagues 


How many people do you work with _—* — 
in your own department or division? 


Relations with supervisor 


How much can you influence the —* — 
person responsible for your work 
goals? 


Perceived working environment 


To what extent does your present 

job actually provide an opportunity 

for: 

1. Making full use of my present a 
knowledge and skills. 





(a) CR = Creativity Ratings, PS = Number of Patent 
Specifications, AP = Number of Technical Articles Published. 
* p < .05. 
construction, one must be extremely cautious 
about interpreting discrepancies between 
these results and those of United States 
studies as “real cultural differences.” Therefore, 
we shall limit ourselves primarily to a 
consideration of the similarities which offer 
evidence for the generality of previous 
American findings. 

In support of the factorial study by Mor- 
rison, Owens, Glennon, and Albright (1962), 
favorable self-perception and salary-expecta- 
tion items were among the best life-history 
predictors. In addition, these data also con- 
firm Stein’s (1957) finding that creative 
scientists rate themselves as working much 
faster than most people. While many United 
States life-history items were predictive of 
British research performance, they appear to 
be quite sensitive to sample differences. 
Specific item options seldom gave the same 
relationship for both the United States and 
British samples. 

Other promising predictors of creativity 
and research performance are the variables 
which Pelz and Andrews (1962) have 
grouped under the heading “strength of 
motivation.” These items center around a 
scientist’s identification or involvement with 
his work. Typical items measure the extent 
to which one feels his work is important, 
interesting, or exciting. While only five of 
these variables were included in the present 
study they all provided significant correla- 
tions with one or more of the criteria. These 
findings are also in accord with Bloom’s 
(1963) observation that science students who 
become deeply involved in their research 
during their graduate training tend to become 
more productive researchers afterward. In 
view of the predictive value and generality 
of these motivation items, an attempt should 
be made to develop more extensive scales to 
measure this fundamental job attitude. 

With regard to the working environment, 
questionnaire items which Pelz has used to 
measure “perceived provision for self-develop- 
ment” all gave significant correlations with 
rated creativity. The two items measuring 
opportunity to use present knowledge and 
freedom to carry out own ideas were the best 
predictors and correlated significantly with 
all three criteria. In addition, a number of 
other environmental variables dealing with 
opportunities provided by the job were positively 
related to the performance measures. 

In spite of the reservations mentioned 
earlier, one apparent cultural difference 
should be mentioned. Studies with United 
States scientists have generally indicated that 
individuals who communicate frequently with 
their supervisor and colleagues tend to be 
more productive in terms of patents and rated 
technical contributions (Pelz, 1961). For the 
British sample, frequency of communication 
with one’s supervisor was negatively related 
to creativity and publications. Furthermore, 
frequency of communication with colleagues 
was not significantly related to any of the 
criteria, but range of communication was cor- 
related with both creativity and patents. This 
was indicated by the number of people with 
whom a scientist worked both inside and out- 
side his technical organization. The analysis 
also revealed that about one third of the 
British scientists studied had fewer than five 
significant colleagues and that these individuals 
tended to be lower in rated creativity 
and patent production. 

In summary we have found that significant 
life-history variables from United States 
studies can be used to predict the perform- 
ance of British industrial scientists. However, 
these items do appear to be quite sensitive 
to sample differences and the keying of spe- 
cific options must be determined empirically. 
In addition, variables dealing with strength 
of motivation and perceived working en- 
vironment also displayed considerable cross- 
cultural generality. Finally the results sug- 
gest that the role of communication in 
industrial research performance may be quite 
different for British and United States 
scientists. 
LIE DETECTION* 
Two different samples of police trainees were used to investigate: (a) the 
effect of realistic stress in experimental lie detection, (b) the possible interference 
with the GSR channel resulting from the simultaneous recording of 
blood pressure. It was found that: (a) GSR detection results under stress 
were essentially similar to those obtained in mild experimental situations, and 
far superior in detection efficiency to analysis of heart rate changes. (b) The 
introduction of a blood-pressure cuff inflated to 80 mm. Hg for the 90 sec. of 
interrogation (similar to actual field measurement conditions) reduced the 
efficiency of detection of the GSR channel. (c) There is some suggestion that 
GSR reactivity may be related to ethnic origin. 


Attention has recently been focused on the 
use of polygraph lie detectors by the United 
States Government. A recent review suggests 
that sufficient systematic research has not 
accompanied and supported the sharp in- 
crease in the use of these instruments in a 
number of agencies (Hearings, 1964). It was 
also felt that much of the existing research 
literature on this topic is of limited value 
because of the “artificial” laboratory condi- 
tions employed. 

The modern professional approach to lie 
detection is based on the utilization of a 
number of physiological measures obtained 
simultaneously. Standardization of practice 
has led to the employment of a limited 
number of polygraph instruments which usu- 
ally include the following channels: (a) blood 
pressure and pulse rate, (b) respiration, and 
(c) galvanic skin response (GSR). The pro- 
cedure for measuring blood pressure includes 
the use of a cuff placed around the upper arm 
of the subject (S). This cuff is inflated to 
a pressure somewhat above the diastolic blood 
pressure of S and this condition is maintained 
during the interrogation. The usual pressure 
used is about 90 millimeters of mercury and 
this pressure may be held as long as 24 
minutes depending, of course, on the number 


1 The research reported in this paper has been 
sponsored by the Air Force Office of Scientific Research, 
through the European Office, Aerospace 
Research, United States Air Force, under Grant No. 
61-63 and Contract AF 61(052)-839. 
of questions to be used in a particular portion 
of the investigation. This means of measuring 
blood pressure, which is common to 
police work, is rarely used in laboratory 
studies for reasons described by Lacey 
(Hearings, 1964). 

A review of the literature undertaken to 
evaluate lie detection indicates two types of 
research data available for consideration: sta- 
tistical analyses of polygraph predictions in 
actual criminal investigations, and experi- 
mental studies generally limited to college 
students. While the first type of data are of 
obvious relevance to the actual applications of 
this technique they suffer from methodological 
limitations. The experimental studies, on the 
other hand, may have limited generality for 
actual lie detection in the field. 

An interesting discrepancy was noted in 
comparing reports deriving from the two dif- 
ferent sources of data in regard to the use- 
fulness of the GSR channel. The professional 
opinion of leading police authorities utilizing 
the polygraph in real-life situations is that 
the GSR is of limited value in lie detection 
(Inbau, 1948). The experimental studies, in 
contrast, have indicated a high degree of suc- 
cess for this channel which has probably be- 
come the most widely used in this type of 
research (Ellson, Davis, Saltzman, & Burke, 
1952; Gustafson & Orne, 1963; Lykken, 
1959). 

Examination of the laboratory investiga- 
tions would suggest that the conditions 


TABLE 1 

AVERAGE BASIC CONDUCTANCE VALUES IN μmho IN 
FOUR EXPERIMENTAL CONDITIONS 

        Rest           A       B       C 
M       18.9           20.2    20.3    20.2 
SD      [illegible]    3.0     3.4     3.6 





created would be of little if any stress to 
the college students performing as experimental 
Ss. It is even possible to conceive 
of some of the situations as providing re- 
wards for successful deception rather than 
punishment for failure to deceive. It may be 
suggested that the discrepancy in the evalua- 
tion of the GSR channel derives from these 
important differences between the field and 
the laboratory studies. Would the GSR lose 
its power as a discriminating index of decep- 
tion in a stressful situation with high per- 
sonal meaning? Through the cooperation of 
the Israel Police Force * an attempt was made 
to create such a situation during which lie 
detection could be carried out under con- 
trolled laboratory conditions. This attempt is 
described in Experiment I. 

The second experiment to be reported is 
an attempt to investigate the possible inter- 
ference of the blood-pressure measurement 
system with the measurement of skin conduct- 
ance. This may be another factor leading to 
the poor results of the GSR channel reported 
by operators in the field. 


EXPERIMENT I 
Method 


Subjects. The lie-detection situation was presented 
as part of a selection procedure accompanying a 
police training course. Thirty-six Israeli policemen 
participating in a course were told to report to the 
laboratory of the psychology department for exami- 
nation. It was possible to arrange that none of the 
officers or men associated with the course be in- 
formed of the actual nature of these examinations, 
and the word experiment was never used. 

During the examination S was required to take 
three card tests. In each of these tests S chose 
a card from a pile of six placed before him, recorded 
the number on a form, and replaced the card in 


*We wish to thank the Israel Police Force, and 
especially A. Schurr, M. Kaplan, and A. Ben-Ishai, 
for their important help and cooperation during all 
the stages of the investigation. 
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the pile. He then was required to answer “no” to 
each of the series of questions in the form of 
“did you choose card number ... ?” In an attempt 
to manipulate the stressfulness of the tests, 
three different instructions were given before each 
of the tests. One instruction (A) indicated that the 
test to follow would merely test the operation of 
the apparatus itself. Another one (B) stated that 
this test would indicate whether S belonged to 
the group of people whose responses could be 
detected by the machine. The third (C) was expected 
to be most stressful and included the following: 


One trait that characterizes a successful policeman, 
with good chances of promotion, is the ability 
to control his emotions. A policeman not able to 
control his emotions has not much chance of 
promotion, and may not be suitable for service in 
the Israeli Police Force. We will now see what 
your chances are for promotion in the future and 
even whether you will be able to continue your 
service with the police.... 


After the three lie-detection situations had been 
completed S was required to sign a declaration of 
secrecy regarding the examination. 

The examinations were carried out in the evening 
in an air-conditioned laboratory maintained at 21 
degrees C. 

Apparatus. A Sanborn Model 150 polygraph was 
used in the recording of GSR and pulse rate. The 
GSR apparatus includes a 20 μampere constant 
current bridge circuit. Stainless steel electrodes were 
attached to the left hand using Sanborn electrode 
paste. Pulse rate was obtained from an E- and 
M-pulse pickup crystal attached to the right wrist. 

Procedure. After seating S and arranging the 
recording attachments the experimenter (E) explained 
the general test procedure. He then moved to the 
adjoining room containing the recording equipment, 
a one-way-vision window, and an intercom. After 
recording 3 minutes of baseline data E presented a 
taped sequence of experimental instructions. Follow- 
ing each presentation of the appropriate standard 
instructions preceding each test he allowed 2 minutes 
for rest or baseline, and then started the investiga- 
tion questions dealing with the card chosen. These 
were given in a predetermined randomized order. 


RESULTS 


As a first step in the data analysis an at- 
tempt was made to determine if there was 
any relation between the experimental in- 
structions and the physiological dependent 
variable baseline data. For this purpose basic 
conductance values in micromhos were calcu- 
lated from the skin resistance data taken at 
30, 60, 90, and 120 seconds after the begin- 
ning of the “rest” period prior to the examina- 
tion, as well as 30, 60, and 90 seconds follow- 
ing each of the specific instructions preceding 
the card tests. The values for each condition 
were averaged for the individual, and then 
averages for the group were calculated for the 
different conditions; these are presented in 
Table 1. The only statistically significant differences 
found using the critical ratio test for 
correlated data were between the first “rest” 
period (R) and each of the other three conditions 
(α < .01).3 
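
The conversion and two-stage averaging just described can be sketched as below. This is only an illustration under stated assumptions: conductance in micromhos is taken as one million divided by resistance in ohms, and the function names and sample readings are hypothetical.

```python
from statistics import mean


def micromhos(resistance_ohms):
    """Convert skin resistance in ohms to conductance in micromhos (1e6 / R)."""
    return 1_000_000.0 / resistance_ohms


def subject_mean_conductance(resistance_readings):
    """Average one subject's readings within a condition, after conversion."""
    return mean(micromhos(r) for r in resistance_readings)


def group_mean_conductance(readings_by_subject):
    """Average the per-subject means to obtain the group value for a condition."""
    return mean(subject_mean_conductance(r) for r in readings_by_subject)


# Hypothetical readings in ohms for two subjects at 30, 60, and 90 seconds.
print(group_mean_conductance([[50_000, 50_000, 50_000], [100_000, 100_000, 100_000]]))
```

Converting each reading before averaging matters because the mean of reciprocals is not the reciprocal of the mean; averaging resistance first would generally give a different group value.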

Five-second samples of pulse-rate data were 
also taken starting at the same points as 
above, averaged for each condition, and 
presented in Table 2. 

Other pulse-rate data deemed relevant 
which had been obtained through cooperation 
with the police were included in this table. 
Pulse-rate data obtained during the recruit- 
ment examinations of 26 of the experi- 
mental Ss were available. In addition, the 
pulse-rate data obtained during the actual lie- 
detection examinations of 22 criminal sus- 
pects who were later convicted of major 
crimes were also included. 

Once more employing a critical ratio for 
correlated data no statistical difference was 
found among the different conditions of the 
experiment. On the other hand, a significant 
difference was found between the pulse rate 
during the medical examination, and the rates 
during all of the different conditions of the 
experiment including the initial rest period 
(R) (α < .05). Applying a critical ratio test 
of the differences between uncorrelated means, 
significant differences (α < .05) were found 
between the mean of the criminals and those 


3 All of these analyses as well as a more detailed 
description of the procedure and apparatus employed 
appear in Kugelmass (1963). 
TABLE 2 

COMPARISON OF PULSE-RATE DATA IN CRIMINAL 
INVESTIGATION AND EXPERIMENTAL 
SITUATIONS (Beats per minute) 

       Criminal lie    Police medical 
       detection       exam              Condition 
       (N = 22)        (N = 26)          R    A    B    C 
M      91.6            [illegible]       [illegible] 
SD     17.1            4.8               [illegible] 






of the police medical examinations, the experi- 
mental rest period (R), and the B experi- 
ment condition. 

The above analysis indicates that the 
physiological baseline data reflect the experimental 
situation created. The data suggest 
that the situation was clearly more stress- 
ful than a medical entrance examination 
and less stressful than a real-life criminal 
investigation. 

The second step in data analysis was an 
attempt to evaluate the degree of success in 
detection in the two channels used. During 
each of the three experimental tests each 
card number was presented to S two times. 
The two GSR responses (maximal change 
from baseline at time of presentation of the 
number) to each card were averaged. The 
card yielding the highest average μmho value 
was considered the “selected” card. Actual 
success in detection could be evaluated by 
comparing this card number with the one 
chosen and recorded by S during the experi- 
ment. These GSR data are presented in Table 
3. Also included in this table are the detec- 
tions obtained by a similar analysis of changes 


TABLE 3 

THE EFFICIENCY OF DETECTION USING THE GSR HIGHEST DEFLECTION, AND PULSE RATE HIGHEST 
CHANGE INDEPENDENT OF THE DIRECTION OF CHANGE IN THE THREE EXPERIMENTAL CONDITIONS 

             Number of successful    Number of unsuccessful 
             detections              detections 
Condition    GSR    Pulse rate       GSR    Pulse rate         Subjects 
A            16      7               20     29                 36 
B            19      6               17     30                 36 
C            17      6               19     30                 36 
Total        52     19               56     89 

in pulse rate following the presentation of the 
card number. The pulse-rate data presented 
are based on the calculation of the highest 
mean rate change independent of direction of 
change during 5 seconds following the presen- 
tation of the card number. Two other analyses 
employing criteria of pulse-rate acceleration 
only or pulse-rate deceleration only gave 
lower rates of success in detection. In order 
to evaluate both the pulse rate and GSR 
data statistically a Poisson approximation to 
the binomial distribution (36, 1/6) was as- 
sumed. In all three experimental conditions 
the detection based on the GSR index was 
statistically better than chance (α < .001). 
In contrast no significant detection was ob- 
tained using the pulse-rate-change criteria. 
The difference in detection efficiency between 
the GSR channel and the pulse-rate channel 
is significant (α < .05), using a test for 
dependent proportions. 
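
The chance model used here can be illustrated with a short calculation. The sketch below assumes a one-tailed test on the upper tail of a Poisson distribution whose mean is n × p = 36 × 1/6 = 6; the function name is hypothetical, and the hit counts are those reported in Table 3 (the Condition C figure of 17 follows from the column total of 52).

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf_below_k = sum(
        math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k)
    )
    return 1.0 - cdf_below_k


n_subjects, p_chance = 36, 1 / 6
lam = n_subjects * p_chance  # expected number of chance detections

gsr_hits = {"A": 16, "B": 19, "C": 17}    # GSR detections per condition
pulse_hits = {"A": 7, "B": 6, "C": 6}     # pulse-rate detections per condition

for condition in ("A", "B", "C"):
    print(
        condition,
        "GSR tail < .001:", poisson_tail(gsr_hits[condition], lam) < 0.001,
        "pulse tail < .05:", poisson_tail(pulse_hits[condition], lam) < 0.05,
    )
```

Under these assumptions the GSR counts fall far in the upper tail while the pulse-rate counts sit close to the six detections expected by chance, which matches the pattern of significance reported.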

During the analysis of the polygraph 
records our attention was drawn to the clear 
individual differences in the size of the GSR 
reactions during the card test. Relatively 
small responses seem to appear more fre- 
quently in the records of Ss who were of 
Near Eastern origin (born in the Moslem 
countries of the Mediterranean basin, 
Morocco, Yemen, Iraq, and Iran). In order 
to test this impression we defined a ‘“non- 
reactor” for a particular experimental condi- 
tion as an S showing a decrease of less than 
1,000 ohms resistance following either of the 
presentations of the card actually chosen 
by him. Chi-square analyses of the distribu- 
tions of reactors and nonreactors as related 
to their origin, Near Eastern versus European 
or Israeli, were carried out for the three 
experimental conditions. In all three more 


TABLE 4 

THE NUMBER OF CARDS CORRECTLY AND INCORRECTLY 
DETECTED IN THE GSR AND CUFF CONDITION, 
AND THE GSR CONDITION 

                  Result of detection 
Condition         Correct    Incorrect 
GSR and cuff      11         29 
GSR               20         20 
nonreactors were found among Ss of Near 
Eastern origin; this difference reached statistical 
significance (α < .01) in Condition B. 
Eight of the 36 Ss turned out to be non- 
reactors in all three experimental card tests. 
All eight of these total nonreactors were of 
Near-Eastern origin. 


EXPERIMENT II 
Method 


Subjects. Forty police cadets were obtained as Ss 
for this experiment from the Israel Police School in 
Jerusalem. The Ss were told that they were being 
used to test the efficiency of a lie detector to be 
used in the department of psychology. All the test- 
ing was carried out in an air-conditioned laboratory 
where a temperature of 21 degrees C was main- 
tained. The S was seated at a table facing a blank 
wall. The GSR was recorded as in Experiment I. 

Procedure. Each S was tested under two condi- 
tions thus serving as his own control in a random- 
balanced design to control for order effects. In the 
GSR and Cuff condition an ordinary HAKO 
sphygmomanometer cuff was wrapped around the 
upper part of the right arm of S and inflated 
to a pressure of 80 millimeters of mercury. The 
pressure was maintained for the 90 seconds of this 
condition. No blood-pressure cuff was used during 
the GSR condition. In both conditions S was 
requested to choose a card from a pile on the table 
(2, 3, 5, 8, 9, and 10 of diamonds), to write down 
the number of this card on a form placed before 
him, and to answer “no” to the subsequent questions 
“did you choose card number ... ?” In both 
sessions the number sequence was randomized such 
that each number appeared twice during the 90 
seconds of questioning. The first number asked in 
each condition was “one” which served as a buffer 
against initial startle. Three minutes of basal skin 
resistance during rest was recorded prior to the first 
of the two conditions. The basic skin resistance in 
ohms thus obtained was transformed into units of 
conductance. 


RESULTS 


Analysis of the data obtained during each 
of the two conditions was carried out by 
determining the card number yielding the 
highest mean ohm change, and comparing this 
result with the number previously recorded 
by S himself. The number of “hits” and 
“misses” in each condition is presented in 
Table 4. 
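
The scoring rule described in this paragraph can be sketched as follows. It is a minimal illustration, not the procedure as actually programmed: it assumes each card number has two recorded response amplitudes, that the card with the largest mean amplitude is taken as the detected card, and that ties go to the first card listed. The function names and the sample record are hypothetical.

```python
from statistics import mean


def detect_card(responses):
    """Return the card number with the largest mean response amplitude.

    responses maps each card number to the list of amplitudes recorded
    on the presentations of that number.
    """
    return max(responses, key=lambda card: mean(responses[card]))


def count_hits(records):
    """Return (hits, misses) for a list of (responses, chosen_card) pairs."""
    hits = sum(1 for responses, chosen in records if detect_card(responses) == chosen)
    return hits, len(records) - hits


# Hypothetical record for one S who chose the 3 of diamonds (amplitudes in ohms).
one_record = {
    2: [100, 200], 3: [900, 1100], 5: [300, 100],
    8: [0, 50], 9: [400, 400], 10: [250, 150],
}
print(detect_card(one_record), count_hits([(one_record, 3)]))
```

With six cards, a rule of this kind picks the chosen card by chance one time in six, which is the basis for the binomial (40, 1/6) model applied to the counts.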

Using a Poisson approximation to an as- 
sumed binomial model (40, 1/6) it was found 
that the number of correct detections in the 
GSR and Cuff condition is not significantly 
different from chance at the .05 level. The 
number of correct detections in the GSR condition, 
on the other hand, is significantly 
greater than chance (α < .001). A chi-square 
test of dependent proportions demonstrates 
a change of efficiency between the two conditions 
that is significant (α < .05; McNemar, 
1962). A closer look at the individual data 
revealed that all but three of the persons detected in 
the GSR and Cuff condition were among those 
detected in the GSR condition. This would 
meet the 90% criterion for a Guttman scale 
(Guttman, 1949). 
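
As a check on the test of dependent proportions, the paired table can be rebuilt from the figures given. The sketch below rests on one reading of the text, namely that 3 of the 11 Ss detected with the cuff were not detected without it; on that assumption the remaining cells follow from the marginal totals in Table 4. The uncorrected McNemar statistic and the 3.841 critical value (df = 1, .05 level) are standard, but whether a continuity correction was applied in the original analysis is not stated.

```python
def mcnemar_chi_square(b, c, continuity=False):
    """McNemar chi-square (df = 1) from the two discordant counts b and c."""
    if b + c == 0:
        return 0.0
    diff = abs(b - c) - (1 if continuity else 0)
    return max(diff, 0) ** 2 / (b + c)


n_subjects = 40
correct_cuff, correct_gsr = 11, 20   # marginal totals from Table 4
cuff_only = 3                        # assumed: detected with cuff but not without

both = correct_cuff - cuff_only                      # detected in both conditions
gsr_only = correct_gsr - both                        # detected only without the cuff
neither = n_subjects - both - cuff_only - gsr_only   # detected in neither condition

chi2 = mcnemar_chi_square(gsr_only, cuff_only)
print(both, gsr_only, cuff_only, neither, chi2, chi2 > 3.841)
```

On this reading the discordant counts are 12 and 3, and the statistic exceeds the .05 critical value with or without the continuity correction, in line with the significant change in efficiency reported.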

Further analysis indicates that the difference 
between the mean GSR amplitudes to 
the “relevant” cards and to the nonrelevant 
cards in the GSR condition (x̄_d = 475.5, s_d 
= 660.5 ohms) is significantly different (α 
< .05) from the “relevant-nonrelevant” difference 
under the GSR and Cuff condition 
(x̄_d = 178.0, s_d = 475.5 ohms). This would 
suggest that the introduction of the blood- 
pressure cuff tends to reduce the contrast be- 
tween responses to the relevant and nonrele- 
vant cards. One might formulate this as a 
reduction in efficiency due to a shift toward 
a lower signal to noise ratio. 


DISCUSSION 


The major purpose of the first experiment 
was to evaluate the efficiency of the GSR 
channel in lie detection in a more stressful 
situation than has been the case in previous 
experimental research. All feedback from the 
police sources, as well as our own impres- 
sions, suggests that Ss viewed the instruc- 
tions as had been intended. This would have 
placed them in a rather stressful situation 
regarding the future of their career. The 
baseline-conductance data suggest that stress 
was higher during the experimental tests than 
the introductory “rest” period. The sup- 
plementary pulse-rate data obtained from the 
police-recruitment medical examinations sug- 
gest that even the basic rest state might be 
regarded as involving considerable stress. The 
different conditions following the A, B, and 
C instructions which did not appear to gen- 
erate differential physiological reactions may 
be viewed as being superimposed on a higher 
than normal level of autonomic functioning. 
On the other hand, the pulse-rate data ob- 
tained from actual criminal interrogation 
records suggest that the experimental exami- 
nations were less stressful than those encoun- 
tered during police investigation. 

The lie-detection analysis indicated that the 
GSR index was clearly superior to an index 
of pulse-rate change. In contrast to Davis’s 
(1961) optimistic prognosis the pulse-rate 
change index gave results no better than 
chance. Two recent studies employed lie-detec- 
tion card tests in relatively nonstressful lab- 
oratory situations that are sufficiently similar 
to bear comparison to the present study. 
Ellson et al. (1952) arrived at 73% correct 
identifications against a chance expectancy 
of 1/4. Gustafson and Orne (1963) obtained 
45% correct identifications in their total 
sample and 64% correct identifications in 
their motivated group as against a chance 
expectancy of 1/5. Experiment I produced 
48% correct identifications as against a 
chance expectancy of 1/6. The GSR condition 
part of Experiment II (a completely inde- 
pendent sample) which included only minimal 
stress resulted in 50% correct identifications 
as against a chance expectancy of 1/6. 

In order to compare the three studies the 
ratio of the observed correct identifications 
to the expected values was computed to repre- 
sent the “gain” of the observed over the 
expected identifications. The ratio obtained 
in Experiment I, 2.9, is exactly the same as 
that obtained by Ellson et al. (1952), 2.9, 
and only slightly lower than that found in the 
more detectable group by Gustafson and Orne 
(1963), 3.2. It is almost the same in Experi- 
ment II. We might conclude, then, that within 
a considerable range of stress no necessary 
decrease in the detection efficiency of the 
GSR channel need be expected. It is possible, 
however, that the stress generated here did 
not reach the range of stress actually taking 
place in real lie detection which is thought to 
cause this decrease. 
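The "gain" comparison in the preceding paragraph is a simple ratio of the observed detection rate to the chance rate. A minimal sketch, using only the percentages and chance expectancies quoted in the text (the dictionary and function names are illustrative), is:

```python
# Observed proportion of correct identifications and chance expectancy,
# as quoted in the text for each study.
studies = {
    "Ellson et al. (1952)": (0.73, 1 / 4),
    "Gustafson & Orne (1963), motivated group": (0.64, 1 / 5),
    "Experiment I": (0.48, 1 / 6),
    "Experiment II, GSR condition": (0.50, 1 / 6),
}


def gain(observed: float, chance: float) -> float:
    """Ratio of observed to expected (chance) correct identifications."""
    return observed / chance


gains = {name: round(gain(obs, ch), 1) for name, (obs, ch) in studies.items()}

for name, value in gains.items():
    print(f"{name}: gain = {value}")
```

The ratios come out at 2.9, 3.2, 2.9, and 3.0, respectively, matching the values cited for the first three and the "almost the same" description of Experiment II.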

The finding of a possible relationship be- 
tween ethnic origin and GSR reactivity was 
unexpected. It is of interest that Johnson 
and Corah (1963) have recently reported 
differences in the skin resistance of Negro 
and white Ss. It is difficult, however, to see 
any connection between race or ethnic origin 
and GSR reactivity, and at this stage further 
analysis with additional samples is in progress 
at this laboratory. 

It is only possible to speculate on the 
mechanism by which the inflated cuff reduces 
the GSR efficiency since so many variables 
might be involved, for example, homeostatic 
balancing mechanisms, pain, attention, etc. 
Whatever the mechanisms involved, these 
findings may explain the poor results reported 
by those interrogations using the standard 
procedure involving simultaneous recording of 
blood pressure and GSR. Since these reports 
have often been used to suggest that the GSR 
is of questionable value as a differential index 
during high stress (Woodworth & Schlosberg, 
1956) it would appear justified to reconsider 
their relevance to this issue. 

A 33-item scoring key composed of personal history items originally validated 
for research personnel in a petroleum laboratory was applied to research per- 
sonnel in a pharmaceutical laboratory. Significant validities were obtained, in 
the new setting, between personal history scores and several criteria of research 
productivity and creativity. These results were interpreted to suggest that 
empirical keys may have more generality than is commonly believed. 


At least three recent studies have shown 
that multiple-choice personal history items can 
differentiate between research personnel who 
vary in productivity of patents and/or pub- 
lications, as well as in the degree of their 
rated research creativity (Buel, 1965; Ellison 
& Taylor, 1962; Smith, Albright, Glennon, & 
Owens, 1961). While the “tailor-made” scor- 
ing keys typically produced by this approach 
are of decided value to the sponsoring insti- 
tutions, they will nevertheless be of limited 
scientific and general utility until their com- 
plex factorial structure is better understood 
and their generality and cross-validity dem- 
onstrated. Factor analytic studies, such as 
that of Morrison, Owens, Glennon, and Al- 
bright (1962), may help in the former regard, 
but the prevalent view of empirical keys as 
time-, organization-, and job-specific will not 
be modified until validity generalizations have 
been conducted. Demonstrable generality 
could aid, for example, in formulating more 
comprehensive theories of job performance 
by establishing stability, across occupations 
and organizations, of criterion-relevant char- 
acteristics measurable by the personal history 
technique. In addition, investigators with 
more applied interests would benefit from ac- 
cumulation of an item pool which would not 
need to undergo complete item analysis in 
each new setting. It is in this latter area that 
the present study has particular relevance; it 
describes an attempt to generalize a set of 
item scoring weights developed for a sample of 
physical scientists to a group of biological 
scientists employed in another organization. 


BACKGROUND OF THE STUDY 


Of relevance to this study is the paper by 
Smith et al. (1961) demonstrating concur- 
rent linear correlations for 100 physical sci- 
entists in a petroleum research laboratory. 
Biographical data scores were correlated with 
an Overall Research Performance Criterion 
(.61 with 37 items), a Rated Creativity Cri- 
terion (.52 with 22 items), and a Patent Dis- 
closure Criterion (.52 with 22 items). These 
validities are a function of 59 discrete per- 
sonal history items; some items were weighted 
against more than one criterion. 

With the permission of the authors of that 
study, these same 59 items (with occasional 
minor changes in wording) were included as 
part of an 118-item personal history ques- 
tionnaire administered to a group of research 
personnel in the employ of a major pharma- 
ceutical research and manufacturing organi- 
zation.² The pharmaceutical study (Buel, 
1965) reports the reanalysis of these 118 
items, and shows that 50 of them were valid 
against a percentage position criterion of cre- 
ativity (N = 80).³ When these same 50 items 


2 The authors wish to thank G. D. Searle & Co. 
for permission to report this study. 

3 The percent position criterion arose from inde- 
pendent creativity rankings of each subject by the 
director of research, the assistant research mana- 
ger, and an immediate supervisor. These three rank 

TABLE 1 

VALIDITY COEFFICIENTS, WITH AND WITHOUT CONTROLS 
ON AGE, FOR A THIRTY-THREE-ITEM PETROLEUM 
KEY APPLIED TO PHARMACEUTICAL RESEARCH 
PERSONNEL 
(N = 132) 

                              Criterion 
Coefficient    Linear rank    Patents    Publications    SERP 
r aS Zor oon Son 
r partial mois .26* poi com 





Note.—Regarding partial correlations, the 33-item key 
correlates .18 with age, whereas the criteria are correlated with 
age as follows: linear rank = .31; patents = .26; publications 
= .43, and SERP = .08. 

Sp Ole 

** p < .001. 


were scored for a hold-out sample of 52 phar- 
maceutical researchers, the total scores corre- 
lated .57 with this same criterion. Further 
evidence of the validity of these 50 items was 
obtained by correlating the scores (computed 
separately for 41 items suitable for personnel 
without prior job experience and 9 items suit- 
able for experienced personnel), with other 
criteria: number of patents, number of publi- 
cations, and a score derived from The Super- 
visor’s Evaluation of Research Personnel 
(SERP), a forced-choice research performance 
rating scale (Buel, 1962). All validity coeffi- 
cients, both linear and multiple, were signifi- 
cant at the .05 level or beyond (correlations 
ranged from .34 to .51) except for the linear 
correlation between the “no previous experi- 
ence” items and SERP (r = .14, p > .05). 


METHOD 


Although in the study just cited, 50 items were 
weighted against criteria specific to the pharma- 
ceutical situation, the study reported here concerns 
only the 33 items which were valid for the petroleum 
Overall Research Performance Criterion and which 
had not been reworded for inclusion in the pharma- 
ceutical questionnaire. Employing petroleum weights 
for these 33 items, the pharmaceutical samples’ 
questionnaires were rescored. The 33 items cover 








distributions were converted to percent position 
distributions after Garrett (1947). Upon demonstra- 
tion of substantial correlations between these three 
percent position distributions, 127 subjects were as- 
signed a criterion score made up of the average of 
their individual percent position scores (for 5 sub- 
jects a single score served as a criterion score). 

such areas of personal history as high school and 
college hobbies and performance, early familial 
relationships, self-perceptions in terms of job per- 
formance, research interests and aspirations, etc. 

Since the pharmaceutical research subjects had 
not been involved in the original item weight- 
ing, all 132 of them (basic and cross-validation sam- 
ples combined) were usable in this study. Relevant 
linear correlations were obtained between these 
scores and the percent position criterion of creativ- 
ity, the patent criterion, the publication criterion, 
and SERP. Appropriate intercorrelations were also 
calculated. 


RESULTS AND DISCUSSION 


As may be seen in Table 1, substantial 
cross-validities resulted from the application 
of the petroleum key to the pharmaceutical 
sample. Because one of the 33 petroleum items 
inquired about number of publications, that 
item was removed for purposes of correlation 
with the publication criterion so as to elimi- 
nate spurious overlap. Since a corresponding 
patent item did not occur in the petroleum 
key, such a correction was not necessary for 
purposes of correlation with the patent cri- 
terion. Further, because patents and publi- 
cations, and perhaps the percent position and 
SERP criteria could be contaminated by age, 
partial correlations were calculated for these 
personal history-criterion combinations, re- 
moving the effects of age. Table 1 also pre- 
sents those correlations, and demonstrates a 
gratifyingly small age-related shrinkage. 
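The age control described above corresponds to a first-order partial correlation. The sketch below uses the standard textbook formula; it is not taken from the paper itself. The age correlations are those given in the note to Table 1, whereas the zero-order validity of .30 is a purely hypothetical value chosen for illustration.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# From the note to Table 1: the 33-item key correlates .18 with age,
# and each criterion correlates with age as listed.
R_KEY_AGE = 0.18
R_CRITERION_AGE = {
    "linear rank": 0.31,
    "patents": 0.26,
    "publications": 0.43,
    "SERP": 0.08,
}

# Hypothetical zero-order validity, NOT a value reported in the study.
HYPOTHETICAL_VALIDITY = 0.30

for criterion, r_age in R_CRITERION_AGE.items():
    adjusted = partial_r(HYPOTHETICAL_VALIDITY, R_KEY_AGE, r_age)
    print(f"{criterion}: partial r = {adjusted:.3f}")
```

Because the key itself correlates only .18 with age, removing age changes a validity of this size very little, which is consistent with the small shrinkage noted in the text.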

In comparison to the original pharmaceuti- 
cal study, the validities presented here are 
strikingly similar in magnitude. Apparently, a 
somewhat foreign key predicts criterion rele- 
vant behavior about as well as a locally con- 
structed key, and the cross-validity cannot 
be attributed solely to item overlap or simi- 
larity (the correlations of the petroleum scores 
with the “no previous experience” and “pre- 
vious experience” pharmaceutical scores are 
both .39). However, in those situations where 
given items did overlap both petroleum and 
pharmaceutical keys (21 of 33), inspection of 
the scoring weights showed that they corre- 
spond, in relative magnitude, for a majority 
of the intraitem response categories. In this 
context, it is interesting to note that essen- 
tially this same personal history form proved 
to be the most valid of a large number of in- 
struments employed by McDermid (1965) in 
the assessment of engineering creativity. 
These combined findings suggest that empiri- 
cal personal history keys may possess suffi- 
cient validity and generality to warrant ex- 
tension to other professional and research 
groups, but in no case without corroborative 
research parallel to that discussed here. 

BRAND AWARENESS AS A FUNCTION OF ITS 
MEANINGFULNESS, SEQUENTIAL POSITION, 
AND PRODUCT UTILITY¹ 

A study of the effects of meaningfulness of brand name, sequential position of the brand 
name in a slogan, and the utility of the product for the consumer on brand 
awareness. 50 adult male and 50 adult female Ss were asked to rate some 
advertisements and then to recall the brand name in each. 2 such recalls, 
immediate, and a 24-hour delayed, were taken from Ss. 3 categories of 
consumer products in terms of utility, 2 orders of sequential presentation of 
the brand name, and 2 levels of meaningfulness were used for these advertise- 
ments following a 2 × 2 × 3 factorial design. Superior learning and retention 
were observed for (a) brand names having more meaningfulness and (b) brand 
names of high-utility products. No significant difference in learning was found, 
however, between brand names following a product description and those 
preceding it. Results are interpreted in terms of existing theories of verbal 
learning. 


This study investigates the nature of sev- 
eral variables which, from a hypothetical 
standpoint, are assumed to have some bearing 
upon the strength of brand-product associa- 
tion. An attempt has been made to determine 
the pattern of their comparative influence. 
The variables are: (a) meaningfulness of the 
brand names, (b) sequence of presentation of 
the brand names in relation to the products, 
(c) differential needs of the consumers for the 
products. 

The following are the hypotheses formu- 
lated: 

Hypothesis I. More meaningful brand 
names will create greater brand awareness 
than less meaningful brand names. Recently 
several studies have been done on measure- 
ments of attributes of verbal materials such 
as meaningfulness and emotionality (Ka- 
nungo & Panda, 1963, 1964a, 1964b; Noble, 


1 The authors wish to acknowledge their indebted- 
ness to S. Sengupta, Director, Clarion Advertising 
Services Pvt. Ltd., S. Ghosal, Managing Director, J. 
Walter Thompson Co. Pvt. Ltd., and K. T. Chandy, 
Director, Indian Institute of Management, Calcutta, 
for their extensive support and cooperation in this 
research undertaking. The financial support came in 
part from each of the three organizations mentioned 
above. Grateful acknowledgement is also extended to 
S. K. Bose of Calcutta University and Asim K. 
Dutta of Jadavpur University for their kind help in 
the conduct of this study. 

1961). Potential influence of such measurable 
attributes of verbal material on verbal learn- 
ing has been amply demonstrated (Ausubel, 
1963; Underwood & Schulz, 1960). These 
studies have significant implication for ad- 
vertising research. If a more meaningful ver- 
bal item is learned faster and retained better 
than a less meaningful verbal item, then it is 
worthwhile demonstrating the usefulness of 
such measures (Kanungo & Panda, 1964b; 
Noble, 1961) in the area of advertising. 
Earlier studies have suggested that while 
choosing a new brand name it is better to get 
a simple, short, and descriptive name because 
it can make a lasting impression on the consum- 
er’s mind (Elliott, 1937; Longstaff, 1936). 

Hypothesis II. Brand names following a 
product description will create greater brand 
awareness than brand names preceding prod- 
uct description. A brand name in an adver- 
tisement is often seen not in isolation, but as 
a part of a slogan that describes the nature 
of the product. Thus, essentially, two items 
are presented in succession in an advertise- 
ment, that is, a brand name and the product 
and/or the need satisfying quality of the 
product. In line with the principle of forward 
association, it is suggested that to ensure bet- 
ter brand awareness among the consumers 
advertisements should mention first the need 
and/or product, then the brand name (Lucas 
& Britt, 1950, p. 77). 


BRAND-AWARENESS 


The situation is analogous to forward (S-R) 
and backward (R-S) associations in an inci- 
dental learning situation. Early studies on the 
role of direction of association on learning ex- 
plained the differences between the strength 
of S-R and R-S associations in terms of in- 
tentional and incidental learning (Feldman & 
Underwood, 1957; Jantz & Underwood, 
1958). Considering R-S learning as another 
form of incidental learning, the effect of 
meaningfulness on R-S association has been 
studied (Jantz & Underwood, 1958). These 
studies have reduced the role of direction of 
association in learning to one of motivational 
problems; S-R or forward association con- 
ceived as intentional or motivated learning 
and R-S or backward association conceived 
as incidental or unmotivated learning. In the 
present study, however, the role of direction 
of association per se is studied. 

Hypothesis III. High-utility products en- 
sure greater brand awareness than low-utility 
products. In an advertisement, the brand name 
is always seen in relation to a product that 
has some utility for the individual. It can be 
presumed that an advertisement of a product 
of high utility for the individual represents a 
more reinforcing situation to the individual 
than an advertisement of a product of low 
utility. Since degree of learning is positively 
related to degree of reinforcement involved 
in the learning situation, it is expected that brand 
awareness of high-utility products would be 
better than that of low-utility products. 


METHOD 
Subjects 


The subjects (Ss) for this study were 50 adult 
males and 50 adult females drawn from three insti- 
tutions of Calcutta. Their ages ranged from 19 to 33. 
The mean age of the female group was 20.8 years, 
and that of the male group was 24.5 years. The Ss 
came from different parts of India and were par- 
ticipating in educational programs of the Indian 
Institute of Management, Calcutta University, and 
Jadavpur University. 


Design and Materials 


Three different categories of consumer product in 
terms of utility were chosen, namely, products used 
by males, by females, and by both. Six commonly 
used products from each of these categories were 
chosen. All the brand names chosen were three-let- 
ter nonsense syllables of the form consonant-vowel- 
consonant; nine of these were of more meaningful- 
ness value and the other nine were of less meaning- 
fulness value as determined in an earlier study by 
Kanungo and Panda (1964b). Three products in each 
of the utility categories were given brand names of 
more meaningfulness value. For each of the 18 prod- 
ucts two different advertisements were prepared, one 
with the brand name followed by a short descrip- 
tion of the product use, and the other with the 
reverse sequence. Twenty copies of each of these 36 
advertisements were printed on quarto size art 
paper (27 centimeters × 20 centimeters). In each of 
the printed layouts the slogan appeared at the top 
and below it a picture of the product advertised was 
presented. At the bottom, the name and address of 
the advertiser (a fictitious manufacturing concern) 
were given. 

Finally, 40 booklets were prepared, each contain- 
ing an advertisement for each of the 18 products in 
such a manner as to include 9 advertisements where 
brand name preceded the product use in the slogan 
and 9 others where the reverse sequence was fol- 
lowed. Each of the slogans was a short, simple, and 
popular description of the product. Care was taken 
to ensure that the variations between different slo- 
gans were reduced to minimum. The sequence of 
these 18 advertisements in each booklet was then 
randomized so as to ensure varied order of pres- 
entation to Ss. The slogans containing the brand 
names for each of the products and the meaningful- 
ness values of the brand names are presented in 
Table 1. 


Procedure 


Each S was given a booklet and was asked to rate 
each advertisement in the booklet on each of three 
7-point scales: pleasant-unpleasant, good-bad, and 
ambiguous-clear. The S was given 18 answer sheets, 
one sheet for one advertisement in the booklet, to 
record his ratings. The three bipolar scales were printed 
on the answer sheets with verbally labeled cate- 
gories, and S was instructed to indicate his or her 
ratings by underlining the appropriate categories. 
The order of presentation of the three scales was 
randomized for each answer sheet, and for each 
scale, half the time the positions of the polar ad- 
jectives were reversed. The Ss were made to believe 
that the manufacturer intended to launch an ex- 
tensive advertising campaign with a view to intro- 
ducing these new products into the markets. Hence 
the advertiser wanted to know the reaction in terms 
of the three scales of a small sample of population 
towards these advertisements. 

Each S was given approximately 1-1.5 minutes 
to evaluate each advertisement on the three scales. 
Immediately after S had finished rating all advertise- 
ments, the booklet and the answer sheets were col- 
lected. The S was then given a recall sheet on which 
all the 18 products were listed in alphabetical order 
and was asked to recall as many brand names as 
possible and write them beside the name of the 
products with which they were associated. After 
the recall period, S rated each product for its use- 
fulness on a 7-point scale. 

TABLE 1 

PRODUCTS, SLOGANS, AND MEANINGFULNESS (m’) OF BRAND NAMES 

Product                     Slogan                                                  m’ of brand name 
 1. Shaving blade           (a) For smooth shaving use MOS                          2.69 
                            (b) Use MOS for smooth shaving 
 2. Cigarette               (a) For smoking pleasure have QOV                       1.45 
                            (b) Have QOV for smoking pleasure 
 3. Readymade shirt         (a) For elegant looks wear NIR                          2.45 
                            (b) Wear NIR for elegant looks 
 4. Socks                   (a) For comfort and durability look for WIB             1.42 
                            (b) Look for WIB for comfort and durability 
 5. Boot polish             (a) For boot polish use ROK                             2.62 
                            (b) Use ROK for boot polish 
 6. Vest                    (a) For comfortable wear use BEJ                        1.51 
                            (b) Use BEJ for comfortable wear 
 7. Face powder             (a) For a brighter complexion use RIL                   3.20 
                            (b) Use RIL for a brighter complexion 
 8. Perfume                 (a) For gentle fragrance use YOM                        1.53 
                            (b) Use YOM for a gentle fragrance 
 9. Saree                   (a) For fashion and quality choose LON                  2.67 
                            (b) Choose LON for fashion and quality 
10. Handbag                 (a) Fashionable women always prefer XAD                 1.30 
                            (b) XAD is always preferred by fashionable women 
11. Nail polish             (a) Add to your glamour with FER                        2.44 
                            (b) Use FER and add to your glamour 
12. Ornament                (a) For latest in designs ask for ZEG                   1.36 
                            (b) ZEG gives you the latest in design 
13. Hair oil                (a) Ensure healthy hair growth with ZAT                 2.43 
                            (b) ZAT ensures your healthy hair growth 
14. Toothpaste              (a) Remove bad breath with YUV                          1.29 
                            (b) Use YUV and remove bad breath 
15. Soap                    (a) For cleaner and softer skin use KEJ                 2.34 
                            (b) KEJ ensures cleaner and softer skin 
16. Shoes                   (a) Walk at ease with JUF                               1.37 
                            (b) Wear JUF and walk at ease 
17. Fountain pen ink        (a) For smooth flow in writing use JIP                  3.25 
                            (b) JIP gives smooth flow in writing 
18. Drug (for headache)     (a) For fast relief from headache use NUG               S75 7 
                            (b) NUG gives fast relief from headache 

Note. Items 1-6 are male-use products, items 7-12 are female-use products, and items 13-18 are products used by both 
sexes. High and low m’ are alternated in this list. The m’ values were determined in an earlier study by Kanungo and Panda 
(1964b) on a 5-point meaningfulness scale. 


A delayed recall of brand names was taken from 
Ss 24 hours after their immediate recall. The pro- 
cedure followed during delayed recall was similar to 
that followed during initial recall. 


RESULTS 


Consideration of Ss’ ratings of usefulness 
of the products on the 7-point scale reveals 
that for male Ss, the mean rating of the six 
male-use products is 4.84, and that of the six 
female-use products is 1.81. In the case of 
female Ss, male-use products received a mean 
rating of 3.03 and female-use products, 4.41. 
The six products used by both received a 
mean rating of 5.27 by male Ss and of 5.68 
by female Ss. These results suggest that our 
classification of products into three categories 
in terms of utility was valid. 

The nature of the learning situation yielded 
low-recall scores for individual Ss, and in 
many cases resulted in the absence of any 
recall. During the immediate recall test, 9 
male Ss and 11 female Ss did not recall even 
one brand name. During the delayed recall 
test, no brand name was recalled by 14 males 
and by 2 female Ss. 

Effect of meaningfulness. Each of the 18 
brand names was exposed to 100 Ss. Of these 
1,800 exposures 900 were more meaningful 
(MM) and 900 were low meaningful (LM) 
brand names. One hundred and forty-three of 
the MM and 60 of the LM brand names were 
recalled in immediate recall condition, the 
recall percentages being 15.88 and 6.66, re- 
spectively. The difference between these two 
percentages was found to be significant be- 
yond the .001 level (t = 6.13). For delayed 
recall test, a total of 267 recall scores were 
obtained out of which 183 were MM and 84 
were LM brand names. The difference be- 
tween the recall percentages (20.33% MM 
and 9.33% LM) was also found to be signifi- 
cant beyond the .001 level (t = 6.60). The 
null hypothesis that LM brand names are 
equally as well recalled as MM brand names 
is therefore not tenable. The results support 
the hypothesis that a brand name having 
more meaningfulness value is learned and re- 
tained better than a brand name having low 
meaningfulness value. 
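The comparison of recall percentages can be approximated with a pooled two-proportion test. The sketch below is an assumption about the kind of test involved, not the authors' stated formula. It uses the recall counts given in the text (143 vs. 60 of 900 exposures immediately, 183 vs. 84 after the delay), and the function name is illustrative.

```python
import math


def two_proportion_z(hits_a: int, hits_b: int, n_per_group: int) -> float:
    """Pooled z statistic for the difference between two proportions
    based on equal group sizes."""
    p_a = hits_a / n_per_group
    p_b = hits_b / n_per_group
    pooled = (hits_a + hits_b) / (2 * n_per_group)
    std_error = math.sqrt(pooled * (1 - pooled) * (2 / n_per_group))
    return (p_a - p_b) / std_error


EXPOSURES_PER_LEVEL = 900  # 9 brand names x 100 Ss at each level

z_immediate = two_proportion_z(143, 60, EXPOSURES_PER_LEVEL)
z_delayed = two_proportion_z(183, 84, EXPOSURES_PER_LEVEL)

print(f"immediate recall: z = {z_immediate:.2f}")
print(f"delayed recall:   z = {z_delayed:.2f}")
```

Under this assumed formula the statistics come out at roughly 6.2 and 6.6, close to but not identical with the reported values of 6.13 and 6.60, so the authors' exact procedure may have differed slightly.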

Effect of direction of association. In im- 
mediate recall, out of a total recall of 203 
brand names, 114 were brand names appear- 
ing at the end of the slogan (FA or Forward 
Association) and 89 were brand names ap- 
pearing at the beginning of the slogan (BA or 
Backward Association). This amounts to 
12.60% FA and 9.88% BA recall. This dif- 
ference, when tested, was found to be not signifi- 
cant (t = 1.87, p > .05). In delayed recall, 
out of a total recall of 267 brand names 145 
were FA and 122 were BA. The difference be- 
tween the recall percentages (16.11% FA and 
13.55% BA) also failed to reach the level of 
statistical significance (t = 1.56, p > .05). 
Thus, in case of “direction of association” the 
null hypothesis could not be rejected. It 
should be noted, however, that both in im- 
mediate and delayed recall FA is found to be 
greater than BA. This, at least, suggests the 
possibility that brand awareness is better in 
the case of “Need or product → Brand name” 
sequence than in the case of the reverse sequence. 
Further investigation is required to clearly 
determine the effect of this variable on reten- 
tion. 

Effect of utility of the product. Immediate 
and delayed recall of brand names of male-use 
products, of female-use products and of prod- 
ucts used by both sexes, by Ss of each sex, is 
presented in Table 2. 

Chi-square for each of the contingency ta- 
bles of immediate and delayed recall is sig- 
nificant beyond the .001 level. Testing for 
each of the proportions of recall frequencies 
as presented in Table 2 for male and female 
Ss separately, it was observed that in immedi- 
ate recall, male Ss recalled significantly fewer 
brand names of female-use products than ex- 
pected (Z = 2.27, p < .05). In delayed re- 
call this tendency did not reach statistical sig- 
nificance (Z = 1.74, p > .05). Female 
Ss, however, in both immediate and delayed 


TABLE 2 

FREQUENCY OF BRAND NAMES RECALLED IN TERMS OF THREE UTILITY CATEGORIES 

                                 Immediate recall            Delayed recall 
                                 Recall by    Recall by      Recall by    Recall by 
                                 male Ss      female Ss      male Ss      female Ss 
Male-use products                40           24             36           38 
Female-use products              21           52             24           82 
Products used by both sexes      33           33             36           51 
Chi-square                       16.135*                     14.441* 

* p < .001. 

recall, listed significantly more brand names 
of female-use products (Z values are 3.20 and 
5.27, respectively, both significant beyond the 
.01 level) and fewer brand names of male-use 
products (Z values are 2.51 and 4.35, signifi- 
cant beyond the .05 and .01 level, respec- 
tively). This supports the hy- 
pothesis that brand awareness of high-utility 
products is superior to that of low-utility 
products. 
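The chi-square values for the two contingency tables of Table 2 can be checked with a short calculation. This is a sketch for verification only; the function name is illustrative. The immediate-recall count of 52 for female Ss on female-use products is inferred from the stated total of 203 immediate recalls (203 minus the other five cells), which is an assumption wherever that cell is illegible.

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Rows: male-use, female-use, both-sex products.
# Columns: recall by male Ss, recall by female Ss.
immediate = [[40, 24], [21, 52], [33, 33]]  # 52 inferred from the total of 203
delayed = [[36, 38], [24, 82], [36, 51]]

CRITICAL_001_DF2 = 13.816  # chi-square critical value, 2 df, .001 level

chi_immediate = chi_square(immediate)
chi_delayed = chi_square(delayed)

print(f"immediate: chi-square = {chi_immediate:.3f}")
print(f"delayed:   chi-square = {chi_delayed:.3f}")
```

Both statistics agree with the tabled values of 16.135 and 14.441 to within rounding, and both exceed the .001 critical value for 2 degrees of freedom.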

More brand names were recalled under the 
delayed condition than in the immediate con- 
dition. A total of 267 brand names were re- 
called during the delayed recall test as com- 
pared to a total of 203 brand names recalled 
during the immediate recall test. Apparently, 
what happened here was that the immediate 
recall measurement served as a second learn- 
ing trial and thus the delayed recall measure- 
ment was affected. 


DISCUSSION 


The hypothesized effects of meaningfulness 
and the utility of the product on brand aware- 
ness have been clearly demonstrated. The 
effect of sequential position, however, shows 
only a trend consistent with our hypothesis. 
Brand names that have high-scaled meaning- 
fulness value are learned and retained better 
than those that have low-scaled meaningful- 
ness value. It may be emphasized that while 
choosing new brand names for products, the 
verbal units that have been already standard- 
ized for their scaled meaningfulness may be 
of use to the advertiser. Besides, the com- 
parative effectiveness of two brand names can 
be predicted very accurately by scaling the 
brand names on the meaningfulness dimension. 
The usefulness of measured attributes of 
verbal materials is not limited to labora- 
tory use in learning experiments; these 
materials and techniques could fruitfully be 
used in business by advertisers. Many adver- 
tisers in practice try to choose nonsense sylla- 
bles as brand names that have some existing 
association with words in the current language. 
The study gives support to the soundness of 
such practice in psychological terms. 

This investigation was an attempt to evaluate the effectiveness of an electronic 
obstacle-detecting (O/D) device for the blind. Ss were 26 totally blind indi- 
viduals. 3 series of training sessions on the O/D were conducted. Performance 
was assessed in 1 pretraining session with the customary mode of travel and 
3 posttraining sessions with the O/D. Ss were also given several psychological 
tests and 2 interviews. Using the O/D on a standard obstacle course, Ss took 
longer to walk than with customary aid, but errors were the same. Ss who 
walk unassisted made many fewer errors with the O/D than without it. For 
those Ss using a cane or a dog, the O/D was of little help. After more training 
on the use of the O/D, Ss reduced the time to walk the obstacle course 
while errors remained about the same. On the field tests Ss made fewer errors 
but took longer with the customary mode of travel than with the O/D. About
+ of Ss indicated a desire to own the instrument.


The amazing ability of the blind to detect
obstacles was noted as long ago as 1779 by
Diderot. It was not until recently, however,
that the precise mechanism by which this feat
is accomplished was determined (Ammons,
Worchel, & Dallenbach, 1953; Cotzin &
Dallenbach, 1950; Supa, Cotzin, & Dallenbach,
1944). Further investigations demonstrated
that this “obstacle sense” varies from
one blind person to another, that some blind
individuals lack this skill completely, and that
it is not peculiar to the blind.

Interest in improving the mobility of the
blind has turned therefore to the development
of travel devices. The oldest method of travel
has been the cane, which acts both as a probe
and a guard. It can be considered as an extension
of the hand and arm. Objections have
been raised by some of the blind, however,
on the basis of the conspicuousness of the
cane and its interference with others. More
serious is the restriction of information concerning
“space” to the immediate proximity
of the person. Numerous attempts have been
made therefore to provide the blind person
with some form of distance receptor.

The present investigation is a preliminary
report of an attempt to evaluate the effectiveness
of such a device developed for the Veterans
Administration by Bionic Instruments
from an original design by Lawrence Cranberg
in 1945. It emits a beam of light distinguishable
from ambient light, detects reflections
of this light from obstacles, determines distance
by triangulation, and presents the information
to the user by means of tactile
stimulation. The information concerning target
distance becomes available to the user
through operation of the range switches.² It
is the purpose of the present study to assess
the Obstacle Detector (O/D) as to its technical
excellence and its acceptance by the
blind operator.

1 This study was conducted under Contract
V1005p-9639 from the Prosthetic and Sensory Aids
Service of the Veterans Administration to Tracor,
Inc., Austin, Texas. The assistance of J. F. Smith and
Gareth McCoy in the selection, training, and evaluation
of Ss is gratefully acknowledged.


METHOD 


In any comprehensive assessment of a device for 
aiding the blind person to travel about his environ- 
ment both subject (S) and training variables have to 
be considered. Moreover, the problem of technical 
excellence should include a comparison of the device 
with other means of travel customarily employed by 
the blind. 


Subjects 


Ideally, it would have been desirable to use a
group of newly totally blinded individuals—some
of whom, at random, would be trained on the use
of the O/D; others on the cane; others on the guide
dog; and finally, the remaining blind would be
trained to move about the environment without
any device at all. Practical considerations, of course,
compelled us to select Ss from the blind population
that was available. Agencies and institutions for
the blind were contacted in and around Austin
to obtain a list of potential Ss. The Ss were
interviewed to obtain background data and to
explain the project to them. Twenty-six Ss were
finally selected. Each S’s physician was contacted to
secure data on visual acuity, angle of visual field,
cause of blindness, age at blindness, etc. The Mobility
Aids Scale (Wright, 1961) was used to classify
Ss in terms of how much assistance was required
in travel.

2 See User’s Manual for the Veterans Administration
Obstacle Detector, Bionic Instruments, Inc.,
Philadelphia, Pennsylvania, Contract No. V1005P-9217,
September 1962, for complete descriptions of the
instrument.

To explore the possibility that personality variables 
are related to performance on the O/D and to pro- 
vide a source for future development of an empiri- 
cally derived predictor scale, three psychometric tests 
were administered: WAIS Vocabulary test, CPI, and 
the Emotional Factors Inventory (Bauman, 1958). 


Training Sessions 


Three series of training sessions were conducted: 
stationary, simple travel, complex travel. The pri- 
mary purpose of the stationary series was to fa- 
miliarize S with the instrument. Following instruc- 
tion on the use of the device, S would simply 
stand in one spot and attempt to locate various 
kinds and sizes of obstacles placed at different 
distances from him. Errors in locating the object 
were noted and corrected. The time it took to report 
the object was recorded. The S was required to 
practice until he could locate each obstacle on four 
successive trials. The simple traveling series was designed
to teach S to avoid obstacles while traveling
a simple path. In the complex traveling series, Ss 
were trained to use the device in a more complex 
and crowded situation. Both an indoor and outdoor 
obstacle course were set up containing one to four 
obstacles. On these more difficult series, a criterion 
of two successive trials was used for successful 
performance. 


Test Trials 


The test trials consisted of four parts: pretraining 
performance with the customary mode of travel 
(without O/D), posttraining performance with the 
O/D, field performance on an unfamiliar city block, 
and performance under instructions to walk as rap- 
idly as possible (speed test). The pre- and posttrain- 
ing trials used two test situations, a standardized 
indoor and outdoor obstacle course. 

Pretraining series. To obtain a base-line performance
for evaluating the O/D, a pretraining series of
trials, where each S used his customary travel aid
while blindfolded, was conducted on two standard
obstacle courses: (a) a concrete outdoor sidewalk,
85 × 4 feet, bordered on both sides by grass, and (b)
an indoor corridor, 95 × 5 feet. Eight obstacles, varying
in width, height, and material, were used. Five
trials were given to every S on each obstacle course.
Both errors (collisions) and time were scored. An
error was counted when any part of the body
touched an obstacle. Time was measured from the
moment when S was placed on the starting position
and given the signal to begin walking until he had
walked to the end of the obstacle course (85 and 95
feet, respectively). Twenty-five Ss took the prelearning
series.

3 Appreciation is expressed to the Texas Lions
Camp at Kerrville, Texas, to the State School for
the Blind, and to the Lighthouse for the Blind in
Austin, Texas, for their cooperation in the selection
of Ss.

First posttraining evaluation. Soon after the blind- 
folded S had completed the three training series, he 
was tested on the standard obstacle course. The 
procedure here duplicated that in the pretraining 
series with the exception that S now used the 
O/D. Five trials were given on the indoor and out- 
door course in the same order and with the same 
placement of the obstacles as in the prelearning 
series. A second postlearning series was given after 
2 weeks of unsupervised use of the O/D. Nineteen 
Ss were available for this series of tests. 

Field training under supervision. In this session, 
each S spent 1 hour per day for 10 days walking 
with the O/D in six semifamiliar and unfamiliar 
areas accompanied by a trainer. The role of the 
trainer was to coach S in the proper use of the O/D, 
to emphasize the avoidance of obstacles, and for the 
first time in a training session, S was instructed to 
walk as rapidly as possible. Whenever S held the 
instrument or scanned improperly, the trainer pointed 
out the proper procedure. The trainer assisted S to 
use range switches properly, helped him to maintain 
motivation by the use of praise and encouragement, 
and reassured him in the avoidance of danger. The 
trainer noted collisions (number and kind) and 
reported comments made by S. 

Unfamiliar obstacle course. Following the super- 
vised training session, each S was then evaluated on 
his use of the O/D in an unfamiliar city block. Four 
trials were given to each S on this same block, that 
is, walking up and back twice, in a counterbalanced 
order with the O/D and with customary mode of 
travel. The S walked the entire length of the block 
as quickly as possible, entering and leaving a store 
in the middle of the course. 

Speed session. The final evaluation session of the 
O/D was on the standard obstacle course used in the 
first two evaluation sessions. The Ss were instructed 
on each of the 10 trials (outdoors and indoors) to 
walk as rapidly as possible, to locate a “hole” and
not be concerned about the size or kind of obstacle. 
After all trials were completed, Ss were again inter- 
viewed concerning their reactions to the O/D. Eleven 
Ss were available for this part of the study. 


RESULTS 


Nineteen Ss completed their first postlearn- 
ing evaluation and 14 of these Ss were avail- 
able for the second postlearning tests follow- 
ing 2 weeks of unsupervised use of the O/D. 

There were wide individual differences in
pretraining performance. The highest number
of errors occurred in Ss who used neither cane
nor dog as a travel aid. They also took the
longest to walk the obstacle course. As would
be expected, Ss with the dog aid are consistently
the fastest on the course. The rank-order
correlation for errors indoors and outdoors
was .90.


TABLE 1

MEAN ERRORS AND TIME (SECONDS) ON THE OBSTACLE COURSE WITHOUT (PRELEARNING) AND WITH THE
OBSTACLE DETECTOR (POSTLEARNING) ACCORDING TO TRAVEL AID

                          Prelearning                    First postlearning
                     Outdoors       Indoors           Outdoors       Indoors
Travel aid    N    Errors  Time   Errors  Time     Errors  Time    Errors  Time
Cane         11     1.20   55.3    1.36   62.8      1.80   134.1    1.64   151.5
Dog           3     1.07   28.3    1.03   29.7      1.73   120.7    1.93   142.0
None          5     4.36   60.2    3.88   71.4      2.20   173.6    2.16   140.8
All Ss       19     2.01   50.8    1.97   59.8      1.89   142.4    1.82   147.2

Note.—The seven Ss who had not taken the postlearning tests are not included in this comparison.


Overall Evaluation 


The most direct test of the efficiency of 
the O/D is to compare the performance of Ss 
traversing the course without and with the 
O/D. Table 1 shows the results according to 
the mode of travel. Only the errors and time 
for the first postlearning tests were used for 
this analysis since more Ss were available for 
this series. The means for the postlearning 
series were 1.8 and 143 seconds outdoors, and 
1.8 and 141 indoors. 

From Table 1, it is seen that the mean 
scores for all 19 Ss before and after training 
show little change in the number of errors 
but almost a threefold increase in the time 
to traverse the obstacle courses. It seems that 
the O/D resulted in a decrease in efficiency 
as far as time is concerned and practically 
no gain at all in avoiding obstacles. These 
results, however, tend to be misleading. 
Inspection of Table 1 shows that the five Ss
who customarily used no travel aid made far
fewer errors with the O/D than without it,
whereas for the 14 Ss using a cane or a dog,
the O/D was of little help. As a matter of fact,
the latter did somewhat poorer with the O/D.
The increase in errors for these Ss and the
overall increase in time may have been due to
(a) insufficient training on the O/D, (b) the
difficult and somewhat unrealistic obstacle
course, (c) low motivation for those Ss already
using a travel aid, (d) too little emphasis on
speed, and (e) S’s interest in exploring the
environment rather than avoiding obstacles.
Further training was therefore provided as
described in the procedure, with emphasis on
avoidance of obstacles and speed in walking
with the O/D.
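
The "almost threefold" increase in time can be checked directly against the "All Ss" row of Table 1. The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative only and do not come from the study.

```python
# "All Ss" row of Table 1: (mean errors, mean time in seconds).
PRELEARNING = {"outdoors": (2.01, 50.8), "indoors": (1.97, 59.8)}
POSTLEARNING = {"outdoors": (1.89, 142.4), "indoors": (1.82, 147.2)}


def time_ratio(course):
    """Postlearning mean time divided by prelearning mean time."""
    return POSTLEARNING[course][1] / PRELEARNING[course][1]


def error_change(course):
    """Postlearning mean errors minus prelearning mean errors."""
    return POSTLEARNING[course][0] - PRELEARNING[course][0]


for course in ("outdoors", "indoors"):
    print(course, round(time_ratio(course), 2), round(error_change(course), 2))
```

On these figures the time ratio is roughly 2.8 outdoors and 2.5 indoors, while mean errors change by less than 0.2 on either course.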


Speed Trials 


Eleven Ss were retested on the same 
obstacle course under instructions to walk as 
rapidly as possible. A comparison of the 
results under these conditions with those on 
the first postlearning series shows clearly the 
significant decrease in time with little increase 
in errors. The time was cut almost in half. 
Compared to the prelearning series, Ss took 
about 14 seconds longer on the average 
during the speed trials. Thus there is little 
doubt that the instrument can be as effective 
as the cane as far as speed is concerned. The 
two Ss in the speed trials who customarily 
used a dog, however, still did much better
with the dog than with the O/D. 


Field Tests 


The final overall evaluation consisted of the 
comparison of the performance of Ss on a 
relatively unfamiliar city block with their 
customary mode of travel and with the O/D. 
The Ss made fewer errors with their cus- 
tomary mode of travel (2.6 versus 3.4) and 
took longer than they did with the O/D (488 
versus 400 seconds). The overall picture, 
however, is again misleading. The Ss who
customarily use no travel aid did far better
with the O/D on both errors and time. All
the other Ss walked faster with their customary
aid.


Subject’s Evaluation 


There are marked differences among Ss in 
their reactions to the instrument. For ex- 
ample, at the time of the first interview, 23% 
of Ss had no desire to own an O/D, 9% 
did not care, and 68% did express a desire for 
ownership. At the time of the second inter- 
view, the comparable figures are 33%, 27%, 
and 40%. These differences in reaction to the 
O/D suggest that it would be extremely 
useful to be able to predict those who would 
and those who would not respond positively 
to such an instrument. 


Subject’s Comments 


The consistent reactions to the O/D which 
stand out most clearly may be summarized 
as follows. On the positive side, the O/D is 
seen as useful in familiar situations and in 
flat places without step-ups or step-downs, 
the extended range provides information about 
distance not available with other travel 
aids, it is easy to carry around, and with a 
few changes it would be much in demand. 
On the negative side, some are afraid to trust 
it in crowded places or in crossing streets, 
the problem of stairs and curbs and step- 
downs is not solved by the O/D alone, the 
switch arrangement is seen as unreliable both 
in terms of objects it will detect and in terms 
of variations in the strength of the beam as 
the battery loses strength, and it is felt to be 
too heavy to use comfortably for more than a 
short period of time. 


Individual Differences 


Since the total sample is relatively small 
and perhaps unrepresentative, the findings 
concerning the relationship between the per- 
sonality variables and performance are only 
suggestive. 

The personality correlates of performance
on the street-travel trials with and without
the O/D were determined. The correlation
between errors with and without the O/D was
−.20. Interestingly enough, two distinctly
different patterns emerge with respect to
those Ss who are most effective in using the
O/D versus those who are most effective without
the O/D. Those Ss who made the fewest
errors in traversing the course using the O/D
tended to be more outspoken and sharp-witted,
more planful and responsible, more
capable and cooperative, less neurotic and
emotionally unstable, to have fewer somatic
symptoms, to feel more socially competent,
and to have fewer feelings of inadequacy and
inferiority than those individuals who made
the most errors in using the O/D. Those who
did well with their traditional mode of travel
aid tended to be more retiring and inhibited,
more awkward and conventional, more deliberate
and moderate, more immature and lazy,
more defensive and demanding, more suspicious
and aloof, more anxious and cautious,
more easy-going and unambitious, more distrustful
of others, and to have greater feelings
of inadequacy and inferiority than those
Ss who made relatively more errors with their
traditional mode of travel. These findings suggest
that those blind individuals who have
mastered the use of the O/D sufficiently well
to utilize it most efficiently in a real-life situation
are more effective psychologically and
better adjusted than are those individuals
who are unable to do so. The possibility of
predicting effective O/D users on the basis of
scores on these personality tests seems promising.


A METHOD FOR CHOOSING A CUTTING POINT ON A TEST


RICHARD B. DARLINGTON AND GLENN F. STAUFFER ¹


Cornell University 


Elementary decision theory is used to derive a formula for finding a cutting 
point on a continuous test used to distinguish between 2 criterion groups, 
when the test scores of each criterion group are distributed approximately 
normally. The formula considers the difference between the means of the 2 
criterion groups, the standard deviations of test scores of the 2 groups, the 
relative sizes of the 2 groups, and the relative seriousness of a “miss” versus
a “false positive.”


When a continuous (i.e., not discrete) test 
is used to distinguish between members of two 
criterion groups, some test score must be 
chosen as the cutting point, so that individuals 
scoring below that point can be treated as if 
they were in the first criterion group, and those 
scoring above that point as if they were in the 
second. Ideally, the cutting point should be 
selected by a method which considers the test- 
score distributions of the two criterion groups, 
the relative sizes of the two criterion groups, 
and the relative seriousness of the two types of 
misclassification. This paper presents formulas
by which the optimum cutting point can be 
estimated when the test scores within each of 
the two criterion groups are approximately 
normally distributed. 

For example, suppose a high school offers 
two courses at the same hour so that students 
must choose between them. The first is a 
foreign language course designed for students 
going to college. The second is a typing course 
which is considered more valuable than the 
language course for students not going on to 
college. The prediction problem, then, consists 
of distinguishing those who are going to college 
and who would therefore benefit more from the 
foreign language course, from those who will 
not go on to college and who therefore would 
benefit more from the typing course. A 
scholastic aptitude test, let us say, is used to 
make this prediction. Suppose that in the
recent past each freshman class entering high
school has consisted of 100 students, of whom
40 have subsequently gone to college and 60
have not. The test scores of those who have
gone on to college have a mean of 100 and a
standard deviation of 5, while the test scores
of those who have not gone on have a mean
of 80 and a standard deviation of 7. We shall
say it has been estimated that it is approximately
1½ times as serious an error to give the
typing course to a student who subsequently
goes on to college, as it is to give the foreign
language course to a student who does not go
to college. (The problems and procedures involved
in arriving at such an estimate are
common to all uses of decision theory in
psychology, and are discussed in Cronbach
and Gleser, 1957, Ch. 10.)

1 Although the junior author is a major in the United
States Air Force, opinions expressed herein are solely
the authors’ and in no way reflect those of the United
States Air Force or the Department of Defense.

We can introduce the following notation:


Symbol                              General meaning                                                        Meaning or value in present example

$A_1$                               First criterion group                                                  Those going to college
$A_2$                               Second criterion group                                                 Those not going to college
$N_1$                               Frequency of criterion group $A_1$ in the population                   40
$N_2$                               Frequency of criterion group $A_2$ in the population                   60
$C_1$                               Treatment appropriate to $A_1$                                         Foreign language course
$C_2$                               Treatment appropriate to $A_2$                                         Typing course
$\Delta U_{A_1}$                    Gain resulting when treatment $C_1$ rather than $C_2$ is given         Unspecified
                                    to a member of $A_1$
$\Delta U_{A_2}$                    Gain resulting when $C_2$ rather than $C_1$ is given to a              Unspecified
                                    member of $A_2$
$\Delta U_{A_1}/\Delta U_{A_2}$     Relative seriousness of the two types of misclassification            1½
$X$                                 A continuous test used to distinguish between $A_1$ and $A_2$          Scholastic aptitude test
$X_p$                               Optimum cutting point on $X$, so that those scoring above $X_p$        To be found
                                    should be given treatment $C_1$, those below, treatment $C_2$
$m_1$                               Mean of sample test scores of members of $A_1$                         100
$s_1$                               Standard deviation of sample test scores of members of $A_1$           5
$m_2$                               Mean of sample test scores of members of $A_2$                         80
$s_2$                               Standard deviation of sample test scores of members of $A_2$           7


Figure 1 illustrates the standard decision-theory
procedure for finding $X_p$.² In Part A
of Figure 1 are shown the absolute frequency
distributions of scores for each of the two
criterion Groups $A_1$ and $A_2$. Part B is an
enlargement of the section of Part A surrounded
by dots.

FIG. 1. Absolute frequency distributions of X scores
conditional on membership in criterion Groups $A_1$,
denoted by $f_1(X)$, and $A_2$, denoted by $f_2(X)$. Figure B
is an enlargement of the indicated portion of Figure A.

2 Although in the above example the criterion group
with the higher mean was labeled $A_1$, this labeling is
arbitrary; in Figure 1 Group $A_1$ has the lower of the
two means.

For convenience of exposition, in Figure 1 let
$X_2 - X_1 = X_3 - X_2 = dx$. Setting $X_p = X_3$
results in the misclassification of the portion
of Group $A_2$ to the left of $X_3$ and the portion of
Group $A_1$ to the right of $X_3$. Moving $X_p$ from
$X_3$ to $X_2$ results in a gain in total utility equal
to the product of $\Delta U_{A_2}$ and the number of
people in Group $A_2$ between $X_3$ and $X_2$, since
these people are reclassified properly by the
change. The loss in total utility is the product
of $\Delta U_{A_1}$ and the number of people in Group $A_1$
who are reclassified into Group $A_2$. As $dx$
becomes infinitely small, the total gain in
Group $A_2$ from moving $X_p$ the distance $dx$
to the left becomes $f_2(X_i) \cdot \Delta U_{A_2} \cdot dx$, where $X_i$
is the midpoint of interval $dx$. Similarly, the
loss in Group $A_1$ approaches $f_1(X_i) \cdot \Delta U_{A_1} \cdot dx$,
and the net gain in both groups combined
approaches $f_2(X_i) \cdot \Delta U_{A_2} \cdot dx - f_1(X_i) \cdot \Delta U_{A_1} \cdot dx$.
Net gain approaches zero as the cutting point
approaches its optimum value, $X_p$. This well-known
principle can be stated by the equation

$$f_2(X_p) \cdot \Delta U_{A_2} - f_1(X_p) \cdot \Delta U_{A_1} = 0. \qquad [1]$$

The problem is to estimate from sample data
the value of $X_p$ at which [1] holds. However,
even if the population frequency distributions
are smooth, so that there is only one point at
which [1] holds, sample frequency curves are
likely to be so irregular that [1] will hold at
several points in a sample frequency distribution.
An alternative solution is thus
needed.

If it is reasonable to assume that the two
population distributions are normal, the general
form of equation [1] becomes:

$$N_1 \cdot \phi_1(X_p) \cdot \Delta U_{A_1} = N_2 \cdot \phi_2(X_p) \cdot \Delta U_{A_2}, \qquad [2]$$

where

$\phi_1(X_p)$, $\phi_2(X_p)$ = Ordinates at $X_p$ of the two normal probability
curves representing the scores for the two criterion groups.


From the formula for the normal distribution,
we have

$$\phi_1(X_p) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{(X_p - \mu_1)^2}{2\sigma_1^2}} \qquad [3a]$$

$$\phi_2(X_p) = \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-\frac{(X_p - \mu_2)^2}{2\sigma_2^2}} \qquad [3b]$$

where

$\mu_1, \mu_2$ = Means of the two X score distributions,

$\sigma_1, \sigma_2$ = Standard deviations of the two distributions.

Rearranging and substituting [3a] and [3b]
into [2] and replacing $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, and $X_p$
by their estimators, we have

$$\frac{s_2 \cdot N_1 \cdot \Delta U_{A_1}}{s_1 \cdot N_2 \cdot \Delta U_{A_2}} = \frac{e^{\frac{(\hat{X}_p - m_1)^2}{2 s_1^2}}}{e^{\frac{(\hat{X}_p - m_2)^2}{2 s_2^2}}} \qquad [4]$$

Taking logarithms to the base e of both sides
of [4] and solving for $\hat{X}_p$ (assuming $s_2^2 \neq s_1^2$),
we have

$$\hat{X}_p = \frac{(s_2^2 m_1 - s_1^2 m_2) \pm (s_1 s_2)\sqrt{(m_1 - m_2)^2 + 2(s_2^2 - s_1^2)\log_e\left[\frac{s_2 \cdot N_1}{s_1 \cdot N_2} \cdot \frac{\Delta U_{A_1}}{\Delta U_{A_2}}\right]}}{(s_2^2 - s_1^2)} \qquad [5]$$

Two values of $X_p$ will satisfy equation [5].
Usually, one will be between, or nearly between,
$m_1$ and $m_2$, while the other will be to
the extreme right or left of both distributions.
If only one value of $X_p$ is to be used, the value
between or nearly between the two means
would maximize utility. Equation [5] gives
two values of $X_p$ because two normal curves
with different standard deviations actually
intersect twice. The normal curve with the
larger standard deviation will always be higher
than the other at both $X = \infty$ and $X = -\infty$.
If the two normal curves intersect at all,
therefore, they must intersect twice. When
$\Delta U_{A_1} = \Delta U_{A_2}$, [1] shows that $f_1(X_p) = f_2(X_p)$,
so that $X_p$ is at the intersection of the two
curves. If there are two intersections, then
strictly speaking $X_p$ is at both, and a person
with either an extremely high or an extremely
low score should be assumed to belong to the
criterion group with the larger standard
deviation. The reasoning is similar when
$\Delta U_{A_1} \neq \Delta U_{A_2}$. In practice, however, the assumption
of normality is only an approximation.
Fortunately, in actual practice, one value
of $X_p$ as given by [5] is usually so far to the
right or left of both distributions that practically
nobody falls beyond it, so that the other
value of $X_p$ is the only one of practical importance
anyway.

As indicated above, formula [5] holds whenever
$s_1^2 \neq s_2^2$. When $s_1^2 = s_2^2$, the solution of
[2] becomes

$$\hat{X}_p = \frac{m_1 + m_2}{2} + \frac{s_1^2}{m_2 - m_1} \times \log_e\left[\frac{N_1}{N_2} \cdot \frac{\Delta U_{A_1}}{\Delta U_{A_2}}\right]. \qquad [6]$$

Since $N_1$ and $N_2$ appear only in ratios,
the criterion group base rates $\frac{N_1}{N_1 + N_2}$ and
$\frac{N_2}{N_1 + N_2}$ may be substituted for the values of
$N_1$ and $N_2$ in both formulas [5] and [6].
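
The algebra between [4] and [5] is not shown in the text. One way to fill it in is sketched below; the shorthand $K$ is introduced here for convenience and is not a symbol used by the authors.

```latex
% K is shorthand introduced for this sketch only.
\begin{align*}
K &= \frac{s_2 N_1\,\Delta U_{A_1}}{s_1 N_2\,\Delta U_{A_2}}, \\
\log_e K &= \frac{(\hat{X}_p - m_1)^2}{2 s_1^2} - \frac{(\hat{X}_p - m_2)^2}{2 s_2^2}
  && \text{(logarithm of [4])} \\
0 &= (s_2^2 - s_1^2)\,\hat{X}_p^2 - 2\,(s_2^2 m_1 - s_1^2 m_2)\,\hat{X}_p
     + s_2^2 m_1^2 - s_1^2 m_2^2 - 2 s_1^2 s_2^2 \log_e K
  && \text{(multiply by } 2 s_1^2 s_2^2\text{)} \\
\text{discriminant}/4 &= s_1^2 s_2^2\left[(m_1 - m_2)^2 + 2\,(s_2^2 - s_1^2)\log_e K\right].
\end{align*}
```

Applying the quadratic formula with this discriminant gives the two roots of [5]. When $s_1^2 = s_2^2$ the squared term vanishes and the remaining linear equation yields the single root of [6].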

When Xp has been determined, the areas to 
its left and right in each distribution can be 
found in normal curve tables by the usual 
method. These areas then serve as the proba- 
bilities of test responses conditional on group 
membership, for use in evaluating the test. 
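
As a hedged illustration of this table lookup, the sketch below replaces the normal curve table with the error function from Python's standard library and applies it to the high-school example, taking the cutting point of 91.08 that the article reports for that example. The function and variable names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Area to the left of x under a normal curve with the given mean and SD."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


CUT = 91.08  # cutting point reported for the article's example

# College-bound students (mean 100, SD 5) who score below the cut are "misses".
p_below_given_college = normal_cdf(CUT, 100, 5)

# Students not going to college (mean 80, SD 7) who score above the cut
# are "false positives".
p_above_given_no_college = 1.0 - normal_cdf(CUT, 80, 7)

print(round(p_below_given_college, 3), round(p_above_given_no_college, 3))
```

Both conditional probabilities are small (a few percent each), which is what makes the test useful for this decision under the normality assumption.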

Substituting the values from our example
into equation [5], we have

$$\hat{X}_p = \frac{(7)^2(100) - (5)^2(80) \pm (7)(5)\sqrt{(100 - 80)^2 + 2(49 - 25)\log_e\left[\frac{(7)(40)}{(5)(60)} \cdot \frac{3}{2}\right]}}{(49 - 25)}$$

$\hat{X}_p$ = 150.58, 91.08.

The first of these values is more than 10
standard deviations to the right of either mean
and can thus be ignored. Assigning those
students whose test scores are above 91.08 to
treatment $C_1$, and those whose scores are
below 91.08 to treatment $C_2$, will result in
maximum utility for the total group.
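
The calculation can be sketched in a few lines of Python. This is an illustrative implementation of formulas [5] and [6] as they are read here, not code from the authors; the function names and argument order are assumptions, and `utility_ratio` stands for the ratio of the two utility gains (3/2 in the example).

```python
import math


def cutting_points(m1, s1, n1, m2, s2, n2, utility_ratio):
    """Return the cutting point(s) on X in ascending order.

    utility_ratio is the gain for group A1 divided by the gain for group A2.
    Assumes m1 != m2 when the two standard deviations are equal.
    """
    if math.isclose(s1, s2):
        # Equal standard deviations: formula [6], one cutting point.
        log_term = math.log(n1 * utility_ratio / n2)
        return [(m1 + m2) / 2 + s1 ** 2 / (m2 - m1) * log_term]
    # Unequal standard deviations: formula [5], two cutting points.
    log_term = math.log((s2 * n1 * utility_ratio) / (s1 * n2))
    under_root = (m1 - m2) ** 2 + 2 * (s2 ** 2 - s1 ** 2) * log_term
    if under_root < 0:
        return []  # the two weighted curves never cross
    centre = s2 ** 2 * m1 - s1 ** 2 * m2
    spread = s1 * s2 * math.sqrt(under_root)
    denom = s2 ** 2 - s1 ** 2
    return sorted([(centre - spread) / denom, (centre + spread) / denom])


def weighted_ordinate(x, mean, sd, n, utility):
    """N * phi(x) * utility gain: one side of equation [2]."""
    phi = math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))
    return n * phi * utility


# Values from the high-school example.
points = cutting_points(100, 5, 40, 80, 7, 60, 1.5)
print([round(p, 2) for p in points])  # approximately 91.08 and 150.58
```

At each returned point the two sides of equation [2] agree, which can be confirmed by comparing `weighted_ordinate(x, 100, 5, 40, 1.5)` with `weighted_ordinate(x, 80, 7, 60, 1.0)`.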


REFERENCE 
CRONBACH, L. J., & GLESER, G. C. Psychological tests
and personnel decisions. Urbana: University of
Illinois Press, 1957.
(Received March 10, 1965) 


Journal of Applied Psychology 
1966, Vol. 50, No. 3, 232-235 

An investigation of the role of marriage and career orientations in the vocational
development of college girls. The problem dealt with the construction
of an attitude inventory and the comparison of selected factors in the voca- 
tional development of 3 groups of college girls grouped on the basis of their 
attitudes as measured by the inventory. 180 college girls were administered 
the Career and Marriage Attitude Inventory (CMAI). 30 girls were selected 
for study on the basis of scores earned on the CMAI. A structured interview- 
case study was used to prepare summaries which were ranked by 3 impartial 
judges. Significance at the 1% level was determined by Kendall’s Coefficient of 
Concordance and the application of the F test. Comparison among groups 
revealed that the girls differ with respect to career-marriage attitudes and occupational
interests.


The literature contains few data concerning the
vocational development of girls or the role 
of the male-association factor as an influence 
in vocational choice. In a factor analysis of 
the Strong Vocational Interest Blank for 
Women (SVIB-W), Crissy and Daniel (1939) 
found four factor loadings which they called 
interest in male association, people, language, 
and science. Interests of housewives, office 
workers, stenographer-secretaries, and nurses 
were found to have heavy loadings with the 
male-association factor, whereas the interests 
of artists, authors, librarians, physicians, and 
social workers were found to have negative 
loadings. The factor of male association pre- 
dominates so strongly that 90% of high 
school seniors exhibit it as their controlling 
interest. 

Similar findings were reported by W. L. 
Layton (1958) who reported that high scores 
on scales for elementary teacher, office 
worker, steno-secretary, and housewife were 
received by girls who plan to work only until 
marriage. Low scores on these scales were ob- 
tained by women in the College of Science, 
Literature and the Arts. It was concluded 
that these distributions probably reflect differences
in “career versus marriage” interests.
Another research project concerning sex
differences in values and desires was con- 
ducted by Singer and Stefflre (1954). They 
offer the speculation that job values and de- 
sires of adolescent girls reflect the learning of 
sex stereotypes and cultural expectations of a
more submissive female role. 

In one of the few reported studies in the 
area of career orientation and marriage orien- 
tation of adolescent girls, Hoyt and Kennedy 
(1958) investigated the interest and person- 
ality correlates of freshman college women. 
They were able to differentiate between fresh- 
man women who expected to become home- 
makers and freshman women who expected 
to be career women, as measured by their 
scores on the SVIB-W and on the Edwards 
Personal Preference Schedule (EPPS). 

This investigation was essentially explora- 
tory in nature to determine whether the orien- 
tations of girls toward careers and toward 
marriage are worthwhile avenues for the 
study of vocational development. In this con- 
text, vocational choice was viewed as a con- 
tinuous, complicated process of decision- 
making, influenced by a combination of fac- 
tors which interact, are modified, and develop 
in time becoming relatively stable during late 
adolescence. Ever since research using factor 
analysis revealed that girls were primarily 
interested in male association (Crissy & 
Daniel, 1939), it has been assumed, though
not empirically proved, that this factor overshadows
their attempts to prepare for a career.
However, a recent survey (Women’s Bureau 
of the Department of Labor, 1959) of women 
college graduates, including both married and 
single women, revealed that over three fourths 
of college graduates are employed full time.
There appears little doubt that research con- 
cerned with the vocational choice of girls 
should attempt to shed light on understanding 
the force of this factor in the combination of 
factors influencing vocational choice. 


STATEMENT OF THE PROBLEM 


This investigation was designed as an at- 
tempt to determine the role of career orienta- 
tion and marriage orientation in the voca- 
tional development of sophomore, junior, and 
senior college girls on the basis of the follow- 
ing selected factors: interests, achievement, 
emergence and persistence of occupational 
preferences, key figure influence, and work 
experience.

The problem, in general, consisted of two 
parts, as follows: first, the construction of a 
simple inventory to measure the attitudes of 
college women toward marriage and toward a 
career, and second, the comparison of selected 
factors in the vocational development of 
three groups of college girls grouped on the 
basis of their attitudes as discovered by the 
inventory. The basic hypothesis was: In their 
vocational development, marriage-oriented 
college girls, career-oriented college girls, 
and marriage-career-oriented college girls 
differ on the basis of interests as 
measured by the SVIB-W, achievement as 
determined from grade-point averages, time 
of emergence and persistence of present major 
occupational preferences, key figures (intra- 
family or extrafamily) who influenced occu- 
pational preferences, and amount of experi- 
ence at paid employment. In view of the 
findings of past research, it was hypothesized 
that marriage-oriented girls would have higher 
scores on occupational scales of the SVIB-W 
with positive male-association factor loadings 
and that career-oriented girls would have 
higher scores on occupational scales with 
negative male-association factor loadings. 


METHODS AND PROCEDURES 


In the spring of 1960, 180 sophomore, junior, and 
senior girls at Indiana University were adminis- 
tered the Career and Marriage Attitude Inventory 
(CMAI). Five schools, including women from 28 
major areas of study, were represented in the sample. 
These were dispersed in the following manner: 49%
in the School of Education, 28% in the School of
Arts and Sciences, 9% in the School of Music, 9% 
in the School of Business, and 5% in the School of 
Health, Physical Education and Recreation. 

The CMAI was developed to serve as a criterion 
for selecting and grouping subjects (Ss) to test the 
basic hypothesis of this investigation. It consisted of 
two attitude scales, the marriage scale containing 
10 items or statements and the career scale con- 
taining the same number of statements. These state- 
ments were derived from questions previously used 
in similar research by Ginzberg, Ginsburg, Axelrad,
and Herma (1951), Hoyt and Kennedy (1958), and 
Super (1957). Girls with high scores on the marriage 
scale and low scores on the career scale were con- 
sidered to be marriage oriented. Girls with high 
scores on the career scale and low scores on the 
marriage scale were considered to be career oriented. 
Girls with average scores on each of these two 
scales were considered to be oriented in both 
directions, to be marriage-career oriented. 

Thirty Ss were selected from this population 
of 180 girls for study on the basis of scores earned 
on the CMAI. Ten Ss who obtained scores on 
the career scale one standard deviation or more 
above the mean were designated as Group 1 or the 
“career-oriented” group. Group 2 was designated the 
“marriage-oriented” group and selected in the same 
manner except that scores obtained on the marriage 
scale were one standard deviation above the mean. 
Group 3 was designated the “mixed-oriented” group 
because only Ss were selected for this group who 
obtained scores within one standard deviation of the 
mean on each scale. These 30 Ss were college women 
between the ages of 18 and 22 years, who had completed
three or more semesters of college education
and who had committed themselves to a college
major.
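
The selection rule described above can be sketched in code. This is only an illustration under stated assumptions: the function name, the use of the sample standard deviation, and the handling of girls who score high on both scales (left unassigned here) are hypothetical choices, and the scores in the usage note are invented. The article does not say how the 10 Ss per group were drawn from those who qualified.

```python
from statistics import mean, stdev


def assign_orientation(scores, cutoff=1.0):
    """Group subjects by CMAI z scores (hypothetical helper).

    scores maps a subject id to (career_score, marriage_score).
    Returns a dict: group name -> list of subject ids.
    """
    career = [c for c, _ in scores.values()]
    marriage = [m for _, m in scores.values()]
    c_mean, c_sd = mean(career), stdev(career)
    m_mean, m_sd = mean(marriage), stdev(marriage)

    groups = {"career": [], "marriage": [], "mixed": []}
    for sid, (c, m) in scores.items():
        zc = (c - c_mean) / c_sd
        zm = (m - m_mean) / m_sd
        if zc >= cutoff and zm < cutoff:
            # One SD or more above the mean on the career scale only.
            groups["career"].append(sid)
        elif zm >= cutoff and zc < cutoff:
            # One SD or more above the mean on the marriage scale only.
            groups["marriage"].append(sid)
        elif abs(zc) < cutoff and abs(zm) < cutoff:
            # Within one SD of the mean on each scale.
            groups["mixed"].append(sid)
        # Assumption: any other pattern is left unassigned.
    return groups


# Invented scores for six subjects: (career scale, marriage scale).
example = {
    "a": (30, 10), "b": (10, 30), "c": (20, 20),
    "d": (20, 20), "e": (20, 20), "f": (20, 20),
}
groups = assign_orientation(example)
```

With these invented scores, subject "a" falls in the career group, "b" in the marriage group, and the remaining four in the mixed group.
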

A structured interview case study was employed
as the primary method. The interview was sufficiently
flexible to permit the use of probing questions
where replies were not clear or complete. The
interview also gave Ss opportunity to qualify state- 
ments or to elaborate. It was constructed around 
four areas of importance in understanding the vocational
development of college women: key figures,
vocational preferences, work experience, and the 
future. 

Three impartial judges ranked case study sum- 
maries on scales constructed around three areas: key 
figure influence, emergence and persistence of occupa- 
tional choices, and work experience. Agreement 
among the judges was determined by the coefficient- 
of-concordance method, and the F test was applied. 
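
As an illustration of the coefficient-of-concordance method, the sketch below computes Kendall's W for m judges who each rank the same n case summaries, followed by one commonly cited F approximation for its significance. The rankings are invented, ties are assumed absent, and the exact form of the F test applied in the study is not given in the text, so the second function is an assumption.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for untied rankings.

    rankings: list of m lists, each holding the ranks (1..n) that one
    judge gave to the same n items.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def f_for_w(w, m, n):
    """F approximation for W (assumed form; undefined when w == 1).

    F = (m - 1) W / (1 - W), with df1 = n - 1 - 2/m and
    df2 = (m - 1) * df1.
    """
    df1 = n - 1 - 2 / m
    df2 = (m - 1) * df1
    return (m - 1) * w / (1 - w), df1, df2


# Invented rankings: three judges, five case summaries.
judges = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
]
w = kendalls_w(judges)
f_value, df1, df2 = f_for_w(w, m=len(judges), n=len(judges[0]))
```

W ranges from 0 (no agreement among judges) to 1 (identical rankings), so a high W by itself says nothing about whether the groups differ.
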

Other data were collected from college records and 
from the SVIB-W, Form W. 


FINDINGS 


Students of each group were judged to have responded
similarly to the structured interview. Although there
was substantial agreement (as measured by Kendall's
coefficient of concordance) between judges in their
ratings of the interview data, none of the interview
variables revealed significant differences between the
groups.

The numbers of girls above and below the average
grade-point mean were tabulated and analyzed using the
chi-square method. The chi-square value was not
significant.

Results on the occupational scales of the SVIB-W which
have high and low male-association factor loadings and
on the remaining scales were analyzed using the
chi-square method. In the comparison among the three
groups the chi-square value of 44.30, as shown in
Table 1, indicated a significant difference in the
number of girls who obtained scores above the mean on
positively loaded scales, on negatively loaded scales,
and on the remaining scales of the SVIB.


TABLE 1

CHI-SQUARE FOR OBSERVED AND EXPECTED SCORES ABOVE THE MEAN ON THE SVIB SCALES WITH
MALE-ASSOCIATION LOADINGS

                Observed and      Number of girls with scores above the observed mean
                expected          in occupations with male-association factor loadings
Group           frequencies       Negative     Positive     Other
Career          (fo)              33           38           52
Mixed           (fo)              23           44           61
Marriage        (fo)              19           48           67
All groups      (fe)              35           30           70


Results in Table 1 indicate that there is a tendency
for girls who differ in attitudes as measured by the
CMAI to differ also in interests as measured by the
SVIB. Girls who earned scores on the career scale of
the CMAI one standard deviation or more above the mean
had more interest scores above the mean on artist,
author, librarian, lawyer, physician, and social worker
scales. Girls who earned scores on the marriage scale
of the CMAI one standard deviation or more above the
mean had more interest scores above the mean on
housewife, office worker, stenographer-secretary,
nurse, physical education, and mathematics teacher
scales. These findings support the findings reported by
Layton (1958).

Table 2 summarizes SVIB data. Ten of the 27 SVIB scales
significantly discriminated among the groups at the .05
level or higher. On three scales, librarian, lawyer,
and physician, the Career Group had higher mean scores.
The Mixed Group had higher mean scores on the social
worker, housewife, and elementary teacher scales. On
four scales, home economics teacher, dietitian,
physical education teacher, and laboratory technician,
the Marriage Group scored higher.


TABLE 2

VOCATIONAL INTEREST BLANK MEAN SCORES OF CAREER GROUP, MIXED GROUP, AND MARRIAGE GROUP

                         Career group        Mixed group         Marriage group      t, Career     t, Career     t, Marriage
                                                                                     and mixed     and marriage  and mixed
Scale                    M         σ         M         σ         M         σ         groups        groups        groups
Artist                   30.6      6.68      [illeg.]  [illeg.]  29.4      7.90      1.13          .39           [illeg.]
Author                   [illeg.]  [illeg.]  [illeg.]  [illeg.]  17.4      7.22      .93           1.49          .65
Librarian                [illeg.]  [illeg.]  22.0      [illeg.]  [illeg.]  [illeg.]  [illeg.]      1.39          .39
English Teacher          [illeg.]  [illeg.]  21.4      8.70      11.4      8.62      1.33          1.69          3.04***
Social Worker (Rev.)     34.2      5.96      38.6      5.83      [illeg.]  [illeg.]  1.68          1.93*         3.09***
Psychologist             23.6      9.65      [illeg.]  [illeg.]  19.0      6.08      .26           1.28          1.78*
Lawyer                   35.1      9.66      28.5      4.44      23.1      4.84      1.96*         [illeg.]      1.86*
Social Science Teacher   [illeg.]  [illeg.]  24.6      6.36      20.7      6.45      .29           .97           1.29
Y.M.C.A. Secretary       19.0      4.75      17.9      4.68      16.4      8.17      [illeg.]      [illeg.]      .79
Life Insur. Saleswoman   30.4      8.92      [illeg.]  [illeg.]  24.7      6.84      1.06          1.59          .95
Buyer                    24.9      8.30      26.5      [illeg.]  [illeg.]  [illeg.]  [illeg.]      [illeg.]      .91
Housewife                [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]      1.38          [illeg.]
Elementary Teacher       36.0      2.34      43.8      6.32      41.7      6.46      1.16          2.63**        [illeg.]
Office Worker            42.6      8.03      43.6      7.31      45.8      8.87      .84           .84           .61
Steno-Secretary          [illeg.]  [illeg.]  44.5      4.34      45.2      7.19      .88           .95           .26
Business Ed. Teacher     33.2      9.86      34.7      9.76      33.0      [illeg.]  .34           .03           .39
Home Econ. Teacher       [illeg.]  [illeg.]  29.2      5.96      30.8      8.58      1.82*         [illeg.]      [illeg.]
Dietitian                [illeg.]  [illeg.]  24.7      4.96      [illeg.]  [illeg.]  1.32          2.44**        1.22
Physical Ed. Teacher     26.4      3.82      28.1      1.74      30.8      4.66      1.28          [illeg.]      1.71
Occup. Therapist         34.2      10.78     34.4      12.66     41.4      10.81     .03           1.49          1.31
Nurse                    24.9      10.56     26.1      6.65      33.9      13.75     .30           1.64          .98
Math-Science Teacher     28.4      9.20      [illeg.]  [illeg.]  29.9      6.34      .36           .42           [illeg.]
Dentist                  [illeg.]  [illeg.]  19.9      3.44      22.7      [illeg.]  .93           [illeg.]      1.84*
Laboratory Technician    24.5      7.01      [illeg.]  [illeg.]  29.8      6.79      .65           [illeg.]      [illeg.]
Physician                28.6      9.05      20.7      5.42      25.8      5.92      [illeg.]      .81           2.01*
Musician, Teacher        [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]      .04           1.16
Musician, Performer      [illeg.]  [illeg.]  [illeg.]  [illeg.]  30.2      5.03      .30           .70           .03

* p < .10.
** p < .05.
*** p < .01.
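
The t values in Table 2 appear to follow from the printed means and standard deviations. The sketch below is a reading of the table rather than the authors' stated procedure: it assumes 10 Ss per group and treats the printed σ values as sample standard deviations, in which case the equal-n two-sample t reduces to the mean difference divided by sqrt((σ1² + σ2²) / n). The function name is hypothetical.

```python
import math


def t_from_summary(m1, sd1, m2, sd2, n=10):
    """Two-sample t from group means and SDs, equal group size n.

    Assumes sd1 and sd2 are sample SDs; with equal n the pooled
    standard error is sqrt((sd1**2 + sd2**2) / n).
    """
    se = math.sqrt((sd1 ** 2 + sd2 ** 2) / n)
    return abs(m1 - m2) / se


# Nurse scale, values as printed in Table 2.
t_career_marriage = t_from_summary(24.9, 10.56, 33.9, 13.75)
t_career_mixed = t_from_summary(24.9, 10.56, 26.1, 6.65)
```

For the Nurse scale these come to about 1.64 and .30, which agrees with the printed entries for that row.
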


DISCUSSION 


Within the limitations of the design for this 
study, and in light of the obtained findings, 
the following conclusions appear valid:

1. In their vocational development, career-
oriented, marriage-oriented, and mixed-ori- 
ented sophomore, junior, and senior college 
girls do not differ significantly in respect to 
grade-point averages, key-figure influence, 
emergence and persistence of occupation pref- 
erences, and work experience as found by 
chi-square comparisons of these variables. 

2. In their vocational development, career- 
oriented, marriage-oriented, and mixed-ori- 
ented girls differ in respect to the number of 
girls who obtain scores above the mean on 
the SVIB-W, Form W, on the occupational 
scales with high and low male-association fac- 
tor loadings, and on the remaining scales. 
Girls primarily interested in marriage, or in 
mixing marriage with a career, had interests 
in more occupations that serve as interim 
jobs preceding marriage than did the girls 
primarily interested in a career.


62 experienced inspectors inspected 10 different items of electronic equipment
covering a wide range of complexity. 8 or more inspectors inspected each item. 
Inspection performance was found to have an almost perfect inverse relation- 
ship with equipment complexity. r’ between percentage of defects detected 
and pair comparison ratings of complexity was -.92; r’ between percentage
of defects detected and number of parts was -.91. The results indicated that
equipment complexity has a significant detrimental effect on inspection performance
and that this effect cannot be overcome by extending the amount of
inspection time allotted.


The optimal organization of quality control 
inspection activities and the development of 
procedures and tools to assure satisfactory 
inspection performance require an under- 
standing of inspector capabilities and of the 
factors which influence inspection perform- 
ance. This study investigated the effect of 
one factor, complexity of electronic equip- 
ment, on inspection performance. As an at- 
tribute of electronic equipment, complexity 
was considered to be primarily a function of 
the number of parts which make up an equip- 
ment item and of the way the parts are inter- 
related or arranged. Complexity was consid- 
ered to increase as the number of parts 
increases and as the arrangement of parts 
becomes less orderly. 

It was hypothesized that inspectors would 
be less effective in detecting defects in more 
complex items than in less complex items— 
the percentage of defects detected would be 
inversely related to the complexity of the 
item inspected—even when inspectors are 
given an unlimited amount of time in which 
to inspect. When more parts are involved and 
their arrangement is less orderly, more dif- 
ferent kinds of decisions are required. The 
inspector, therefore, must be aware of more 
criteria upon which to base his decisions. 
Also, confusion may be more likely to occur 
when he is searching a more complex item 
for defects. 

The objective of this study was to deter- 
mine quantitatively the relationship between 
equipment complexity and inspection effectiveness.
Since one of the goals at any inspection
station is the detection of 100% of the defects
present in each item inspected, the size of the 
relationship would determine the extent to 
which complexity should be minimized or 
compensated for in achieving this inspection 
goal. 


METHOD


The approach taken was to obtain measures of 
inspection effectiveness for each of 10 different items 
of electronic equipment covering a wide range of 
complexity, to obtain quantitative measures of com- 
plexity for each of these equipment items, and then 
to determine the relationship between the two sets 
of measures. 


Study Participants 


A total of 62 inspectors participated in the study. 
With the exception of a few inspectors who were 
not available because of vacations or sickness, the 62 
inspectors comprised the population of experienced 
inspectors who regularly inspected one or more of 
the 10 equipment items. All participants had at
least 2 months' experience on the job and met minimum
standards of visual acuity: 20/30 distant
vision corrected or uncorrected, 20/20 near vision 
corrected or uncorrected, normal color perception, 
and normal depth perception. 


Inspection Performance 


The measure of inspection performance associated 
with each equipment item was the percentage of 
defects detected in the item by inspectors experi- 
enced in its inspection. A list of the defects actually 
present in each equipment item was determined by 
individual inspections of the item by a panel of at 
least three experts. The panel consisted typically of 
the inspection supervisor, the responsible quality 
control engineer and a senior inspector. The list 
consisted of those defects identified by the panel 
and for which the panel members were in joint 
agreement. 


EQUIPMENT COMPLEXITY AND INSPECTION PERFORMANCE 


At least eight inspectors independently inspected
each item. Since the job assignments of some inspectors
involved inspecting more than one type of
item, 18 inspectors each inspected as many as three
of the 10 equipment items. The remaining 44
inspectors inspected only one equipment item each.

Each inspector who participated in the study performed
his inspections independently and with a
minimum of distractions. Either a conference room
or office was provided away from the inspector’s
normal work station. The inspector was provided
with all reference materials and tools normally available
to him at his regular inspection station. The
inspector was given as long as he wanted to complete
his inspections; to make the task more realistic,
however, he was told that the amount of time he
required would be recorded. All inspectors took much
longer to perform their inspections than they
normally took during routine inspections.


Equipment Complexity 


Two approaches were taken to the measurement
of equipment complexity. One was a pair-comparison
complexity rating of the 10 items of equipment
by 2 judges, the author and an associate not
previously involved in the study. A complexity index
for each item was computed by multiplying the
average number of selections accorded each item by
a constant which provided the most complex item
with an index of 100. The indexes for the 10 items
ranged from 6 to 100. The agreement between the
two judges, as measured by the rank-order correlation
between the two sets of ratings, was .89.
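
The index computation described above can be sketched as follows. The function name and the four-item data are hypothetical; the only steps taken from the text are averaging each item's selections over the judges and rescaling so that the most complex item receives 100. Rounding to whole numbers is an assumption.

```python
def complexity_indexes(selections_by_judge):
    """Scale average pair-comparison selections so the top item is 100.

    selections_by_judge: list of dicts, one per judge, mapping an item
    to the number of pairs in which that judge chose it as the more
    complex member.
    """
    items = list(selections_by_judge[0])
    n_judges = len(selections_by_judge)
    average = {
        item: sum(judge[item] for judge in selections_by_judge) / n_judges
        for item in items
    }
    top = max(average.values())
    # Multiplying by the constant 100 / top gives the top item 100.
    return {item: round(value * 100 / top) for item, value in average.items()}


# Invented example: 4 items, so each judge makes 6 paired comparisons.
judge_1 = {"A": 3, "B": 2, "C": 1, "D": 0}
judge_2 = {"A": 3, "B": 1, "C": 2, "D": 0}
indexes = complexity_indexes([judge_1, judge_2])
```

With 10 items each item enters 9 comparisons per judge. If the most complex item were chosen in all 9 by both judges, an item averaging half a selection would score about 6, which would be consistent with the reported range of 6 to 100.
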

A second approach to measuring equipment complexity
simply involved counting the number of
major parts making up each item: circuit boards,
resistors, wire bundles, connectors, transistors, and
so on. The number of parts in the 10 items ranged
from 2 to 98.

The two approaches to measuring complexity provided
nearly identical results. The rank-order correlation
between the two sets of measures was .97.
It appeared likely that the raters responded primarily
to number of parts in making their complexity
ratings.
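
The rank-order correlations reported in this study can be illustrated with a short sketch of Spearman's coefficient, computed here as the product-moment correlation between ranks, with tied values given their average rank. The article does not state how ties were handled, and the data below are invented.

```python
import math


def ranks(values):
    """Ranks with 1 for the smallest value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def rank_order_correlation(x, y):
    """Spearman's coefficient: Pearson correlation of the two rank sets."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Invented data: complexity indexes and percentage of defects detected.
complexity = [6, 20, 50, 100]
detected = [95, 80, 60, 30]
rho_inverse = rank_order_correlation(complexity, detected)
rho_partial = rank_order_correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

A coefficient near -1 means the item rankings on the two measures are almost exactly reversed; it does not by itself show that the relationship is linear in the original units.
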


RESULTS AND DISCUSSION 


Inspection performance was found to have
an almost perfect inverse relationship with
equipment complexity. The more complex the
equipment item, the lower the percentage of
defects detected. The rank-order correlation
between rated complexity and percentage of
defects detected was -.92; the correlation
between number of parts and percentage of
defects detected was -.91. Both correlations
were statistically significant beyond the .01
level. The linearity of the relationship between
rated complexity and inspection performance
is shown by the regression line in
Figure 1. The regression line for number of
parts versus inspection performance was very
similar.

FIG. 1. Inspection performance as a function of rated
equipment complexity.

Since it was not possible to have the same 
number of defects in each item or set of 
items inspected, the possible effect of this 
variable on the study results was investi- 
gated. The rank-order correlation between 
number of defects present and inspection performance
was found to be -.14. The low, nonsignificant
correlation indicated that number
of defects had essentially no effect on the 
study results. 

The results indicated that equipment com- 
plexity has a significant detrimental effect 
on inspection performance, an effect that 
cannot be overcome simply by extending 
the amount of inspection time allotted. This 
finding suggests that significant gains in 
inspection performance may be obtained by 
reducing equipment complexity or by de- 
veloping procedures and aids which reduce 
the effect of complexity. 


(Received February 8, 1965)


RELATIONSHIP BETWEEN PROBABILITY OF ENDORSEMENT
AND SOCIAL DESIRABILITY SCALE VALUE FOR A
SET OF 2,824 PERSONALITY STATEMENTS¹


ALLEN L. EDWARDS 


University of Washington 


A group of 47 male and 48 female judges rated 2,824 personality statements 
for social desirability using a 9-point rating scale. Another group of 110 male 
and 111 female Ss described themselves in terms of the same set of 2,824 
statements by answering each one “true” or “false.” The correlation between 
probability of a “true” response and social desirability scale value for the 
combined sex groups was .892. The distribution of the social desirability scale 
values of the 2,824 statements was distinctly bimodal. These results are in 
accord with another large-scale study in which 1,647 personality statements 
were investigated. In view of the large number of personality statements 
involved in these 2 studies, it is suggested that a correlation of .90 between 
probability of endorsement and social desirability scale value and a bimodal 
distribution of the scale values of personality statements may be characteristic
of the population.


Edwards (1953) originally reported a cor- 
relation of .87 between probability of endorse- 
ment, P(T), and social desirability scale value 
(SDSV) for a set of 140 personality state- 
ments. This finding has been confirmed by a 
number of other studies (Cowen & Tongas, 
1959; Cruse, 1963; Edwards, 1957; Edwards 
& Walsh, 1963; Hanley, 1956; Taylor, 1959) 
in which different sets of items were used. In 
each of these studies, however, the number of 
items involved was under 200. 

The only large-scale study is one by Cruse 
(1965). He investigated the relationship be- 
tween P(T) and SDSV for a set of 1,647 
personality statements and found a correla- 
tion of .90 between these two variables. In ad- 
dition, Cruse found that the distribution of 
SDSVs of the 1,647 statements was distinctly 
bimodal in shape. 

The present study reports upon the rela- 
tionship between P(T) and SDSV for another 
independently constructed and still larger set 
of 2,824 personality statements and upon 
the distribution of the SDSVs of these 2,824 
statements. 


METHOD 


Ratings of social desirability on a 9-point rating
scale and probabilities of endorsement for each of
2,824 experimental personality statements were available
from another study (Edwards & Walsh, 1963).
The 2,824 items were rated by 47 male and 48 female
paid judges. Another independent group of 110
male and 111 female paid subjects (Ss) responded
“true” or “false” to each of the items in self-description.
P(T) and SDSV for each of the 2,824 statements
were obtained separately for each sex group
and for the combined group of 221 Ss and 95 judges.

1 This research was supported in part by Research
Grant MH-04075 from the National Institute of
Mental Health, United States Public Health Service.
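
The two item-level quantities in the Method paragraph, and the correlation between them, can be sketched as below. Two points are assumptions rather than statements from the article: the SDSV of a statement is taken here as the mean of the judges' ratings (the scaling procedure is not described in the text), and the function names and toy data are hypothetical.

```python
import math


def endorsement_probabilities(responses):
    """P(T) per item: proportion of subjects answering "true".

    responses: one list of booleans per subject, one entry per item.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    return [sum(s[i] for s in responses) / n_subjects for i in range(n_items)]


def scale_values(ratings):
    """SDSV per item, assumed here to be the mean 9-point rating."""
    n_judges = len(ratings)
    n_items = len(ratings[0])
    return [sum(j[i] for j in ratings) / n_judges for i in range(n_items)]


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Toy data: 2 judges and 4 subjects responding to the same 3 statements.
ratings = [[8, 5, 2], [6, 5, 2]]
responses = [
    [True, True, False],
    [True, False, False],
    [True, True, False],
    [True, False, True],
]
sdsv = scale_values(ratings)
p_true = endorsement_probabilities(responses)
r = pearson_r(p_true, sdsv)
```

The correlation is taken over statements, not over people: each statement contributes one SDSV from the judges and one P(T) from an independent group of Ss.
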


RESULTS AND DISCUSSION 


Table 1 shows the intercorrelations be- 
tween P(T) and SDSV for male and female 
Ss and male and female judges.² When the
male ratings of social desirability are com- 
bined with the female ratings the mean SDSV 
is 5.110 and the standard deviation of the 
SDSVs is 1.719. Similarly, when the male 
P(T)s are combined with the female P(T)s 
the mean P(T) is .469, and the standard 
deviation is .304. The correlation between 
P(T) and SDSV for the combined sex groups 
is .892, a value that is quite close to the one 
reported by Cruse. 
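
The combined-group means quoted above can be checked against the sex-specific means in Table 1. The sketch assumes that every judge rated, and every S answered, all 2,824 statements, so that the combined mean equals the mean of the two sex-group means weighted by group size. The function name is hypothetical.

```python
def size_weighted_mean(means, sizes):
    """Mean of group means, weighted by the number in each group."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


# Mean SDSV: 47 male judges (5.148) and 48 female judges (5.072).
combined_sdsv = size_weighted_mean([5.148, 5.072], [47, 48])

# Mean P(T): 110 male Ss (.480) and 111 female Ss (.458).
combined_pt = size_weighted_mean([0.480, 0.458], [110, 111])
```

Rounded to three places these give 5.110 and .469, matching the combined values in the text. The combined standard deviations cannot be recovered this way because they also depend on how the male and female values covary across statements.
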

Figure 1 shows the distribution of the 
SDSVs of the 2,824 statements. It is appar- 
ent that this distribution is bimodal in shape 
as was also the distribution of SDSVs of the 
1,647 statements investigated by Cruse. 

The fact that the correlations obtained in
two independent studies involving two independent
and quite large sets of personality
statements are almost identical suggests that
a correlation of .90 between P(T) and SDSV
might reasonably be regarded as being very
close to the population value. In fact, if someone
should obtain a correlation between P(T)
and SDSV for a selected set of statements
which differs considerably from .90, a reasonable
conclusion would be that the selected set
of statements is not representative of the
population.

2 Alan J. Klockars wrote the programs and supervised
the computer runs for these intercorrelations.


TABLE 1

CORRELATIONS BETWEEN SDSVs AND P(T)s OF 2,824 PERSONALITY STATEMENTS FOR MALE AND FEMALE
JUDGES AND MALE AND FEMALE SUBJECTS

                    SDSV                  P(T)
                Male      Female      Male      Female      Mean      SD
SDSV
  Males         --        .986        .883      .884        5.148     1.612
  Females                 --          .868      .883        5.072     1.836
P(T)
  Males                               --        .960        .480      .300
  Females                                       --          .458      [illeg.]

Furthermore, the fact that the SDSVs of
the 1,647 statements investigated by Cruse
and the SDSVs of the 2,824 statements investigated
in the present study have a bimodal
distribution suggests that this kind of
distribution may be characteristic of the
population. Personality statements with neutral
SDSVs do occur but their relative frequency
is considerably less than that for statements
with socially desirable or socially undesirable
scale values.

FIG. 1. Distribution of social desirability scale values
of 2,824 personality statements.


EFFECTS OF ANTICIPATORY ALERTING SIGNALS AND
A COMPATIBLE SECONDARY TASK ON
VIGILANCE PERFORMANCE


RUSSELL L. SMITH, LUIGI F. LUCACCINI,
HILDE GROTH, AND JOHN LYMAN¹


University of California, Los Angeles 


This visual vigilance study simulated an industrial inspection task in which 
Ss were alerted to possible targets by a semiautomatic detection device. 1 ex- 
perimental group was forewarned of possible targets by a buzzer with 1-sec 
foreperiod and rested between alerting signals. A 2nd experimental group 
worked on a problem-solving secondary task instead of resting between buzzes. 
A control group observed the display continuously. Other variables of interest 
were sex of observer, target type, and size of display window. It was found 
that: (a) performance by alerted groups was far superior to that of controls 
and continued to improve throughout the task; (b) a vigilance decrement was 
not in evidence in any condition; (c) the problem-solving task did not interfere 
with detection performance; (d) male and female Ss performed equally well; 
(e) Ss engaged in the problem-solving task greatly underestimated the duration 
of the detection task and reported it “interesting” while the other groups 
estimated duration accurately and indicated boredom. 


In the typical vigilance experiment targets 
presented for detection are well above thresh- 
old and easily detected by fresh observers. In 
the majority of studies it was found that as 
the task continues the probability of detection 
declines rapidly and remains lower for the 
remainder of the vigil. Several qualitative 
theories have been proposed to account for 
this classical vigilance decrement but none 
explains more than a fraction of the experi- 
mental findings. Reviews of these theories 
have recently been made by Bergum and 
Klein (1961), Jerison and Pickett (1963), 
McGrath, Harabedian, and Buckner (1959), 
and Frankmann and Adams (1962). 

In their review Bergum and Klein (1961,
p. 39) present a number of empirical suggestions
based on established human factors
principles for increasing the effectiveness of
man-monitored systems. The purpose of the
present study is to evaluate the utility of one
of their suggestions, the use of anticipatory
alerting signals. Alerting signals per se are
not a new concept. Previously, however, they
have been used to manipulate the observer’s
“level of arousal” or to provide feedback on
past signals (Pollack & Knaff, 1958; Travis
& Kennedy, 1947). In the present study
alerting signals were presented immediately
prior to the appearance of targets with the
specific purpose of reducing the temporal
uncertainty of the critical event as much as
possible. This technique assumes the existence
of an automatic first-stage detection device
able to filter the train of task stimuli and
alert the second-stage monitor, the operator,
to likely targets. Between warning signals the
operator would then be free to ignore the
vigilance task and either rest or perform other
activities. Theoretically, performance would be
optimum in such a situation, the performance
level depending on the difficulty of the final
detection task. Such a situation is certainly
feasible in light of present technology. A
search of the literature revealed only one
study which had direct bearing on this use
of alerting signals. Wilkinson (1961) used
auditory alerting signals (a loud buzz) to
signal the start of each trial in a visual
detection task. The Ss were aware that if a target
were presented it would appear 1 second after
the buzzer sounded. However, trials and thus
alerting signals occurred regularly and closely
spaced in time. In the course of an hour 800
signals were given; these were followed by
a target in only 32 cases. This target rate
is not particularly low in itself, but the
predictive power of the alerting signal is
extremely low (i.e., probability of a target
following a buzz = .04). Colquhoun (1961)
has shown that detection efficiency may be
related to the probability that any signal will
be a wanted one much more strongly than
to the rate of target presentation per se. In
this light it is not surprising that Wilkinson
(1961) found no difference between the
performance of paced (alerted) Ss and control
(unalerted) Ss. The primary interest of this
study was to evaluate the utility of an alerting
signal with fairly high predictive power
for increasing performance. It was hypothesized
that in such a situation alerted Ss would
perform significantly better than unalerted,
continuously observing control Ss.

1 The authors wish to express their thanks to
Edward C. Carterette for his advice and assistance.

It is generally agreed that besides the 
ability to maintain a minimal level of atten- 
tion the typical vigilance experiment requires 
negligible use of higher mental processes and 
vigilance tasks are frequently considered 
boring. Therefore, the second objective of the 
study was to determine the effects of a sec- 
ondary task requiring a high degree of mental 
effort on vigilance performance and on task 
interest. A nonvigilance secondary task was 
used in contrast to most studies which have 
used two vigilance tasks. It was hypothesized 
that such a secondary task would raise overall 
task interest while not detracting from the 
effect of the alerting signal on the primary 
task. The secondary task was chosen spe- 
cifically to comply with Bakan’s (1959) defi- 
nition of compatible secondary tasks. These 
are tasks which alternate in time with the 
primary task and are presented through the 
same sense modality. 

The third objective was the verification of 
Whittenburg, Ross, and Andrews’ (1956) find- 
ing that females perform better than males 
on vigilance tasks. 


METHOD 
Apparatus 


The vigilance task described below was chosen to 
satisfy two specific requirements: (a) to present Ss 
with a situation difficult enough to reveal possible 
differences between experimental treatments, and (b)
to simulate the search and discrimination functions 
often required in military and industrial tasks. 

FIG. 1. Four-second sequence of the display tape
showing two window sizes (dashed lines), a target
(T), a false target (FT), and a visual noise
background.

Many industrial inspection tasks require a human
monitor to search a constant flow of products or
materials for defective items. A visual display was
constructed to simulate the dynamic but relatively 
monotonous aspects of such tasks. It consisted of a 
small window behind which a paper tape moved 
from left to right. The window was located at eye 
level in the center of a U-shaped isolated booth. The 
Ss sat at a desk in the center of the booth monitor- 
ing a continuous series of objects on the tape. Three 
classes of objects appeared on the tape: targets, 
false targets, and a random-appearing visual noise 
background. Both targets and false targets were 
small squares differing only in the position of a small 
extension on one side. Objects in the visual noise 
background differed from targets and false targets 
in the number of “arms” protruding from them. 
A sample of the tape is presented in Figure 1 show- 
ing all objects in the actual size used. The noise 
background appeared continuously on the tape and 
was produced from a set of four rubber stamps. 
The patterns were constructed randomly subject to 
restrictions on number and proximity of objects.
By rotating each of the stamps a total of 16 different 
patterns was obtained. A random sequence of 900 of 
these patterns was stamped on the tape. The tape 
ran for 60 minutes at a constant speed of .5 
inch/second. Two display window sizes were used, 
large (LW, 2 inches × 1 inch) and small (SW, 2
inches × 0.5 inch). These are shown in Figure 1. A
fine wire divided these windows lengthwise into 
upper and lower halves. Thus at the paper speed 
used objects were visible for 1 or 2 seconds in SW 
or LW, respectively. 
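
The timing figures in this paragraph follow from the tape speed. The sketch below assumes that the 1-inch (LW) and 0.5-inch (SW) dimensions lie along the direction of tape travel, which is the reading under which the stated 2- and 1-second exposures come out, and that the 900 stamped patterns fill the 60-minute tape end to end. All names are illustrative.

```python
TAPE_SPEED_IN_PER_SEC = 0.5
RUN_SECONDS = 60 * 60
N_PATTERNS = 900


def visible_seconds(extent_along_travel_in, speed=TAPE_SPEED_IN_PER_SEC):
    """Time an object takes to cross a window of the given extent."""
    return extent_along_travel_in / speed


lw_exposure = visible_seconds(1.0)   # large window
sw_exposure = visible_seconds(0.5)   # small window

tape_length_in = TAPE_SPEED_IN_PER_SEC * RUN_SECONDS
pattern_length_in = tape_length_in / N_PATTERNS
seconds_per_pattern = pattern_length_in / TAPE_SPEED_IN_PER_SEC
```

Under these assumptions each stamped pattern spans 2 inches of tape and takes 4 seconds to pass, which agrees with the four-second sequence shown in Figure 1.
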

For purposes of target presentation the tape was 
subdivided into six successive 10-minute intervals. 
Four targets were randomly distributed over each 
10-minute interval. Of these, two appeared alone 
against the noise background and two were accompa- 
nied by a false target that appeared simultaneously 
in the opposite half of the display. In addition two 
other false targets appeared randomly during the 
interval. Thus 24 targets and 24 false targets were 
presented, 12 of each appearing alone and 12 appear-
ing at the same time. Targets were equally dis-
tributed vertically across the display; half appeared 
near the central dividing wire and half appeared 
peripherally. The same tape was used throughout the 
experiment for all Ss. An exact replica of the target 
was placed above the display window for immediate 
reference at all times. 
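
The stimulus totals in this paragraph can be checked with a few lines of Python. The sketch below is only a bookkeeping aid; the variable names are illustrative, and the count of distinct presentation events is derived from the figures given here rather than quoted from the paragraph.

```python
# Per-interval schedule as described in the text.
N_INTERVALS = 6                      # successive 10-minute intervals
TARGETS_ALONE_PER_INTERVAL = 2       # target against noise only
TARGETS_PAIRED_PER_INTERVAL = 2      # target plus simultaneous false target
FALSE_ALONE_PER_INTERVAL = 2         # additional false targets

targets_alone = N_INTERVALS * TARGETS_ALONE_PER_INTERVAL
targets_paired = N_INTERVALS * TARGETS_PAIRED_PER_INTERVAL
false_alone = N_INTERVALS * FALSE_ALONE_PER_INTERVAL
false_paired = targets_paired        # one false target per paired target

total_targets = targets_alone + targets_paired
total_false_targets = false_alone + false_paired
# A paired target and its false target form one presentation event.
stimulus_events = targets_alone + targets_paired + false_alone

print(total_targets, total_false_targets, stimulus_events)
```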

Presentation of targets and false targets was 
recorded automatically on a four channel event 
recorder by a photoelectronic relay arrangement. The 
relay was used for triggering the alerting buzzer as 
well. The S response buttons were connected to the 
other channels of the event recorder. 


Procedure 


A five-way factorial design was employed to test 
the following independent variables: three observa-
tion conditions, two display sizes, two sexes of ob-
server, two target types, and three time intervals.
Independent groups of four Ss served in each of the 
12 combinations of the first three variables. The 
last two variables were repeated measures conducted 
on all Ss. Two dependent variables were obtained 
from each S’s data record: (a) number of correct 
target detections, and (b) false alarms consisting of 
responses to false targets or other stimuli. 

The two display sizes corresponded to the window 
sizes. Within each condition of observation eight Ss 
viewed the LW display and eight viewed the SW 
display. Each subgroup was composed of four males 
and four females in order to assess the effects of the 
third variable, sex of observer. Target type referred 
to the two conditions of target presentation. Intervals 
represented the division of the task into three con- 
secutive 20-minute periods to assess sequential effects. 
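
As a check on the group sizes, the between-subjects part of the design can be enumerated in Python. This is a sketch only: the condition labels are placeholders, and the code simply crosses the three between-subjects variables with four Ss per cell as described above.

```python
from itertools import product

# Placeholder labels for the three between-subjects variables.
CONDITIONS = ("condition_1", "condition_2", "condition_3")
DISPLAYS = ("LW", "SW")
SEXES = ("male", "female")
SS_PER_CELL = 4

# Every combination of condition, display size, and sex is one group.
cells = list(product(CONDITIONS, DISPLAYS, SEXES))
n_subjects = len(cells) * SS_PER_CELL

per_condition = sum(SS_PER_CELL for c, d, s in cells if c == "condition_1")
per_condition_display = sum(
    SS_PER_CELL for c, d, s in cells if c == "condition_1" and d == "LW"
)

print(len(cells), n_subjects, per_condition, per_condition_display)
```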

Three levels of observation condition were used 
with 16 Ss serving in each: (a) Control Condition 
(Condition C): Ss were instructed to observe the 
display continuously and to respond to targets by 
pressing one button as soon as they appeared. Two 
response buttons were provided, corresponding to 
upper and lower halves of the display. Data from 
this group represented the control condition. (b) 
Buzzer Condition (Condition B): Ss were informed 
that they need only observe the display when a 
buzzer sounded. They were requested to remain 
awake and in a position to see the display at all 
times. They responded to targets as in Condition C. 
The buzzer sounded 36 times during the hour, once 
for each of the 36 target and false target stimulus 
combinations on the display tape. Each buzz sounded 
automatically 1 second before the stimulus came into 
view. (c) Anagram Condition (Condition A): Ss 
were given the same instructions regarding the de- 
tection task as in Condition B. In addition they were 
given a secondary task to perform between sounds 
of the buzzer. The task consisted of completing a 
set of single-solution anagrams. The Ss were told to 
skip any anagram that stumped them and to press a 
third response button after they completed or skipped 
each anagram. 

At the beginning of the experiment instructions 
were read to each S explaining the nature of the 
task. A 2-minute practice session was conducted in
which S was informed of his success after each tar-
get or false target appeared. Additional instructions
regarding the buzzer or secondary task were then
given to Ss in Conditions A and B. After all ques-
tions were answered S removed his watch and be-
gan the detection task. The experiment continued
uninterrupted for 1 hour during which time E re-
mained in the room behind the display. No com-
munication was permitted during the hour. At the
end a questionnaire was administered on which S 
supplied personal history information, estimated the 
duration of the detection task in minutes, and rated 
primary and secondary tasks on a 5-point scale rang- 
ing from “very boring” to “very interesting.” 


Subjects 


The Ss were 24 male and 24 female undergradu- 
ates from introductory psychology courses at the 
University of California at Los Angeles who volun- 
teered for an experiment on “signal detection.” Each 
S served individually in a single test session lasting 
90 minutes. Each received credit toward a course 
requirement for participating. Assignment of Ss to 
conditions within the experiment was made randomly 
for each sex. 


RESULTS 


The number of targets detected during the 
test session was summed for each S.² An anal-
ysis of variance was performed on these scores.
It was found that Observing Condition (F =
44.63, df = 2/36), Display Size (F = 8.55,
df = 1/36), Target Type (F = 49.19, df =
1/36) and Intervals (F = 14.41, df = 2/72)
were significant main effects while Sex was
not. Of the first-order interactions Target
Type × Display Size (F = 7.35, df = 1/36),
Intervals × Observing Condition (F = 3.76,
df = 4/72) and Target Type × Intervals (F
= 9.01, df = 2/72) were significant effects.
The other first-order interactions and all 
higher-order interactions with one exception 
were not significant sources of variation. 
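
The reported degrees of freedom are consistent with a mixed design of 48 Ss in 12 independent groups with repeated measures on the remaining variables. The Python sketch below shows the usual bookkeeping for such a layout; it is an illustration under that assumption, not a reconstruction of the authors' computations, and the names are hypothetical.

```python
# Between-subjects layout: 3 conditions x 2 display sizes x 2 sexes.
N_SUBJECTS = 48
N_CONDITIONS, N_DISPLAYS, N_SEXES = 3, 2, 2
N_INTERVALS = 3  # repeated measure

n_groups = N_CONDITIONS * N_DISPLAYS * N_SEXES

# Error term for between-subjects effects: Ss within groups.
df_between_error = N_SUBJECTS - n_groups

# Numerator df for the effects reported with these error terms.
df_condition = N_CONDITIONS - 1
df_display = N_DISPLAYS - 1
df_intervals = N_INTERVALS - 1
df_intervals_x_condition = df_intervals * df_condition

# Error term for a 3-level repeated measure: (levels - 1) x Ss within groups.
df_intervals_error = df_intervals * df_between_error

print(df_condition, df_between_error)                  # Observing Condition
print(df_intervals, df_intervals_error)                # Intervals
print(df_intervals_x_condition, df_intervals_error)    # Intervals x Condition
```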

Figure 2 presents performance as a func- 
tion of observing condition and interval. In- 


2 Raw scores by Ss consisting of target-by-target 
detections, overall false alarm rates by S and overall 
target detections by Ss according to the five major 
variables of interest have been deposited with the 
American Documentation Institute. Order Document 
No. 8733 from ADI Auxiliary Publications Project, 
Photoduplication Service, Library of Congress, Wash- 
ington, D. C. 20540. Remit in advance $1.75 for 
microfilm or $2.50 for photocopies and make checks 
payable to: Chief, Photoduplication Service, Library 
of Congress. 
Fig. 2. Group mean percentage detections as a function of time interval.


spection of this figure shows that mean per-
formance was much higher in the two experi-
mental conditions (A and B) than in the con-
trol condition. A Newman-Keuls test (Winer,
1962) revealed that at all intervals the dif-
ference between the experimental conditions
and the control condition was significant (p
< .01). No significant differences were found
in mean performance between the two experi-
mental conditions at any one interval, nor
did mean performance in Condition C vary
significantly from interval to interval. The
interval-to-interval improvement shown by
alerted Ss was significant (p < .01) from first
to second interval and from second to third
interval in Condition A and from first to sec-
ond interval in Condition B. Wilkinson’s
(1961) results for alerted and unalerted
groups by 15-minute intervals are included in
Figure 2 for comparison.

Figure 3 presents performance as a func-
tion of target type and interval. Inspection of
the figure shows that the interval-to-interval
improvement exhibited by alerted Ss occurred
mainly for the more difficult target type, those
presented together with a false target. A New-
man-Keuls test revealed that the interval-to-
interval improvement in detection of targets


plus false targets was significant (p < .01). 
Mean performance on targets presented alone 
did not differ from interval to interval but was 
superior (p < .01) to performance on targets 
plus false targets for all but the final interval. 

Figure 4 presents the interaction of target 
type and display size. It can be seen from 
this figure that of the two target types only 
performance on targets plus false targets was 
benefited by an increase in display size or 
stimulus exposure time. A Newman-Keuls 
test showed that of the two target types, tar- 
gets presented alone were detected at a sig- 
Fig. 3. Mean percentage detections as a function of
time interval for the two target types.

Fig. 4. Mean percentage detections as a function of
display window size for the two target types.


nificantly higher rate (p < .01) than targets 
plus false targets regardless of display window 
size. Targets presented alone were detected 
equally well in either display window while 
targets plus false targets were detected at a 
significantly higher rate (p< .01) in the 
larger display. 

Of the 821 targets detected by all Ss 427 
were located in central display areas while 
394 were situated peripherally. This differ-
ence was not significant (χ² = 1.33, p < .30).
A separate analysis for Ss in Condition C, 
those most likely to have developed a con- 
sistent search pattern, again showed a some- 
what higher detection rate for centrally lo- 
cated targets, 105 versus 88, but this differ-
ence lacked significance (χ² = 1.17, p < .30).
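
The overall central-versus-peripheral comparison can be reproduced with a one-degree-of-freedom chi-square test against an even split, which is the natural expectation given that half the targets appeared centrally and half peripherally. The Python sketch below covers only the overall figures (427 versus 394); the helper names are illustrative and the choice of an uncorrected goodness-of-fit test is an assumption about the authors' procedure.

```python
import math


def chi_square_equal_split(observed):
    """Goodness-of-fit chi-square against equal expected frequencies."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi2_upper_tail_df1(x):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# Detections of centrally and peripherally located targets, all Ss.
chi2_all = chi_square_equal_split([427, 394])
p_all = chi2_upper_tail_df1(chi2_all)

print(round(chi2_all, 2), round(p_all, 2))
```

The statistic rounds to 1.33, and the tail probability falls between .20 and .30, in line with the reported p < .30.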

The rate of false alarms is presented in Ta- 
ble 1 as a function of observing condition and 
display size. A two-way analysis of variance 
was performed on the number of false alarms 
by S. Both Observing Condition, F (2, 42) = 
10.92, p < .01 and Display Size, F (1, 42) = 
4.47, p < .05, were significant sources of vari- 
ation while their interaction was not. A New- 
man-Keuls test revealed that the rates for 
Conditions A and B did not differ significantly 
from each other but were far superior to per- 
formance in Condition C (p < .01). It is ap- 
parent from the table that the reduced view- 
ing time of the SW display resulted in a 
greatly increased rate of false alarms for all 
observing conditions. 

The number of anagrams attempted and 
completed was summed for Ss of Condition A 
under each of the viewing windows to obtain 
a measure of work performance on the sec- 
ondary task. The Ss viewing through the large 
display window attempted and completed
more anagrams than those viewing through
the small window (551 and 343 versus 47
and 296, respectively). A test on the number
attempted indicated that this difference was
significant (χ² = 5.33, p < .05). The ratio of
anagrams completed to those attempted was
nearly identical for both viewing conditions
(62.3% versus 62.1%).

The two measures of vigilance task interest
derived from the postexperimental question-
naires were in good agreement. The Ss of
Condition A rated the detection task as “in-
teresting” and greatly underestimated the
total time spent (X̄ = 37 minutes) while Ss
of Conditions B and C rated the task as “bor-
ing” and estimated task duration fairly ac-
curately (X̄ = 52 and 58 minutes, respec-
tively). Analysis of postexperimental ques-
tionnaires failed to reveal significant relation-
ships between performance on the vigilance
task and Ss’ college grades, career preferences,
or their own estimates of reading speed, ag-
gressiveness, and introversion-extraversion.


DISCUSSION 


The results of this experiment support the
hypothesis that a meaningful anticipatory
alerting signal can improve vigilance perform-
ance significantly. Alerted Ss not only per-
formed initially at a far superior level to un-
alerted controls but continued to improve their
performance with time. The improvement was
found to have occurred largely on the more
difficult detection stimuli and most likely rep-
resents the effects of increasing familiarity
with the discrimination task (cf. Figure 3).

Performance scores for the three observing
conditions, although varying in absolute val-
ues, maintained the same relative order for
all four combinations of task difficulty (Tar-
get Type × Display Window Size).


TABLE 1
FALSE ALARMS FOR EACH OBSERVING CONDITION

Observing
condition      LW display      SW display      Average
    A             2.75            6.25           4.50
    B             2.75            4.88           3.81
    C            11.62           22.62          17.12







The superiority of performance in the 
alerted conditions is consistent with related
research where temporal signal uncertainty
was varied (Adams & Boulter, 1964; Baker &
Harabedian, 1962). The absence of perform-
ance decrements in all observing conditions
lends further credence to the suggestion that
the search factor required by tasks high in
spatial uncertainty is responsible for the lack
of a decrement found in field situations (Jeri-
son & Pickett, 1963). Not only were no over-
all decrements observed but also target-by-
target inspection of data within the three ob-
serving conditions failed to indicate the oc-
currence of the rapid initial decrement that
has been reported by others (cf. Jerison &
Pickett, 1963, p. 227).

The results were consistent with the hy-
pothesis that a “compatible” nonvigilance sec-
ondary task could be used to raise task in-
terest without reducing vigilance performance.
Although some Ss in Condition A reported
difficulty in switching between primary and
secondary tasks, detection scores for this group
did not differ from those of alerted Ss without
the secondary task in Condition B. When dis-
play size and thus viewing time was reduced,
work output on the secondary task was re-
duced significantly. Variation in display size
involved negligible changes in the length of
the primary task, therefore the reduced out-
put cannot be attributed to a reduction in the
time available for the secondary task. The
reduction may possibly be due to increased
self-motivation of Ss viewing with the SW
display. These Ss became aware of the short
viewing time and possibly sacrificed the sec-
ondary task somewhat in order to maintain
readiness for the primary task.

The generality of Whittenburg, Ross, and 
Andrews’ (1956) finding that females per-
formed better than males in vigilance tasks
was not extended by our results. Although
their finding referred only to the last half of a
-hour task, differences were apparent in their
groups from the start. Our results for a 1-hour
task showed no trends attributable to sex of
observer. Differences in selection of Ss or in
Ss’ motivation or self-instruction based on
knowledge of task duration are possible expla-
nations of the discrepancy between studies.
More likely this difference may be attributed
to the nature of the tasks used. A further ex-
periment recently completed in this laboratory 
on a task identical to the present one has also 
failed to reveal a significant effect due to sex 
of observer. Unfortunately the few other 
studies that have included females fail to re- 
port the effects of sex of observer (Bakan, 
1955; Ross, Dardano, & Hackman, 1959). It 
seems most likely that possible sex differences 
are task specific. Clearly, this variable merits 
further attention. 

Although it has frequently been shown that 
targets are more easily detected in central 
display areas than at the peripheries (see 
Baker & Harabedian, 1962, p. 30 for a review) 
detection rates did not differ significantly be- 
tween peripheral and central display areas in 
this study. Either the typical radar operator 
sweep pattern favoring mid-range or central 
display areas was not induced by the present 
task or the display areas used were too small 
to reveal effects due to search pattern. 

One further point deserves comment. While 
it is reasonably certain that the major effects 
of the alerting signal are adequately explained 
in terms of momentary arousal and elimina- 
tion of temporal signal uncertainty a con- 
founding variable should be mentioned. 
Alerted Ss were able to rest or work on a 
secondary task for at least 90% of the time. 
It has been shown repeatedly that the intro- 
duction of rest periods can lead to significantly 
higher performance than is attainable under 
conditions of continuous observation (e.g., 
Jenkins, 1958). In the short duration task, 
rest periods are considered to relieve boredom 
rather than physiological impairment; task 
interest ratings and time estimates for Condi- 
tion B did not differ from those for Condition 
C. Apparently the large amount of unoccupied 
rest time was boring in itself for these Ss; 
however, their vigilance performance was simi-
lar to that of Ss in Condition A, who did not indi-
cate boredom. While the influence of rest
periods cannot be determined merely on the 
basis of interest scores and time estimates, 
these indices suggest that rest period effects 
played a minor role in the present study. 


REFERENCES 


Adams, J. A., & Boulter, L. S. Spatial and temporal
uncertainty as determinants of vigilance behavior.
Journal of Experimental Psychology, 1964, 67, 127-
131.

Bakan, P. Discrimination decrement as a function of
time in a prolonged vigil. Journal of Experimental
Psychology, 1955, 50, 387-390.

Bakan, P. Extraversion-introversion and improve-
ment in an auditory vigilance task. British Journal
of Psychology, 1959, 50, 325-332.

Baker, C. H., & Harabedian, A. A study of target
detection by sonar operators. Human Factors
Problems in ASW Tech. Rep., Human Factors Re-
search, Inc., 1962, No. 206-16.

Bergum, B. O., & Klein, I. C. A survey and analy-
sis of vigilance research. Human Resources Re-
search Office Res. Rep., 1961, No. 8.

Colquhoun, W. P. The effect of “unwanted” sig-
nals on performance in a vigilance task. Ergonom-
ics, 1961, 4, 41-51.

Frankmann, J. P., & Adams, J. A. Theories of
vigilance. Psychological Bulletin, 1962, 59, 257-273.

Jenkins, H. M. The effects of signal rate on per-
formance in visual monitoring. American Journal
of Psychology, 1958, 71, 647-661.

Jerison, H. J., & Pickett, R. M. Vigilance: A re-
view and reevaluation. Human Factors, 1963, 5,
211-238.


R. L. SMITH, L. F. LUCACCINI, H. GROTH, AND J. LYMAN


McGrath, J. A., Harabedian, A., & Buckner, D. N.
Review and critique of the literature on vigilance
performance. Human Factors Problems in ASW
Tech. Rep., Human Factors Research, Inc., 1959,
No. 1.

Pollack, I., & Knaff, R. P. Maintenance of alertness
by a loud auditory signal. Journal of the Acousti-
cal Society of America, 1958, 30, 1013-1016.

Ross, S., Dardano, J., & Hackman, R. C. Conduc-
tance levels during vigilance task performance.
Journal of Applied Psychology, 1959, 43, 65-69.

Travis, R. C., & Kennedy, J. L. Prediction and con-
trol of alertness: I. Control of lookout alertness.
Journal of Comparative and Physiological Psy-
chology, 1947, 40, 457-461.

Whittenburg, J. A., Ross, S., & Andrews, T. G.
Sustained perceptual efficiency as measured by the
Mackworth “clock” test. Perceptual & Motor
Skills, 1956, 6, 109-116.

Wilkinson, R. T. Comparison of paced, unpaced,
irregular and continuous display in watchkeeping.
Ergonomics, 1961, 4, 259-267.

Winer, B. J. Statistical principles in experimental
design. New York: McGraw-Hill, 1962.


(Received April 14, 1965) 


Manuscripts Accepted for Publication in the 


Journal of Applied Psychology 


Organizational Conditions and Behavior in 234 Industrial Manufacturing Organizations: George H. Dunteman*:
College of Health Related Professions, University of Florida-Gainesville, Gainesville, Florida.

Effects of Tuition Payment and Involvement on Benefit from a Management Development Program: L. W. Gruen- 
feld*: N. Y. S. School of Industrial and Labor Relations, Cornell University, Ithaca, New York. 

The Miller Analogies Test: A Note on Permissive Retesting: Robert G. Lane*, Nolan E. Penn, and Robert F.
Fischer: The University of Wisconsin, Student Counseling Center, 736 University Avenue, Madison, Wis-
consin 53715.


A Closer Look at Level of Aspiration as a Training Procedure: A Reanalysis of Fryer’s Data: Edwin A. Locke*: 
American Institutes for Research, 8555 Sixteenth Street, Silver Spring, Maryland 20910. 

The Relationship of Various College Graduate Characteristics to Recruiting Decisions: Stephen J. Carroll, Jr.*: 
Department of Business Administration, University of Maryland, College Park, Maryland 20742. 

Work Group versus Individual Differences in Attitude: Thomas H. Jerdee*: University of Minnesota, Industrial
Relations Center, Minneapolis, Minnesota 55455.


The Self-Esteem Variable in Vocational Choice: Abraham K. Korman*: Department of Psychology, New York 
University, 21 Washington Place, New York, New York 10003. 

Some Characteristics of Effective Interviewers: Stanley W. Steinkamp*: Department of Economics, 330 Com- 
merce West, University of Illinois, Urbana, Illinois 61803. 

Failure to Improve Readability with a Vertical Typography: E. B. Coleman*: Department of Psychology, Texas
Western College, El Paso, Texas.


Labor Turnover as a Function of Worker Differences, Work Environment, and Authoritarianism of Foremen:
Ronald Ley*: Department of Psychology, New School for Social Research, 66 West 12th Street, New York
11, New York.


* Asterisk indicates author for whom the address is supplied. 


Journal of Applied Psychology
1966, Vol. 50, No. 3, 247-249


NEED SATISFACTIONS OF MANAGERIAL LEVEL 
PERSONNEL IN A GOVERNMENT AGENCY 


FRANK T. PAINE, STEPHEN J. CARROLL, JR., AND BURT A. LEETE


Department of Business Administration, University of Maryland


This study compares the need satisfactions of managers in field work with 
the need satisfactions of similar managers in central office work with a govern- 
ment agency. There was greater satisfaction among those in field work especially 
with respect to certain higher-level needs. Also, a comparison was made of 
the need satisfactions of all respondents in the government agency with those 
of a similar group from private industry. The satisfaction of the government 
managers was less across all need items than the satisfaction of the private 
industry managers. A perceptual halo effect created by the insecure conditions 
existing in the government agency at the time the study was conducted may
explain this finding.


A number of recent studies have focused 
on differences among managers with respect
to need satisfactions. For example, it was
found that higher-level managers have more
need satisfaction than lower-level managers
(Porter, 1962), that line managers experience
more need satisfaction than staff managers
(Porter, 1963a), and that high-level managers
in large organizations have more need satis-
faction than high-level managers in small
organizations but the reverse is true for lower-
level managers (Porter, 1963b), and “tall”
versus “flat” organizational structures have
a differential effect on the need satisfactions
of managers depending upon the specific type
of need studied (Porter, 1964). These differ-
ences seem to be a reflection of different
responsibilities, duties, authority, and pres-
sures associated with management level dif-
ferences, staff-line differences, organizational
size differences, and type of organizational
structure differences.

Another possible source of differences in 
managerial need satisfactions may lie in field
versus central office work. Many individual
members of the management team work
rather independently in communities away
from their home offices. This field work is
characteristic of certain government agencies
as well as the marketing departments of busi-
ness firms. Information on the relationship
between this working situation (lack of direct
supervision, self-scheduling, predominance of
nonorganizational contacts) and the need
satisfactions of managers may be of value to
organizations selecting individuals for such
positions. The present study compares the
need satisfactions of upper-middle-level man- 
agers in field jobs with those in central office 
jobs in a government agency. 

Since the group studied consisted of man- 
agerial level personnel in a government agency 
another type of comparison was possible. The 
study compared the need satisfactions of man- 
agers in the government agency with those of 
similar managers in private industry using 
Porter’s (1962) data. 


METHOD 
Sample 


The data for this study were obtained by adminis-
tering Porter’s (1961) questionnaire by mail to 71
field managers and 102 central office managers in the
spring of 1964. Replies were received from 33 (47%)
field managers and from 62 (61%) central office
managers. The questionnaire items are included in
Table 1 with the comparative mean scores.

The government managers studied were members 
of a new government agency. The field-work man- 
agers work independently in various communities 
and usually contact the central office only by tele- 
phone or letter. They interact primarily with non- 
government personnel in the various communities 
in which they work. The central office managers 
were selected to produce a group matched on the 
average to the field managers with respect to salary 
group status, education, age, and length of service, 
but different with respect to work relationships. 

The private industry managers consisted of 659 
upper-middle managers from Porter (1962). The 
government and industry managerial groups were 
similar with respect to age as well as job level. 


Scoring the Questionnaire 


The amount of need satisfaction experienced by 
each government respondent for each of the 13 
TABLE 1

AVERAGE NEED SATISFACTION SCORES OF MANAGERS STUDIED

                                                          Field versus office managers in agency      Government versus industry managers

                                                          Mean score    Mean score                    Mean score    Mean score
                                                          field-work    office-work   Level of        government    private industry
                                                          managers      managers      significanceᵃ   managers      managersᵇ
Need categories and items (a, b, c, etc.)                 (N = 33)      (N = 62)                      (N = 95)      (N = 659)

I. Security Needs
   a. (security in job)                                   2.36          1.73          ns              1.95          .40
II. Social Needs
   a. (opportunity to help people)                        .97           1.27          ns              [illegible]   [illegible]
   b. (opportunity for friendships)                       .85           .42           ns              [illegible]   [illegible]
III. Esteem Needs
   a. (feeling of self-esteem)                            .94           1.74          .02             1.46          .86
   b. (prestige inside agency)                            1.06          [illegible]   ns              1.11          .69
   c. (prestige outside agency)                           1.00          1.07          ns              1.05          [illegible]
IV. Autonomy Needs
   a. (authority in position)                             1.73          1.34          ns              1.53          .90
   b. (opportunity for independent thought and action)    .97           1.64          .02             1.41          .72
   c. (opportunity to participate in setting goals)       1.97          2.00          ns              1.99          1.15
   d. (opportunity to participate in determining methods) 2.00          1.66          ns              1.78          [illegible]
V. Self-Actualization Needs
   a. (opportunity for growth and development)            [illegible]   2.26          .02             2.02          1.07
   b. (feeling of self-fulfillment)                       1.24          2.19          .02             1.86          1.11
   c. (feeling of accomplishment)                         1.53          1.79          ns              1.70          1.18

Note.—The larger the number the less the need satisfaction.

ᵃ “ns” means not significant at .05 level of significance as determined by Mann-Whitney U test.

ᵇ Data from Porter (1962).


items was determined by subtracting his response to 
part “a” of the item (How much is there now?) 
from his response to part “b” of the item (How 
much should there be?). 

Individual scores were then averaged so that a 
comparison of the mean need satisfaction for each 
of the two groups on each item could be made. The 
Mann-Whitney U test (a useful alternative to the 
t test to avoid assumptions associated with para- 
metric statistical techniques) was used to test for 
statistically significant differences (Siegel, 1956). 
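
The scoring and comparison steps can be sketched in Python. The responses below are hypothetical and serve only to show the arithmetic: each item score is the part “b” response minus the part “a” response, and the Mann-Whitney U statistic is obtained by counting, over all cross-group pairs, how often a score in one group exceeds a score in the other (ties counted as one half). The function names and data are illustrative assumptions, not the study's materials.

```python
def deficiency(now, should):
    """Item score: 'How much should there be?' minus 'How much is there now?'."""
    return should - now


def mann_whitney_u(x, y):
    """U for sample x: pairs (xi, yj) with xi > yj, ties counted as 1/2."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )


# Hypothetical (now, should) responses to one item for two small groups.
field_responses = [(5, 6), (6, 6), (4, 6)]
office_responses = [(4, 6), (3, 6), (5, 6)]

field = [deficiency(now, should) for now, should in field_responses]
office = [deficiency(now, should) for now, should in office_responses]

mean_field = sum(field) / len(field)
mean_office = sum(office) / len(office)

u_field = mann_whitney_u(field, office)
u_office = mann_whitney_u(office, field)

print(field, office, mean_field, mean_office, u_field, u_office)
```

The two U values always sum to the product of the group sizes; the smaller of the two is the one ordinarily referred to tabled critical values.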


RESULTS 


Table 1 compares the need satisfactions of
the two groups of government managers indi- 
cating that field managers felt significantly 
more satisfied than central office managers 
with respect to their needs for self-esteem, 
independent thought and action, growth and 
development, and sense of self-fulfillment. 
Thus, in this agency, field work was some-
what more satisfying than central office
work, especially in the self-actualization need 
category. 

Table 1 also indicates that the government 
managers have considerably less need satis- 
faction with respect to every item than similar 
managers from private industry. It was not 
possible to test for significant differences 
since Porter’s table did not contain suitable 
variability figures. However, the trend ap- 
pears to be highly significant. Differences are 
especially pronounced for the security needs. 
In Porter’s study the security needs of his 
managerial personnel were much more satis- 
fied in both a relative and absolute sense than 
were the security needs of these government 
managers. 


DISCUSSION 


This study indicated that, within this gov- 
ernmental agency, managerial field work posi- 
tions provided significantly more need satis- 
faction than central office jobs for the higher 
level needs. This might be expected since such 
jobs seem to be much less structured than 
the central office jobs. This finding would 
seem to lend some support to the idea that 
bureaucratic characteristics are frustrating 
to the higher level needs. It is interesting to 
note that the social needs of managers in field 
work were not significantly less satisfied than 
the social needs of the central office positions 
in spite of the fact that the field managers 
had little opportunity to interact with other 
agency personnel. Apparently nonagency 
contacts were sufficient to satisfy these social 
needs. 

The study also indicated that government 
agency managers had much less need satis- 
faction than private industry managers simi- 
lar to them in age and organizational level. 
It is possible that government “norms” with 
respect to need satisfactions are quite dif- 
ferent from such norms for managers in pri- 
vate industry. Studies of other government 
agencies would be needed to confirm this. 
However, a more likely explanation is in the 
basic insecurity felt by all personnel in this 
particular government agency. The govern- 
ment managers perceived more deficiency in 
security needs than in any other need cate- 
gory while the private industry managers 
were relatively satisfied with their security. 
This difference can be accounted for by the
fact that this agency’s future was in some 
doubt at the time the study was made. Cer- 
tain of its funds had been depleted and the 
agency was attacked by certain elements in 
Congress and in the press. Several partici- 
pants in the study actually mentioned their 
concern about the agency’s future in the 
questionnaires they returned. 

This perceived insecurity could well influ- 
ence perceptions of other aspects of their jobs. 
A finding similar to this was reported by 
Grove and Kerr (1951) who compared the 
job satisfactions of workers in a financially 
sound company to those of workers in an 
“insecure” company that was in receivership. 
They reported that the feeling of low job 
satisfaction with respect to job security 
seemed to spread and pervade other aspects 
of job satisfaction. In spite of the fact that 
working conditions and pay in the insecure 
company were equal to or superior to that of 
the “secure” company, the insecure workers 
were less satisfied on all 10 of the job satis- 
faction dimensions studied. Nine of the 10 
differences were significant at the .05 level. 
The present study would seem to indicate that 
the same perceptual “halo effect” may exist
among managerial personnel with respect to 
need satisfactions. 
SVIB items related to a global managerial effectiveness criterion were identified
and cross validated on 461 managers from 13 varied Minnesota-based companies. 
A unit-weighted key composed of 57 items correlated .33 with the criterion on 
a holdout sample. Items which held up well in both the development and cross- 
validation groups are interpreted and distinctions between interest patterns of 
“more” and “less effective” managers are discussed. 


It is reasonable to expect that the voca- 
tional interests of a manager might be related 
to the effectiveness of his job performance. 
His enthusiasm, effort, and level of job satis- 
faction may be largely determined by how 
interested he is in his work and associates. 
This report describes a study which sought 
to determine if the above proposition was 
true and, if so, how the interests of the “more 
effective” manager might differ from those of 
the “less effective” manager. Findings of such 
a study should be useful in improving the 
selection and counseling of future managers. 

In a review of previous studies evaluating 
the validity of vocational interest measures 
as instruments for selecting managers, it was 
concluded that when such measures have been 
used, they have been quite consistently 
demonstrated to be related to criteria of 
managerial effectiveness (Nash, 1965). How- 
ever, it was also reported that measures of 
vocational interests have not been used nearly 
as extensively or successfully for selecting 
managers as for counseling students. This 
may be explained by the fact that available 
interest measures in general, and the SVIB in 
particular, are subject to several severe 
limitations as selection instruments. 

Probably the most severe limitation of 
interest measures is that they have been 
demonstrated to be generally fakeable. 
Whether they are actually faked in the selec- 
tion setting is not as clear, nor is it known 


1 Appreciation is expressed to Herbert G. Heneman 
and Marvin D. Dunnette for their guidance in the 
completion of the author’s doctoral dissertation, from 
which this article is substantially drawn, and to 
Thomas A. Mahoney for supplying the data. A grant 
from the General Research Board of the University 
of Maryland permitted the continuation of related 
research upon which part of this article is based. 


whether particular scales or keys of available 
measures are all as fakeable as the evidence 
suggests for the measures as a whole. A study 
by Kirchner (1961) touches on both ques- 
tions. He concluded that a large group of 
applicants for sales positions in the Minnesota 
Mining and Manufacturing Company did at- 
tempt to fake the SVIB, but with only a 
modicum of success. This applicant group 
with an assumed predisposition to “look 
good” actually had significantly lower scores 
on the salesman key of the SVIB than did a 
group of experienced employed salesmen with 
an assumed inclination to give straightforward 
responses. These issues deserve additional re- 
search effort, but were not considered further 
in the study described in this report. 

A limitation particularly relevant to the 
SVIB and its standardized occupational scales 
concerns the method used in constructing such 
scales. They have been developed by con- 
trasting responses of a large sample of suc- 
cessful men in each occupation for which a 
scale has been developed with responses of a 
very large “professional men-in-general” ref- 
erence group. However, “successful” in each 
occupation has been determined through the 
application of external criteria such as length 
of tenure in the occupation, annual level of 
income, level of education and training at- 
tained, certification in professional societies, 
and selection by “competent authorities.” No 
known attempt has been made to differentiate 
the more successful from the less successful 
within each occupation. Consequently, a high 
score on a given occupational scale may not 
be indicative of a high position on an inter- 
nally derived criterion of success within this 
externally defined “successful” group. 

Finally, many organizations apparently 
have not felt it was feasible to use either the 
standard scales or tailor-made scales in their 
selection programs. The conduct of necessary 
concurrent or predictive validity studies 
would require more subjects (Ss) to partici- 
pate, and more technical personnel to carry 
out the study than most companies ordinarily 
employ. 

The study reported herein was designed to 
reduce many of these limitations. Its purpose 
was to determine if the measured interests of 
a large sample of managers from several di- 
verse companies were related to an internal 
criterion of managerial effectiveness and, if 
so, to develop an interest key applicable in 
varied situations. Although such a key would 
have to be evaluated in each individual or- 
ganization contemplating its use, such an 
evaluation would be considerably simpler and 
less costly than an attempt to develop and 
evaluate one from scratch. The problem con- 
cerning the application of standard occupa- 
tional scale scores to the selection of “more” 
and “less effective” members within an oc- 
cupational group would be eliminated since 
the proposed key would be based on differ- 
ences in effectiveness observed within the 
managerial group, rather than on differences 
between members of the group and a men- 
in-general reference group. The scoring of 
such a key would be simple enough so that 
personnel within virtually all organizations 
could easily do it themselves. Although it 
would not be likely to reach the level of valid- 
ity attained by standard SVIB scales in differ- 
entiating between occupation groups, previous 
studies suggest it may be potentially more 
valid than such standard scales for differ- 
entiating between more and less effective 
managers (Achard & Clarke, 1945; Knauft, 
1951). 


METHOD 


Responses to the SVIB and simultaneously ob- 
tained alternation rankings of participating managers 
used in the development of an effectiveness criterion 
were primarily gathered during 1956-1958 by re- 
search personnel of the Management Development 
Laboratory, Industrial Relations Center, University 
of Minnesota. A total of 468 managers employed in 
13 different companies participated in the study, but 
only 461 filled out the interest blanks properly and 
had available sufficient criterion information to per- 
mit their inclusion in the analysis of the data. These 
461 managers were divided into two approximately 
equal-sized sub-groups (Samples 1 and 2) closely 
matched on such factors as job type and level in the 
organizational hierarchy, effectiveness rankings, or- 
ganization, and industry type. An item analysis of 
Sample 1 managerial responses to each of the 400 
items in the SVIB was conducted to identify those 
items which differentiated the managers on the 
criterion of effectiveness. Item differences identified 
in Sample 1 were used to develop scoring keys which 
were then cross validated on Sample 2 managers. 


Sample 


The participating managers were selected from 
companies representing various industries, organiza- 
tion levels, and types of managerial jobs. This was 
done to assure that a wide spectrum of managerial 
staffing situations would be included so that results 
might have broad applicability to staffing problems 
in many diverse organizations. Participating compa- 
nies generally had operations in Minnesota and 
ranged in size from 100 to approximately 4,000 em- 
ployees; the number of managers employed ranged 
from 24 to over 300. Industries represented included 
heavy and light manufacturing, finance, wholesale 
distribution, insurance, public utilities, and agricul- 
tural products. 

Managers were identified on the basis of whether 
they were performing in a position which (a) is 
classified under provisions of federal wage and hour 
legislation as an exempt position, and (b) involves 
the direct supervision of one or more subordinate 
positions excluding that of personal secretary. Both 
staff and line positions as commonly defined were 
included as managerial positions if they fit the above 
definition. 

Distinct functional units and hierarchy levels were 
identified and an attempt was made to sample mana- 
gerial positions from each level and functional unit 
within each company. The resulting sample of man- 
agers included about 25% from the top level of their 
organizational hierarchies, 30% from the middle, 
and 45% from the lowest level. A simple classifica- 
tion of positions as production (supervision of 
manufacturing production), office (supervision of 
office operations), and technical (supervision of 
technical and professional personnel) indicates that 
about 32% of the sample was engaged in production 
supervision, 53% in office supervision, and 15% in 
technical supervision. 


Criterion of Managerial Effectiveness 


The criterion attempted to measure effectiveness in 
the performance of general managerial responsibili- 
ties rather than in specific elements of performance 
peculiar to a single position. Independent alternation 
rankings of the managers in the sample were ob- 
tained from up to six executives in each organization 
who were familiar with their performance. Varied 
numbers of managers were ranked depending upon 
the ranker’s familiarity with the individual and his 
performance in accordance with specific instructions 
to exclude from the rankings any manager whose 
performance was unknown to them or whose assign- 
ment was sufficiently different from the others to 
prevent comparison. The rankings were converted 
into percentile scores and averaged to permit the 
managers to be compared and classified along a 
criterion continuum on an intercompany basis. Cor- 
relations between the independent rankings varied 
from .08 through .95 with a median of .65. 
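
The conversion of rankings to an intercompany criterion score can be illustrated with a minimal sketch. The article does not state the percentile formula that was used; the midpoint formula below, and the function names, are assumptions introduced only for illustration.

```python
from statistics import mean


def rank_to_percentile(rank, n_ranked):
    """Convert a rank (1 = most effective) among n_ranked managers to a
    0-100 percentile score. The midpoint formula is an assumption; the
    article does not report the conversion actually used."""
    return 100.0 * (n_ranked - rank + 0.5) / n_ranked


def criterion_score(rankings):
    """Average the percentile scores a manager received from each ranker.

    rankings: list of (rank, n_ranked) tuples, one per executive who
    ranked this manager.
    """
    return mean(rank_to_percentile(r, n) for r, n in rankings)


# Hypothetical manager ranked 2nd of 10 by one executive
# and 3rd of 20 by another.
example = criterion_score([(2, 10), (3, 20)])
```

Because each ranker may have ranked a different number of managers, expressing every rank as a percentile before averaging places the rankings on a common scale.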

Analysis of possible contaminating influences on 
the rankings revealed no statistically significant rela- 
tionship with age. Relationships between the rankings 
and organization level were found statistically sig- 
nificant at the .05 level of confidence in only 7 of 
16 organizations.² 

Percentile rank scores of Sample 1 managers were 
classified into three approximately equal-sized high- 
middle-low criterion groups. The middle group was 
discarded at this point to sharpen differences be- 
tween the managers on the criterion and thus 
facilitate the identification of SVIB items which 
were related to it. 


Item Analysis 


Responses to each of the 400 items in the SVIB 
were analyzed in Sample 1 to determine if there 
was a relationship with the high (more effective) 
and low (less effective) criterion group classification. 
A decision rule was established that all items would 
be discarded which did not have a chi-square 
probability level of less than .50 and at least one 
item category response difference between criterion 
groups of 10% or greater. An exception to the 10% 
phase of this rule was made when one of the 
response percentages fell between 0 and 8% or 92 
and 100%. This exception is in accord with existing 
SVIB weighting systems which recognize the con- 
striction of variance at the extreme ends of the 
distributions. A difference of 8% or greater was 
regarded as a minimal requirement for occasions 
of this latter type. Application of the decision rule 
resulted in the exclusion of 230 items from the 
original pool of 400, leaving 170 items for key 
development. 
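
The decision rule above can be sketched as follows. This is an illustration only: the chi-square probability is taken as an input rather than computed, the function names are hypothetical, and treating the 0 to 8% and 92 to 100% bands as inclusive is an assumption.

```python
def min_required_difference(pct_high, pct_low):
    """Return the minimum percentage-point difference required for a
    response category: 8 if either group's percentage lies in the
    0-8 or 92-100 range, otherwise 10."""
    extreme = any(p <= 8 or p >= 92 for p in (pct_high, pct_low))
    return 8 if extreme else 10


def retain_item(chi_square_p, category_pcts):
    """Apply the two-part retention rule to one SVIB item.

    chi_square_p:  probability level of the item's chi-square test.
    category_pcts: list of (pct_high, pct_low) pairs, one per response
                   category, giving the percentage of the high and low
                   criterion groups choosing that category.
    """
    if chi_square_p >= 0.50:
        return False
    return any(
        abs(high - low) >= min_required_difference(high, low)
        for high, low in category_pcts
    )


# Hypothetical item: 40% of the high group and 28% of the low group
# chose the first response category.
kept = retain_item(0.20, [(40, 28), (35, 40), (25, 32)])
```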


Development and Cross Validation of Keys 


The most efficient key was composed of 57 items 
estimated as most valid in the 170-item pool. Item 
responses with a percentage difference of 10% 
(8% for extremes) or more between criterion groups 
were unit weighted in this key. Thus, all response 
categories in which at least 10% (8% for extremes) 
more of the high criterion group than the low 
criterion group registered a response were given a 
weight of +1 and all response categories in which 
at least 10% (8% for extremes) more of the low 
criterion group than the high criterion group regis- 
tered a response were given a weight of —1. All 
other categories (less than 10%, or 8% for extremes) 
were not weighted. 
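
A minimal sketch of the unit-weighting rule just described is given below. The function name is hypothetical, and the inclusive handling of the extreme bands (0 to 8% and 92 to 100%) is an assumption.

```python
def unit_weight(pct_high, pct_low):
    """Assign +1, -1, or 0 to one response category.

    pct_high, pct_low: percentage of the high and low criterion groups
    giving this response. The threshold is 10 percentage points, or 8
    when either percentage falls in the 0-8 or 92-100 range.
    """
    extreme = any(p <= 8 or p >= 92 for p in (pct_high, pct_low))
    threshold = 8 if extreme else 10
    diff = pct_high - pct_low
    if diff >= threshold:
        return 1   # response favored by the more effective group
    if diff <= -threshold:
        return -1  # response favored by the less effective group
    return 0       # difference too small; category not weighted
```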


2 One company was organized in four separate 
units which were considered separately in this 
analysis. 
This key was applied to SVIB responses of all 
managers in Sample 2. A total score was obtained 
for each manager by summating all plus weights 
received by his SVIB responses to the 57 items and 
then subtracting all minus weights received. These 
total SVIB scores were related to the percentile rank 
scores of effectiveness. A product-moment correlation 
and Tilton’s (1937) measure of overlapping were 
computed along with standard error of estimates 
for each. The correlation was obtained for the 
ungrouped 234 pairs of SVIB and criterion scores in 
Sample 2. The percentage of overlap was computed 
on the upper and lower criterion groups. A scatter 
plot was developed to facilitate visual description 
and examination of the relationship between the two 
variables. 
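
The scoring step can be sketched as below. The three-item key and the response codes are hypothetical, since the actual 57-item key is not reproduced in the article; summing signed weights is equivalent to adding the plus weights and then subtracting the minus weights.

```python
def total_score(responses, key):
    """Compute a manager's total key score.

    responses: dict mapping item number -> response category chosen.
    key:       dict mapping (item number, response category) -> +1 or -1.
               Unweighted categories are simply absent from the key.
    """
    return sum(key.get((item, category), 0)
               for item, category in responses.items())


# Hypothetical key and responses (L = like, I = indifferent, D = dislike).
key = {(1, "L"): 1, (1, "D"): -1, (2, "L"): -1, (3, "I"): 1}
responses = {1: "L", 2: "L", 3: "I"}
score = total_score(responses, key)  # (+1) + (-1) + (+1)
```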


RESULTS 


Results of the application of the SVIB key 
to Sample 2 responses are shown in Tables 
1 and 2. The correlation coefficient and per- 
centage of overlap are given with their re- 
spective standard errors in Table 1. The 
scatter plot of the relationship is presented 
in condensed form, each variable having been 
cut into approximate thirds resulting in the 
3 X 3 configuration. The number of managers 
with scores falling within each category are 
identified along with the percentage this is 
of the criterion group in which these managers 
belong. 

Table 2 indicates the efficiency of the SVIB 
key in differentiating between more and less 
effective managers in Sample 2 with a cutting 
score of 6 resulting in a selection ratio of .54. 
The cutting score of 6 was chosen to obtain 
a selection ratio of about .50 and to maximize 
the differentiating power of the key in 
Sample 2. This cutting score provided the 


TABLE 1 


CONDENSED SCATTER PLOT OF SVIB KEY RESULTS 
IN SAMPLE 2 


(r = .33, SE = .065; % overlap = 70%, SE = 6.2%) 


SVIB score             Criterion score categories 
categories a        0-33        34-66       67-100 

26-44             18 (23%)    22 (27%)    33 (41%) 
17-25             22 (28%)    25 (33%)    32 (40%) 
0-16              39 (49%)    28 (40%)    15 (19%) 

Total             79 (100)    75 (100)    80 (100) 


a A constant of 14 was added to all SVIB scores to eliminate 
negative values in this table. 
best estimate of the maximum differentiating 
power of the key in Sample 2 for approxi- 
mately a selection ratio of .50, but must be 
interpreted as slightly inflating the probable 
efficiency of the key in other prediction situ- 
ations since it was derived from Sample 2 
data. It would be expected to shrink in the 
proportion of correct predictions made when 
applied to another group of managers, al- 
though probably not too seriously since 
Sample 2 was a cross-validation sample. Evi- 
dence is presented in Table 2 indicative of 
the key’s efficiency for all of Sample 2 and 
for each individual organization participating 
in the study. This latter information provides 
evidence concerning the general applicability 
of the key since it indicates the number of 
companies in which it might have been useful 
if the managers had responded to the SVIB 
before being hired in the same way they 
did in this study, and if they had achieved 
membership in the same criterion group. 


DISCUSSION 


Information presented in Table 1 indicates 
that responses to items in the SVIB key and 
the criterion of effectiveness were correlated 
about .33 on the entire group of Sample 2 
managers and there was a 70% overlap in 
SVIB scores and criterion scores between 
managers in the upper and lower criterion 
groups. Both of these indices are statistically 
significant beyond the .001 level. However, 
it is evident from the condensed scatter plot 
that the middle criterion group could not be 
significantly differentiated with the key from 
the upper and lower groups. 

Table 2 shows that for a selection ratio of 
.54 obtained with the SVIB cutting score of 6, 
a total of 68% correct identifications were 
made of the more and less effective managers 
in Sample 2. This is slightly greater than 
66% obtained in a previous study by 
Mahoney, Jerdee, and Nash (1960) which 
used the same Ss, but applied standard scale 
scores and several other noninterest measures 
as predictors. However, caution should again 
be exercised in interpreting these results be- 
cause of the previously mentioned point con- 
cerning the derivation of the cutting score. 
Of this 68%, 73% correct identifications 
were made of the more effective managers and 


TABLE 2 


EFFICIENCY OF SVIB KEY IN IDENTIFYING “MORE” 
AND “LESS EFFECTIVE” MANAGERS 
IN SAMPLE 2 


(Cutting score = 6, selection ratio = .54) 


                  Percentage of correct 
Company     N        identifications 

All        159            68 
A           50            70 
B            8            50 
C            3           100 
D            5            60 
E           15            53 
F            5            60 
G           15            93 
H           17            65 
I            7            43 
J           10            80 
K            6            83 
L            4           100 
M           14            57 





65% of the less effective managers which 
suggests the key was somewhat more efficient 
in identifying the better managers. 

Table 2 information also suggests that the 
key had some general applicability. It cor- 
rectly identified more than 50% of the high 
and low criterion managers in 11 of the 13 
companies participating in the study. An 
average of 70% correct predictions made 
within each company is about the same as 
the 69% obtained with measures used in the 
previous study by Mahoney et al. In two of 
the companies (B and I), the key was not 
related to the criterion. The small number 
of managers and relatively low magnitude of 
the relationship in some of the other compa- 
nies should also be considered as signaling 
the need for evaluation of the key in any 
company contemplating its use. Nevertheless, 
it does appear that the relationship is not 
a function of the responses of managers in 
just one or two companies, but exists through- 
out most of the companies participating in 
the study. 
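
The summary figures cited from Table 2 can be checked with a short calculation. The values are transcribed from the table as printed; the labels for Companies C, I, and L and the N of 7 for Company I are readings of damaged rows and should be treated as assumptions, although the Ns do sum to the reported total of 159.

```python
# Percentage of correct identifications per company (Table 2).
pct_correct = {
    "A": 70, "B": 50, "C": 100, "D": 60, "E": 53, "F": 60, "G": 93,
    "H": 65, "I": 43, "J": 80, "K": 83, "L": 100, "M": 57,
}

# Number of high- and low-criterion managers per company (Table 2).
n_managers = {
    "A": 50, "B": 8, "C": 3, "D": 5, "E": 15, "F": 5, "G": 15,
    "H": 17, "I": 7, "J": 10, "K": 6, "L": 4, "M": 14,
}

# Companies in which more than 50% of managers were correctly identified.
companies_above_chance = [c for c, p in pct_correct.items() if p > 50]

# Unweighted average of the company percentages.
mean_pct = sum(pct_correct.values()) / len(pct_correct)

total_n = sum(n_managers.values())
```

The unweighted mean gives every company equal influence regardless of its size, which is why it can differ from the 68% obtained for the pooled sample.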


Interpretation of Managerial Interests 


A follow-up item analysis of responses by 
Sample 2 managers was conducted to deter- 
mine the stability of each item in the key 
and 170-item pool. Several of the items in 




the key apparently had only a chance rela- 
tionship with the criterion while others not in 
the key but included in the 170-item pool 
were significantly related to the criterion in 
both samples.³ 

The response preferences of the high-cri- 
terion managers as revealed in the item analy- 
ses could be construed as supporting several 
varied hypotheses concerning the interests of 
the more effective manager. An interpreta- 
tion of the distinguishing interests of a hypo- 
thetical more effective manager compared 
with a less effective manager is presented 
below: 


He prefers activities which involve independent 
and intensive thought, perhaps with some risk in- 
volved, but with less regimentation of his time. 
He rejects as uninteresting those activities involved 
in technical and agricultural pursuits. He is not 
particularly interested in activities requiring extended 
periods of attention and concentration on close or 
detailed tasks. He enjoys activities which bring him 
in contact with others, especially if they afford him 
an opportunity to assume a leadership or dominant 
role. He is less inclined to give socially acceptable 
or stereotyped responses and is more oriented toward 
activities closely tied to tangible outcomes. He is not 
service or humanitarian oriented nor is he particularly 
enthused with the aesthetic or classical forms 
of entertainment. He seems to prefer physical and 
social activities for his recreational outlets. 


Results of this study suggest that there 
appears to exist to a modest extent a phe- 
nomenon of general managerial effectiveness 
which pervades various managerial assign- 
ments and organizations. It is also suggested 
that this phenomenon of effectiveness is sig- 
nificantly related to the vocational interest 
patterns of managers. The items identified in 
this study should prove useful as a starting 
point for prediction studies using or con- 
templating the use of vocational interests as 
a possible predictor of managerial effective- 
ness. Interested users of these items are cau- 


3 The author will identify these items and their 
response category weights for qualified readers if 
requests are submitted to him at the Department 
of Business Administration, University of Maryland, 
College Park, Maryland. 
tioned, however, that the SVIB before its 
most recent revision was used in this study. 
Of the 65 best items identified, 10 have been 
replaced and 8 reworded. Of 98 items sur- 
viving both item analyses, 16 have been 
replaced and 16 reworded. The effect of these 
changes on the validity of the key is not 
known, but a large-scale study by Strong, 
Campbell, Berdie, and Clark (1964) suggests 
it may be minor. 

Finally, future prediction studies should 
probably seek to identify differential interest 
patterns associated with managerial success 
by applying the prediction model suggested 
by Guetzkow and Forehand (1961), and 
Dunnette (1963). Application of this model 
should improve the predictive efficiency of 
such measures over that obtained in studies 
such as the one reported here. 
“AROUSAL HYPOTHESIS” AND THE EFFECTS OF MUSIC 
ON PURCHASING BEHAVIOR 


PATRICIA CAIN SMITH AND ROSS CURNOW ¹ 


Cornell University 


This study replicates, in a naturalistic setting, a prior finding which supported 
that portion of the “arousal hypothesis” which predicts that a certain degree of 
noise will actually increase activity. Music was varied from loud to soft in 
8 counterbalanced experimental sessions in 2 large supermarkets (N = 1,100). 
The “arousal hypothesis” seems to account for the results: significantly less 
time was spent in the markets during the loud session, although there was no 
significant difference in sales, nor in the customers’ reported satisfaction. 


Most studies of the relationship between 
noise and efficiency contain insufficient in- 
formation to permit evaluation (Berrien, 
1946; McBain, 1961). Even the more sta- 
tistically sophisticated studies of music in 
markets (e.g., Stevenson, Jordan, & Harrison, 
1958) lack control groups. 

Industrial management, nevertheless, gen- 
erally assumes that music increases output, 
and reduces absences, monotony, and strain. 
Retail-store management has accepted music 
not so much for the benefit of employees as 
for “encouraging” purchases. Brand (1963) 
argues that music in supermarkets is “de- 
signed to make shopping more enjoyable and 
perhaps to help distract attention from the 
total cost of the shopping cart full of merchandise. . . . 
Carefully selected music 
proves highly successful in creating a pleas- 
ant, relaxed atmosphere in which to shop.” 

One hypothesis advanced to resolve some 
of the conflicting data in the field is that of 
“arousal” or “activation” (Duffy, 1951). 
Broadly, it states that for a given individual 
engaged in a specific task, a certain degree 
of noise may actually improve performance, 
while a lower or higher degree will retard it. 

An unpublished (1961) study at Cornell 
varied loudness of music (in the middle 
range) in supermarkets. Sales per minute 
were greater with loud music, not because 


1 We wish to acknowledge the efforts of the 21 
undergraduate and graduate students who made it 
possible to man this study and to maintain control 
over extraneous variables. We especially want to 
thank the cooperating markets, who generously per- 
mitted at least minor disruption of normal opera- 
tions, not because of promises of immediate payoff 
but because they “wanted to know.” 


of an absolute increase in sales, but because 
less time was spent in the store. This study 
replicates the earlier work. 


METHOD 


Two large markets (Ithaca, N.Y.) which regu- 
larly provide background music were compared 
during the Friday afternoon and Saturday morning 
hours; sessions were repeated in reverse order after 
1 month in each market (March-May, SLLS, 
LSSL). 

Time “in” and “out” (checkout stand), and dollar 
sales were recorded, as well as orally administered 
questionnaires concerning whether each customer (a) 
noticed the music, (b) felt it too loud, too soft, or 
about right, (c) rated his feeling toward the music 
as favorable on the “Faces” scale (Locke, Smith, 
Kendall, Hulin, & Miller, 1964),² plus background 
information (N = 1,100). “Loud” and “soft” were 
set near the limits which management would tolerate. 


RESULTS AND DISCUSSION 


Both hypotheses, that time in store would 
be reduced and that sales would be unaffected 
by loud music, remain tenable (see Table 1). 
Rate of spending is therefore greater during 
loud sessions: 55.6 versus 53.0 cents per 
person-minute (p < .001). 
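
The reported spending rates are consistent with dividing mean sales per person by mean minutes in store, using the session means given in Table 1. The short sketch below reproduces that arithmetic; it assumes the rates were computed from the session means rather than customer by customer, which the article does not state.

```python
# Session means from Table 1 (sales in dollars, time in minutes).
loud = {"sales_dollars": 9.81, "minutes": 17.64}
soft = {"sales_dollars": 9.82, "minutes": 18.53}


def cents_per_minute(session):
    """Mean spending rate in cents per person-minute."""
    return 100 * session["sales_dollars"] / session["minutes"]


loud_rate = round(cents_per_minute(loud), 1)
soft_rate = round(cents_per_minute(soft), 1)
```

Because sales were nearly identical in the two conditions, the difference in rate comes almost entirely from the shorter time spent in the store during loud sessions.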

Results are not due to differential crowding 
(Ns are almost equal), nor to insufficient dif- 
ferences in loudness (34.7% of shoppers 
“didn’t notice” the music in the soft sessions; 
only 11.3% in the loud sessions). The loud 
music did not “drive” the customers from 
the stores (summary ratings of preference 
were not significantly different). Results were 
similar in all comparisons in both markets. 
Groups were also comparable (marital status, 
number for whom shopping, etc.). Tempo of 


2 Appreciation is expressed to General Motors Corporation 
for permission to use the “Faces” scale. 
TABLE 1 

COMPARISON OF LOUD AND SOFT SESSIONS 

                  Sales/person     Minutes in store      Rating of loudness      “Faces” rating 
Session     N      M       σ         M        σ       “Too loud”   “Too soft”    of preference 

Loud       553   $9.81   $7.72     17.64    11.13       14.0%         7.9%      89.5% Favorable 
Soft       547   $9.82   $7.88     18.53    783         1539%        28.5%      90.9% Favorable 

Diff.                ns                p < .001               p < .001                ns 





the music was not fast enough to have in- 
duced a pacing effect. The arousal hypothesis, 
therefore, remains the most likely explanation 
of the results. 

Unanswered questions include (a) the 
shape of the relationship between music and 
activity (the curvilinear portion of the arousal 
hypothesis), (b) possible interaction with 
fatigue for longer sessions (mean time in store 
was only 18 minutes here), (c) the applica- 
tion of these findings to other purchasing 
situations (e.g., department stores or restau- 
rants), and (d) individual differences in 
“arousal” to music (Duffy, 1962, indicates 
wide differences in physiological measure- 
ments). 

One final point: has the study any rele- 
vance for supermarket management? Proba- 
bly only in that music loudness, unless above 
the level of auditory comfort, will not affect 
total sales. Or, perhaps, in that, if the store 
manager wishes to manipulate the number of 


persons in his store at any one time, he may 
do so by increasing or decreasing the volume 
of the music being played. 
PERFORMANCE AND INTERACTIONAL DIMENSIONS 
OF ORGANIZATIONAL WORK GROUPS ¹ 


FRANK FRIEDLANDER 
U. S. Naval Ordnance Test Station, China Lake, California 


Perceptions of group adequacy and interaction processes by 91 members of 
12 work groups in an R&D organization were factor analyzed. 6 reliable 
dimensions evolved which cut across several previously defined constructs 
and differentiated the 12 work groups from each other beyond the .01 level 
by ANOVAs. Of the total group phenomena variance, 33% was accounted for 
by a single dimension of group effectiveness in problem solving. This dimension 
correlated negatively with (a) the occupational and educational level of the 
group, (b) the educational heterogeneity of the group, (c) group size, and 
(d) the level of the group in the organizational hierarchy. These findings sug- 
gest that different principles may govern traditioned organizational work groups 
versus ad hoc groups formed specifically for the purpose of an experiment. 


In conceptualizing and portraying the 
nature of group performance and group inter- 
actions in an organizational setting, there 
has been a tendency for theorists as well as 
practitioners to utilize different sets of dimen- 
sions, often at somewhat different levels of 
abstraction, arising from a variety of settings, 
and encompassing many contrasting view- 
points as to the proper and relevant descrip- 
tive variables. These complexities leave ample 
room for intuitive and somewhat stereotyped 
yardsticks to be developed along which diag- 
noses and prognoses are made by the practi- 
tioner, and along which research hypotheses 
and results are offered by the researcher. To 
the extent that these phenomena occur, they 
lead to a multiplicity of criteria measuring, 
implicitly or explicitly, a profusion of con- 
cepts. While each user may be quite content 
with his own pet dimensions, the net col- 
lective result presents a blurred and disturb- 
ing picture. 

The choice of variables or dimensions may 
be dictated by a specific theory to be ex- 
amined, or possibly by the personal experi- 
ence of the investigator as a group member, 
a group leader, or a group trainer. Variables 


1 This paper reports the results of the first phase 
of an ongoing intraorganizational study of work 
groups and the modifications that evolve in these 
groups as a function of management laboratory 
training. This first phase of the study is concerned 
entirely with the establishment of appropriate group 
dimensions by means of which, at a later stage, 
group change might be measured. 


may also be developed by members of the 
groups in terms of their perceptions of inter- 
actions, performance, and deterrence to task 
accomplishment. The latter would seem a 
meaningful, yet infrequently used approach. 

Sells (1963) has emphasized the principle 
that behavior represents the interaction of the 
individual and the environmental situation, 
and that the total variance of any response 
can be accounted for only partially by indi- 
vidual differences in characteristics. Func- 
tional and situational variations may there- 
fore be expected to have marked effects upon 
factored group dimensions. 

Of particular relevance to the researcher 
and consultant in an industrial setting are 
potential biases which may occur as a result 
of borrowing findings from experimental 
laboratory studies of ad hoc groups and 
generalizing these to intact traditioned organ- 
izational work groups. Ad hoc groups are 
defined as those composed of individuals as- 
sembled by an experimenter to work together 
mutually and cooperatively on some specific 
and externally assigned task, while tradi- 
tioned work groups are cooperative associa- 
tions whose members have progressed through 
states of coming together in physical proxim- 
ity, of organizing for common goals, and of 
accepting commitment for the group’s pur- 
poses (Lorge, Fox, Davitz, & Brenner, 
1958). Lorge et al. draw sharp distinctions 
between these two types of groups, and cau- 
tion that “a common and dangerous practice 
is to generalize the principles valid for ad 
hoc groups to traditioned groups.” Thus, one 
might question the degree of verisimilitude 
between the social environment created by 
the experimenter for his ad hoc group and the 
social and organizational context from which 
the traditioned work group has emerged and 
in which it is now embedded. 

Excellent reviews and attempts to find con- 
gruencies among several previous factor-ana- 
lytic studies are presented by Borgatta, 
Cottrell, and Meyer (1956), Carter (1954), 
and Foa (1961). In the latter two reviews, 
evidence is offered that behavior in groups 
can be categorized into three underlying di- 
mensions: individual prominence and achieve- 
ment, aiding attainment by the group, and 
sociability. The vast majority of studies, 
including those cited in these reviews, focus 
on (a) ad hoc groups (Borgatta, Cottrell, &
Mann, 1958; Cattell, Saunders, & Stice,
1953; Couch & Carter, 1952; Mann, 1961);
(b) population and to some extent structural
variables to the exclusion of syntality vari- 
ables (Couch & Carter, 1952; Sakoda, 1952; 
Wherry, 1950); (c) an exceedingly small 
number of variables with which to extract 
dimensions that hopefully represent the large 
and complex domain of group phenomena 
(Blake, Mouton, & Fruchter, 1962; Burke & 
Bennis, 1961; Clark, 1953; Couch & Carter, 
1952; Sakoda, 1952; Wherry, 1950), all of 
whom utilized less than 20 variables; and 
(d) items which are perhaps more important 
to the researcher and/or practitioner conduct- 
ing the study than to the subjects and/or 
groups upon which the study was conducted 
(as was the case with almost all of the studies 
cited). 

In view of these many issues, one of the 
purposes of this study was to reformulate sets 
of dimensions which underlie group phe- 
nomena as perceived by members and which 
would account for the interrelationships 
among individual components, intragroup 
processes, and group performance. The study 
was designed (a) to tap variables of signifi-
cance and utility to the actual members, (b)
of intact, traditioned work groups, (c) at the 
most concrete and specific level consistent 
with meaningfulness and applicability across 
a number of organizational work groups, yet 
(d) with a sufficient number of anchor vari-
ables from previous group studies to allow for
relatedness to current theoretical networks, 
and (e) with variables of sufficient breadth 
and quantity so that a variety of group phe- 
nomena might evolve as separate dimensions. 


BACKGROUND 


This study was conducted in one of the 
armed services’ largest research and develop- 
ment stations employing approximately 6,000 
personnel. Ninety percent of the employees 
are civilians, including about 1,200 scientists
and engineers. The 12 groups from which 
data were collected were drawn from three 
levels within the organizational hierarchy. 
These levels are described below. 

1. The central coordinating group for this 
organization is a Policy Board, consisting of 
the 4 members of top management, plus 
about 12 members who are also heads of 
their respective operating or staff depart- 
ments. The Policy Board was one of the 12 
groups. 

2. Each department is composed of from 
5 to 10 divisions, and the Department Staff 
Groups are composed of these division 
heads as representatives of their respective 
divisions. Five Department Staff Groups, in- 
cluding three staff-support departments and 
two technical and engineering development 
departments were included. 

3. Each division is composed of several 
branch heads, who are members of the Di- 
vision Staff Groups. Three Division Groups, 
all involved in technical and engineering 
development, were included. 

In addition, the study included one group 
composed of administrative assistants repre- 
senting most of the departments, which 
handles secondary administrative matters, 
and one quasi-official group which has re- 
sponsibility for administration of the em- 
ployee facilities on the Station. The last of 
the 12 groups was composed of employees 
enrolled in an on-Station course in human 
relations. Respondents in this course com-
pleted the questionnaire describing tradi-
tioned work groups of which they were mem-
bers. Data from this group were used only
in the factor-analysis section of the study. 

Staff groups at all of these levels are com- 
posed of from 5 to 15 members. They meet 
(usually weekly or biweekly) and work regu-
larly together for a variety of purposes in-
cluding problem discussion and/or resolu-
tion, coordination, information dissemination,
decision making, policy formulation, future
planning, etc. As such, they represent tradi-
tioned, task-oriented work groups which use
typical lateral and hierarchal interaction
patterns toward their task accomplishment.


PROCEDURE 


In early 1962, a series of interviews was held indi-
vidually with the members of the Policy Board.2
Although the proposed topic for this set of inter-
views did not concern the Board or its meetings
as such, it soon became evident that members were
not content merely to discuss the “planned” topic,
but instead dwelled rather consistently and concert-
edly upon the Board membership and leadership,
and the interactions and effectiveness of the Board
meetings.

Detailed notes were taken during these interviews,
and the verbatim comments by members were re-
phrased into questions to form the main body of
a questionnaire. Additional group-descriptive vari-
ables were obtained through observation of the
meetings of other staff groups and through discus-
sions with members of several different groups.
Relevant group-descriptive dimensions, issues, and
hypotheses recurrent in the professional literature
provided a third source of information. These were
all combined into a questionnaire composed of 120
items.

In addition to evaluations of adequacy and ef-
fectiveness of a group and its meetings, the variables
encompassed perceptions of the actual network of
feelings, both in terms of the perceptions of one’s
own position in the network as a member, and the
perception by members of relationships existing be-
tween other members of the group. Such a procedure
has been suggested by Tagiuri (1958). The types of
items that were tapped included cooperation, com-
petition, openness, satisfaction, initiative, self-aware-
ness, individual identity, participation, dependency,
spontaneity, creativity, responsibility, sensitivity, in-
timacy, effectiveness, “teamness,” conflict, divergency
of ideas, communication, procedural adequacy, au-
thority relations, exploitation, mutual influence, and
consensus.

In addition, directly quantifiable data were col-
lected for each individual concerning the number of
meetings he had previously attended, the number of
topics he had submitted for the agenda, his estimate
of the length of the last meeting, of the percentage
of time the chairman had talked, of the percentage
of time he had talked, and the number of problem
areas which he felt needed discussion at the next
meeting. A nine-adjective semantic differential of the
concept “X Department Staff Meetings” was also
included.

2 The author gratefully acknowledges the part
played by Evelyn Glatt of this organization for
conducting and recording these interviews.

Certain parameters of the component individuals 
of each group were also obtained. For each group, 
(a) the mean, (b) dispersion (as measured by the 
standard deviation), and (c) the distance of the 
leader from the group mean (as measured by the 
difference between the group leader and the mean 
of his group) were calculated for the following: age, 
tenure, grade level, and education. In addition, the 
size of the group, the length of its meetings, and 
the relative level of the group in the organizational 
hierarchy were obtained. 
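
The three per-group parameters described above can be sketched as follows. This is only an illustration: the ages are hypothetical, the function name is invented here, and two details are assumptions the text does not settle, namely that the leader is counted in the group mean and that the population form of the standard deviation is used.

```python
from statistics import mean, pstdev

def group_descriptors(values, leader_index):
    """Return (mean, dispersion, leader-group distance) for one attribute."""
    group_mean = mean(values)
    # Assumption: population standard deviation; the paper does not specify.
    dispersion = pstdev(values)
    # Leader's score minus the group mean, as defined in the text.
    leader_distance = values[leader_index] - group_mean
    return group_mean, dispersion, leader_distance

# Hypothetical ages for a five-member group; the leader is listed first.
ages = [52, 41, 38, 45, 44]
group_mean, dispersion, leader_distance = group_descriptors(ages, leader_index=0)
print(group_mean, round(dispersion, 2), leader_distance)
```

The same function would be applied to tenure, grade level, and education in turn.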

After pretesting and refining the initial question-
naire, it was reduced to 70 items.3 Variables were
deleted which were perceived by group members as 
ambiguous or which gave indication of low reli- 
ability and/or low common variance with any other 
items. The final questionnaire, described as the 
Group Behavior Inventory (GBI), was administered 
twice to each of the 12 groups. The period between 
administrations ranged from 6 to 12 months. Only 
members who completed both administrations of 
the GBI were included in the analysis. 

A cover letter on the GBI explained briefly the 
purposes and expected values of the study and 
solicited the member’s cooperation. Instructions for 
completing the form were as follows: 


Think of the past two or three (name of group) 
meetings which you have attended. The following 
questions apply to either the meetings or the 
group which attended these meetings. Please indi- 
cate your agreement or disagreement with each 
statement as follows: 


There followed a 5-choice Likert-type scale:
strongly agree, agree, neutral, disagree, strongly
disagree.

Group members were introduced to the study at 
one of their regular meetings. After questions con- 
cerning the study were discussed and answered, the 
GBI was distributed to each member of the group 
to be completed at his leisure in the privacy of his 
own office. The only identification affixed to the 
questionnaire was a code number of the respondent’s 
own choice. This was requested so that results of 
the first and second administrations might be com- 
pared and related. 


RESULTS 


Pearson correlations were computed among 
all 70 variables, and the results were sub- 
jected to a principal components analysis and 
Varimax rotation. Ignoring factors which ac- 
counted for less than 2% of the variance, 13 


3 A copy of the complete 70-item questionnaire has
been deposited with the American Documentation 
Institute. Order Document No. 8787 from ADI 
Auxiliary Publications Project, Photoduplication 
Service, Library of Congress, Washington, D. C. 
20540. Remit in advance $1.25 for microfilm or $1.25 
for photocopies and make checks payable to: Chief, 
Photoduplication Service, Library of Congress. 
TABLE 1
RELIABILITY COEFFICIENTS OF THE NINE GROUP DIMENSION SCALES OF THE GBI

                                                          Reliability coefficient
Group dimension                                           rKR20 (a)      rtest-retest (b, c)
I. Group effectiveness                                    .90            .80
II. Approach versus withdrawal from leader                .91            .81
III. Mutual influence                                     [illegible]    [illegible]
IV. Personal involvement and participation                [illegible]    .80
V. Intragroup trust versus intragroup competitiveness     .13 (?)        .68
VI. General evaluation of group meetings                  .89            .64
VII. Submission versus rebellion against leader           .52            .50
VIII. Leader control                                      .55 (?)        [illegible]
IX. Role and idea conformity                              [illegible]    [illegible]

a N = 91.
b N = 60.
c The time interval which elapsed between test and retest was approximately 6 months.


factors evolved from this analysis which ac- 
counted for 78% of the total questionnaire 
variance. Using an arbitrary criterion of .30
as a minimum factor loading, two of these 
factors were dropped for lack of sufficient 
items to adequately define the factor space. 
(One was composed of one item, and one was 
a doublet.) In an attempt to obtain greater 
factor meaningfulness and clarity, the re- 
maining factors were subjected to a graphic 
hand rotation, during which two pairs of 
overlapping factors collapsed. The final result 
was a set of nine factors composed of 55 
items which now accounted for approximately 
70% of the 70-item total variance. 
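
The 2% retention rule can be illustrated with a small sketch. It assumes the analysis was run on a correlation matrix, so that total variance equals the number of variables; under that assumption a 2% cutoff with 70 items corresponds to an eigenvalue of 1.4. The eigenvalues below are hypothetical and the function name is invented for illustration.

```python
def retained_factors(eigenvalues, min_share=0.02):
    """Keep components whose share of total variance is at least min_share."""
    # For a correlation matrix the eigenvalues sum to the number of variables.
    n_vars = len(eigenvalues)
    shares = [ev / n_vars for ev in eigenvalues]
    kept = [s for s in shares if s >= min_share]
    return kept, sum(kept)

# Hypothetical eigenvalues for a 5-variable correlation matrix (they sum to 5).
kept, cumulative = retained_factors([2.5, 1.2, 0.8, 0.45, 0.05])
print(len(kept), round(cumulative, 2))
```

The sketch covers only the retention step; the Varimax and graphic hand rotations reported in the text are not reproduced.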

Since in practically none of the previous 
factor-analytic studies have factor reliabili- 
ties been reported, it was decided that these 
might provide a worthwhile addition. Both 
internal consistency measures as computed by 
Kuder-Richardson Formula 20 and test-retest 
measures (with a 6-month interval) are re- 
ported in Table 1. The internal consistency 
reliabilities of the first six factored scales of 
the GBI were deemed sufficiently high for 
utilization in the remainder of the study. The 
reliabilities of Factors VII, VIII, and IX 
were judged too low for further statistical 
analysis. However, a description of these fac- 
tors and their component items is given 
briefly for those doing further research in the 
area.4
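
For readers unfamiliar with the internal consistency measure, a minimal sketch of Kuder-Richardson Formula 20 follows. KR-20 is defined for dichotomously scored items; how the 5-choice GBI responses were scored for this purpose is not stated in the text, so the 0/1 data here are purely hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for a list of respondents, each a list of 0/1 item scores."""
    n_respondents = len(responses)
    k = len(responses[0])                      # number of items in the scale
    totals = [sum(r) for r in responses]       # total score per respondent
    total_variance = pvariance(totals)
    pq_sum = 0.0
    for j in range(k):
        p = sum(r[j] for r in responses) / n_respondents  # proportion scoring 1
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / total_variance)

# Hypothetical data: 4 respondents, 3 items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))
```

The test-retest coefficient, by contrast, is an ordinary Pearson correlation between scale scores from the two administrations.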


4 Of the first six factors, the test-retest reliabilities 
of scales V and VI are somewhat low. However, 
since the 6-month interval is longer than usually 
considered for test-retest reliabilities and since the 


In the following portrayal of the nine fac- 
tors, the descriptive title of each is followed 
by the proportion of the total variance ac- 
counted for. Factors with a “versus” in the 
title are interpreted as bipolar factors in 
which the positive and negative poles cannot 
be defined as the lack of each other, but 
rather define genuinely different phenomena. 


Factor I: Group Effectiveness 33%

Item                                                                        Loading
The group is an effective problem-solving team                              [illegible]
Group meetings result in creative solutions to problems                     [illegible]
Meetings do not formulate future policy                                     -.69
Meetings do not come to grips with the real problems                        -.60
Meetings are not effective in discussing mutual problems                    [illegible]
There is open examination of issues and problems at group meetings          .60
Others assume responsibility for setting group goals                        .59
The chairman offers new approaches to problems at meetings                  .58
I expect little from group meetings                                         [illegible]





internal consistency of these scales was satisfactory, 
it was decided that these scales were acceptable in 
terms of reliability. In addition, since the internal- 
consistency reliabilities were pooled for first and 
second administrations, the magnitude of these repre- 
sents appreciable stability in consistency over a 
6-month period. 

This dimension describes group effective-
ness in solving problems and in formulating
policy through a creative, realistic team ef-
fort. The positive pole depicts a group which
arrives at creative team solutions, sharing
responsibilities and problems openly. The
negative pole characterizes a group which
fails to deal with problems of relevant and
mutual concern to future policy formulation
and to the members, who seem to hold low
expectations for group meetings.


Factor II: Approach To Versus Withdrawal From Leader 8.4%

Item                                                                        Loading
Others can approach the chairman with ease                                  .83
Others feel at ease when talking with the chairman                          .80
I feel at ease when talking with the chairman                               [illegible]
I can approach the chairman with ease                                       [illegible]
Others withdraw from the chairman when disagreements arise                  [illegible]
Others are reluctant in pushing ideas                                       [illegible]
Group members are more intent on satisfying the chairman than in
  optimizing the potential output of the group                              [illegible]
I withdraw from involvement with the chairman when disagreements arise      [illegible]
Others’ behavior does not reflect their true feelings                       -.57

At the positive pole of this dimension,
members feel that the leader is approachable
and that they can establish a comfortable
relationship with him. At the negative end,
members withdraw from the leader. They
do not push their ideas, do not behave ac-
cording to their feelings, and seem intent on
catering to the leader at the possible sacrifice
of group output.


Factor III: Mutual Influence 7.1%

Item                                                                        Loading
I have influence with the chairman                                          .64
Others have influence with the chairman                                     .61
I accept influence from other group members                                 .96 (?)
Others accept influence from other group members                            .48
I assume responsibility for setting group goals of the group                .42

This dimension describes groups in which
members mutually influence each other
and the leader, and assume responsibility for
setting group goals.


Factor IV: Personal Involvement and Participation 3.1%

Item                                                                        Loading
I want to actively participate in meetings                                  [illegible]
I estimate that I talked __% of the time during the last meeting            .52
I expect decisions on important matters to be made at group meetings        [illegible]
I submitted __ topics for the agenda for the last meeting                   .38
I am reluctant in pushing my ideas                                          [illegible]
I have attended approximately __ meetings of this group during the
  past 12 months                                                            [illegible]
The problem areas which need discussion at the next group meeting
  are (number of)                                                           .36
Group meetings should be discontinued                                       [illegible]
Group meetings should be continued                                          .34


Individuals who want, expect, and achieve 
active participation in group meetings are 
described by this dimension. The combina- 
tion of high expectations and actual partici- 
pation implies a fulfillment which is reflected 
in the desire to continue the group meetings. 
It is interesting to note that this dimension is 
composed almost entirely of self-perceptions, 
rather than perceptions of the group, of the 
leader, of others, etc. 


Factor V: Intragroup Trust Versus Intragroup Competitiveness 3.0%

Item                                                                        Loading
There is a destructive competitiveness among members of the group           [illegible]
Others are reluctant to sacrifice ideas so that the group may agree         -.58
There are too many personal opinions raised at meetings, as opposed
  to the broader point of view                                              -.47
There is trust and confidence in each other among members of the group      .44
Conflict within the group is submerged, rather than used constructively     -.40

At the positive end, this dimension depicts
a group in which the members hold trust and
confidence in each other. A group at the nega-
tive pole can be characterized more as a col-
lection of individuals who are reluctant to
sacrifice their individual personal opinions
and ideas for the sake of a working consen-
sus. This reluctance occurs in an environment
of destructive competition and one in which
conflict is submerged rather than used in an
open and constructive manner.


Factor VI: General Evaluation of Group Meetings 2.3%

Adjective pair                                                              Loading
good . . . bad                                                              .14 (?)
valuable . . . worthless                                                    [illegible]
weak . . . strong                                                           -.64
unpleasant . . . pleasant                                                   [illegible]
deep . . . shallow                                                          [illegible]
active . . . passive                                                        [illegible]


This may be the most difficult dimension 
to define because it involves many varied yet 
ill-defined characteristics. It probably is a 
measure of a generalized feeling about the 
meetings of one’s groups—as either good, 
valuable, strong, pleasant, etc., or as bad, 
worthless, weak, unpleasant, etc.5


5 This factor pattern was unexpected, since pre-
vious studies (Osgood, Suci, & Tannenbaum, 1957)
have indicated that semantic differential items gen-
erally factor into three distinct categories. Although
the preliminary questionnaire contained nine adjec-
tive pairs, three from each of the evaluative,
potency, and activity dimensions found by Osgood
et al., the above six adjective pairs factored entirely
into one dimension. The relaxed-tense adjectives
loaded on Factors VII and VIII, while heavy-light
and fast-slow adjectives contained sufficiently small
amounts of common variance with other items so
as to be dropped from the final GBI.


Factor VII: Submission to Versus Rebellion Against Leader 5.0%

Item                                                                        Loading
I submit to the chairman when disagreements arise                           .56
I rebel against the chairman when disagreements arise                       -.54
The policies under which the group works are clear-cut                      [illegible]
Others submit to the chairman when disagreements arise                      .40
Group meetings are . . . relaxed (rather than) . . . tense                  [illegible]


Groups at one end of this dimension tend 
to submit to the leader when disagreements 
arise, while groups at the other end are in- 
clined to be rebellious. Tension rather than 
relaxation then occurs. 


Factor VIII: Leader Control 3.3%

Item                                                                        Loading
Most material covered in meetings is introduced by the chairman             [illegible]
I estimate that the chairman talked about __% of the time during
  the last meeting                                                          [illegible]
The group should have an expert on hand to settle certain questions         .49 (?)
Group meetings are . . . relaxed (rather than) . . . tense                  [illegible]
Meetings are primarily a means of information dissemination                 [illegible]

This dimension describes the extent to
which the leader initiates and controls the
group process, primarily through domination
of the communications system in a one-way
(leader-to-group) direction. Groups in which
leader control is high express tension and a
desire not to have an expert on hand.


Factor IX: Role and Idea Conformity 2.0%

Item                                                                        Loading
Others act the role that is expected of them                                .60
Divergent ideas are discouraged at meetings                                 .96 (?)
The chairman is oriented toward production and efficiency                   .36


Groups in which there is pressure toward 
conformity to a set of expectations concern- 
ing both role behavior and ideation are de- 
scribed by this factor. The production and 
efficiency orientation of the chairman typify 
an “initiating structure” leadership style 
which, according to Fleishman, Harris, and 
Burtt (1955), implies that the leader organ- 
izes and defines relationships between him- 
self and the members of his group and defines 
TABLE 2
CORRELATIONS BETWEEN VARIOUS DEMOGRAPHIC CHARACTERISTICS AND SIX GROUP DIMENSIONS

                                                                       Group dimension
Demographic characteristic                   I            II           III          IV           V            VI
Age: Group mean                              -.07         -.37         -.52         .04          [illegible]  -.31
Age: Heterogeneity                           [illegible]  .19          .00          -.03         .43          .29
Age: Leader-group distance (a)               -.36         -.22         .09          -.12         -.35         -.41
Tenure: Group mean                           -.23         -.24         -.21         -.04         .06          -.28
Tenure: Heterogeneity                        .10          .90 (?)      -.11         -.01         .22          -.29
Tenure: Leader-group distance                -.46         .03          [illegible]  -.01         -.23         -.43
Occupational level: Group mean               -.80**       -.44         -.27         -.39         -.52         -.68*
Occupational level: Heterogeneity            .24          .00          -.02         -.02         .14          [illegible]
Occupational level: Leader-group distance    -.55         -.35         -.26         -.57         -.30         -.56
Educational level: Group mean                -.69*        -.13         -.11         -.30         -.65*        -.61*
Educational level: Heterogeneity             -.67*        -.46         -.52         -.41         -.70*        -.60*
Educational level: Leader-group distance     [illegible]  [illegible]  [illegible]  .26          .39          .20
Group size                                   -.82**       -.68*        -.52         -.66*        -.77**       -.71*
Group hierarchal level                       -.74**       -.41 (?)     -.13         -.44         -.63*        -.49

Note. Based on data from the first administration of the questionnaire; N = 11 groups.
a All leader-group distance scores were computed by subtracting the group mean from the leader’s score.
* p < .05.
** p < .01.


the role which he expects each member to 
assume. 

So as to obtain some further understanding
of the meaning and relevance of these dimen-
sions, factor scores for the six reliable factors
(rKR20 > .70) were computed for each group
member. Six separate 1 × 12 analyses of
variance indicated that all six group-dimen-
sion scales of the GBI successfully differen-
tiated the 12 groups from each other beyond
the .01 level of significance.
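
A one-way analysis of variance of the kind reported here can be sketched as below. The scores are hypothetical (three small groups rather than twelve), the function name is invented for illustration, and the resulting F would still have to be compared with a tabled critical value for the given degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)
    n = len(all_scores)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical factor scores for three groups of three members each.
f_ratio, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_ratio, 2), df_b, df_w)
```

In the study itself, one such analysis would be run per dimension, with group membership as the single factor at 12 levels.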

Group means on each of the six group di- 
mensions were then correlated with group 
measures on four population characteristics 
including the age, organizational tenure, oc- 
cupational level, and educational level of each 
group. 

Three measures of each of these four char- 
acteristics were obtained: the mean of the 
group, the dispersion or heterogeneity within 
the group as measured by the standard devi- 
ation, and a leader-group distance measure 
computed by subtracting the group mean 

from the leader’s score. Additional variables 
with which the six group-dimension scales 
were correlated were the group size and 
hierarchal level at which the group operates 
within the formal organizational structure. 
Table 2 indicates the correlation coefficients 


between the six reliable dimension scales and 
14 group characteristics. 

With an N of 11 groups, it is obvious that
not many of the relationships are statistically
significant. The 14 demographic character- 
istics seem to have the greatest effect upon 
Factors I, V, and VI. Generally groups higher 
in the organizational hierarchy, groups com- 
posed of higher-level personnel, and groups 
in which members have greater education 
perceive themselves as less effective (I), more
competitive (V), and having less-valued 
meetings (VI). The demographic character- 
istic which seems to have the highest relation- 
ship to all six group dimensions is group size. 
Small groups perceive themselves acting more 
as an effective team (I), able to approach 
their leader with ease (II), more involved 
(IV), experiencing greater trust (V), and 
conducting more valued meetings (VI). In 
addition, groups in which there is homo- 
geneity in attained education perceive their 
group as a more effective team, the meetings 
as more valuable, and their leader as more 
approachable. 
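
Why so few coefficients reach significance with only 11 groups can be seen from the size of correlation required. The sketch below converts tabled critical t values for 9 degrees of freedom into critical values of r; it assumes two-tailed tests, which the text does not state, and the function names are illustrative.

```python
import math

# Two-tailed critical t values for df = 9, taken from standard t tables.
T_CRIT_DF9 = {0.05: 2.262, 0.01: 3.250}
N_GROUPS = 11  # number of groups behind each correlation

def critical_r(t_crit, n):
    """Smallest |r| that is significant for sample size n at the given critical t."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

def flag(r):
    """Return '**', '*', or '' for a correlation based on N_GROUPS groups."""
    size = abs(r)
    if size >= critical_r(T_CRIT_DF9[0.01], N_GROUPS):
        return "**"
    if size >= critical_r(T_CRIT_DF9[0.05], N_GROUPS):
        return "*"
    return ""

for r in (-0.82, -0.68, -0.52):
    print(r, flag(r))
```

Under these assumptions a correlation must reach roughly .60 in absolute size for the .05 level and roughly .73 for the .01 level.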


DISCUSSION


Variations in performance and in the inter-
action processes of traditioned work groups
can be accounted for by approximately nine
separate dimensions which underlie the group 
phenomena of most relevance to members. 
Factor I, which accounts for one third of the 
total group phenomena variance, encompasses 
the effectiveness of the group as a problem- 
solving team, and coincides closely with a 
syntality dimension (Cattell et al., 1953) in 
that it represents the performance of the 
group acting as a whole. Energies devoted to 
attaining group goals (effective synergy) as 
represented by Factor I can be differentiated 
from those directed toward maintaining the 
group and the group processes (maintenance 
synergy) as represented in the Trust-Compe- 
tition Factor (V). The latter is similar to the 
concept of viscidity (Hemphill, 1956), which 
is the absence of dissension and personal con- 
flict among members, the absence of activities 
serving to advance only the interests of indi- 
vidual members, and the ability of the group 
to resist disrupting forces. 

Most of the remaining factors which 
evolved exclude syntality characteristics and 
encompass intragroup structural variables 
which are descriptive of the internal behavior 
of the group, including its internal interac- 
tions, processes, and procedures. Typical of 
the category is Factor III, which typifies a 
mutual interaction process in which influence 
is successfully exerted and received, as well 
as those factors which describe initiation by 
the self (IV) and by the leader (VIII) and 
concomitant reactions to the leader of ap- 
proach versus withdrawal (II) and submission 
versus rebellion (VII). The submission ver- 
sus rebellion alternative appears distinct and 
unrelated to the withdrawal (from leader) re-
sponse as described in Factor II.

It is evident that far greater differentia-
tion has occurred in this factor structure
than in the more simplified three-factor struc- 
tures noted in reviews by Carter (1954) and 
Bales (1956). This is understandable in view 
of the growing evidence that the number and 
complexity of factors required to account for 
group phenomena variance increases as a 
function of the length of time the group 
members have interacted and as a function 
of the ad hoc versus traditioned nature of the 
group. Fusion of factors early in a group’s 
existence can be attributed in part to the fact 
that in ad hoc groups, members are un-
acquainted with each other and are thereby
unable to differentiate between those who are 
oriented toward satisfying personal goals and 
those whose efforts are directed at attaining 
group goals (Mann, 1961). Similarly, the 
ability to differentiate among role specializa- 
tions in ad hoc groups increases over time as 
the group becomes more organized and struc- 
tured (Slater, 1955). 

It is also evident in this study that per- 
sonal attribute variables (occupational and 
educational level) correlate with considerably 
fewer factors (see Table 2), and to a far 
lower degree than found by Cattell et al. 
(1953) in his studies of ad hoc groups dur- 
ing the first 3 hours of their existence. Based 
on the high loadings of personal attribute 
variables, Cattell concludes that these are the 
probable causes of associated group perform- 
ances. While it seems reasonable for individ- 
ual attributes to dominate the group factor 
structure and to be dispersed throughout it 
soon after a group has been formed, it seems 
equally reasonable that in more mature tradi- 
tioned groups such as those in the current 
study, personal attribute variables correlate 
with fewer group factors and with lower mag- 
nitudes. As a group matures, a decreasing 
proportion of the total group behavior vari- 
ance can be accounted for by individual 
attribute variance. 

Portions of most of the dimensions in this 
study which describe interpersonal interac- 
tion and processes have been cited in one 
form or another in the literature on group 
dynamics and group change. For example, 
high mutual influence (III), high individual
participation (IV), low leader control 
(VIII), and high group trust (V) would be 
considered most conducive to durable modi- 
fications in group ideology or social practice 
(Coch & French, 1948; Lewin, 1947). 
It is of interest, however, to note that these 
concepts evolve as separate dimensions from 
each other, and that these separate dimen- 
sions are, in turn, distinct from such criteria 
dimensions as Group Effectiveness (I) and
Evaluation of Group Meetings (VI). 

It is of further interest to note the relation- 
ship between the factor pattern in the cur- 
rent study and that found in leadership styles 
by Fleishman et al. (1955). Some elements
of “initiating structure” are apparent in both 
the Leader Control dimension (VIII) and in 
the Role and Idea Conformity dimension 
(IX). But these two leader-initiating dimen-
sions are quite distinct from dimensions which
describe member reaction to leadership (II 
and VII). It might be surmised that the per- 
ception of initiating leadership style has little 
relationship to approach-withdrawal reactions 
or to submission-rebellion reactions by su- 
bordinates to that leader. While confirming 
to us his perception of an initiating and con- 
trolling leader, the subordinate is neverthe- 
less denying that this style of leadership 
necessarily creates in him a consistent re- 
sponse of either submission-rebellion, or of 
approach-withdrawal. 

Of particular interest are the several nega- 
tive relationships found in this study between 
occupational, educational, and hierarchal
level on the one hand, and perceived group 
effectiveness on the other hand. These nega- 
tive relationships may be due to differences in 
normative frames of reference or in levels of 
aspiration. For example, groups operating at 
high hierarchal levels, whose members have 
risen to relatively high occupational levels, 
and whose members have greater education, 
might have higher aspirations or higher stand- 
ards for group effectiveness in solving prob- 
lems and in formulating policy through cre- 
ative realistic team effort. It is also possible 
that higher group evaluations with decreasing
occupational, educational, and hierarchal lev-
els are based upon a stronger need by juniors
to appear effective (to themselves and others).
In this case, work groups (of which they are
a part) provide ego support for juniors that
seniors seem to derive elsewhere.


3 experiments were conducted (a) to determine the number of Federal Standards
colors which normal and deuteranopic Ss can identify absolutely under a 
variety of viewing conditions, and (b) to identify optimum subsets of these 
colors for information coding under various operational circumstances. Results 
suggest that under optimal circumstances Ss can identify 24 Federal Standard 
colors, a number far in excess of most earlier estimates. Furthermore, careful 
selection can provide a 10-color subset identifiable under even marginal lighting 
conditions by normal Ss, and an 8-color subset identifiable even by deuteranopes. 
Discrepancies between these and earlier findings are explained primarily on 
the basis of insufficient color-label training. 


Interest in the problem of color identifica- 
tion can be traced to three related sources. 
First, it has accompanied the development 
of models for color experience such as those 
described by Helm (1964) and by Indow 
and Kanazawa (1960); second, it has 
stemmed from attempts to assess basic human 
information-handling capacities as illustrated 
by the work of Erickson and Hake (1955), 
Garner and Creelman (1964), Miller (1956), 
and Conover (1959); and third, it has arisen 
within the context of display design and 
coding applications as exemplified by Jones 
(1962), Conover and Kraft (1958), and 
Chapanis and Halsey (1956). 

This diversity of emphasis has led, natu- 
rally, to marked differences in the stimulus 
and response characteristics used in studying 
color identification. Those seeking a basic 
understanding of human color mechanisms 
have drawn stimuli from color systems which 
maximize the range of experience (e.g., 
Munsell system, ISCC system), while others 
concerned with direct applications have investigated
more restricted color populations.
Similarly, identifying responses have varied
from randomly assigned numbers (Conover,
1959) to the Munsell notations of hue, value,
and chroma (Hanes & Rhodes, 1959).

33-(616)-6166 with the Ohio State University Research
Foundation and monitored by the Aero Medi-
reproduction is authorized to satisfy the needs of the

The usual procedure followed in these 
studies has been for the subject (S) to view 
individually a series of distinct colors and 
to indicate the identity of each using a pre- 
established nomenclature. The number of 
colors in the set to be judged is normally 
held constant at a level somewhat greater 
than that expected to yield perfect perform- 
ance. Since identification errors are thus 
likely to occur, evaluation of performance 
is frequently augmented by information or 
confusion analysis; this permits an estima- 
tion of how many colors might have been 
judged perfectly had the stimuli been chosen 
in a perceptually optimum fashion (e.g., 
Conover, 1959). It is important to note that 
throughout these procedures the assumption 
is made that S is completely familiar with the 
response nomenclature. Unfortunately, how- 
ever, training procedures have been far from 
standardized, and the assumption of perfect 
color-label association is open to serious question.

Considering the differences in orientation 
and methodology, it is surprising to note that 
the results obtained in most of these investi- 
gations have shown a fair amount of agree- 
ment. It is usually found that color-normal 
Ss are capable of identifying accurately
about 10 two-dimensional colors (hue and 
saturation), and it has been suggested that 
the number could be increased significantly if 
a third dimension (brightness) were added 
(Miller, 1956). 

The present research borrows from both 
the basic and applied objectives described 
above: it seeks to evaluate the human ca- 
pacity for identifying colors using the most 
rigorous procedures available; at the same 
time it is directed toward a specific and 
rather limited set of color stimuli, those pro- 
vided by the Federal Standard Color System. 
This system comprises some 381 paints au- 
thorized for use by the Federal Government 
(Journal of the Optical Society of America, 
1957, 47, 330-332). The ultimate purpose
of the present research is to provide a
set (or sets) of colors which all governmental 
agencies could use in devising optimum color 
codes. To broaden the usefulness of such 
data, experiments are reported using a variety 
of common illuminants and color-deficient as 
well as color-normal Ss. 


METHOD 
Stimuli 


In all, three experiments were conducted using
Federal Standard No. 595 as the source of color
samples. Since the task in all three was one of absolute
identification, the 381 colors included in this
system were obviously far too numerous for mean- 
ingful investigation. Therefore, selection of a much 
smaller subset was carried out in a manner intended 
to maximize discriminability and coding utility. 

Three major considerations guided the selection 
of a color subset. First, all high and medium gloss 
samples were eliminated since a glossy surface is 
known to interfere seriously with color perception 
under certain viewing conditions. Next, an attempt 
was made to select samples which were as distinct 
as possible on the three basic dimensions of color: 
that is, hue, saturation, and brightness. Since paints 
included in the Federal Standard collection were 
approved for many diverse reasons (e.g., aesthetic 
appeal, visibility, wearing quality, camouflage capa- 
bility), the system describes no dimensional 
gradations and hence furnishes no direct means of 
estimating perceptual distances. Consequently, all 
samples were converted to Munsell designations from 
which estimates could be made. Finally, it was de- 
cided that colors approximating white, black, or 
gray would be undesirable for practical coding pur- 
poses, and the following restrictions were adopted: 
using the Munsell notations, chroma was required to 
be at least 4; value between 4 and 8; and for values 
of 8, chroma was required to be at least 6. In the
Munsell system these numbers represent medium
lightnesses and a reasonable degree of saturation
(see Munsell Color Company, 1959).

TABLE 1

COLOR SAMPLES SELECTED FROM THE FEDERAL
STANDARD SYSTEM EXPRESSED IN MUNSELL
NOTATIONS

                                                Munsell designation
Federal
standard   Munsell description         Hue               Value         Chroma

30109      Light reddish brown         10.0 R            4             6
30111*     Moderate reddish brown      10.0 R            4             4
30117      Moderate brown              5.0 YR            4             4
30206*     Grayish red                 7.5 R             [illegible]   4
30219*     Light brown                 [illegible]       5             4
30233      Light reddish brown         2.5 YR            5             4
30252      Grayish reddish orange      [illegible]       5             6
30257*     Dark orange yellow          10.0 YR           6             6
30313      Light grayish brown         5.0 YR            6             4
30450      Light yellowish brown       [illegible]       7             4
31136*     Strong red                  5.0 R             4             12
31158*     Moderate red                2.5 [illegible]   4             10
31433*     Moderate yellowish pink     10.0 R            7             6
32169*     Dark reddish orange         10.0 R            5             10
32246*     Vivid reddish orange        10.0 R            6             14
32356*     Moderate reddish orange     [illegible]       6             10
32648*     Pale orange yellow          7.5 YR            8             4
33434*     Moderate yellow             [illegible]       7             8
33481      Strong yellow               5.0 [illegible]   [illegible]   8
33538*     Vivid orange yellow         10.0 YR           8             14
33695*     Light yellow                [illegible]       8             6
33793      Light greenish yellow       [illegible]       9             6
34108*     Dark yellowish green        [illegible]       4             6
34127*     Moderate olive              10.0 Y            4             4
34227      Gray green                  10.0 GY           5             4
34258*     Moderate yellow green       [illegible]       6             4
34325*     Light green                 7.5 G             6             4
34533      Moderate yellow green       7.5 GY            [illegible]   4
34552*     Light yellow green          2.5 GY            8             6
34558*     Very pale green             10.0 GY           8             4
35109*     Grayish blue                10.0 B            4             4
35177      Moderate blue               2.5 PB            5             6
35189*     Moderate greenish blue      5.0 B             5             4
35193      Moderate bluish green       10.0 BG           4             4
35231*     Pale purplish blue          5.0 PB            5             4
37144*     Moderate purple             5.0 P             4             6

Note.—These are approximate equivalents chosen for
reasons described in the text.

* These colors constitute the subset of 24 which were found
to be optimum for identification in Experiment I (see results
section).

A total of 36 colors survived these elimination 
procedures, and the Munsell specifications for this 
subset appear in Table 1. Each of these colors 
appeared as a 3 X 5-inch color “chip” which was
mounted on a piece of 7.5 X 5-inch medium-gray 
cardboard for presentation to S. 
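
The three Munsell restrictions described above can be written as a simple predicate. The sketch below is illustrative only: the `MunsellColor` class, the function name, and the four sample colors are hypothetical and are not taken from the Federal Standard data. Because Table 1 lists approximate equivalents, individual rows of that table need not satisfy the filter exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MunsellColor:
    hue: str     # e.g. "5.0 R" (not used by the filter itself)
    value: int   # Munsell value (lightness)
    chroma: int  # Munsell chroma (saturation)


def passes_restrictions(color: MunsellColor) -> bool:
    """Apply the three restrictions stated in the text."""
    if color.chroma < 4:                       # too close to gray
        return False
    if not 4 <= color.value <= 8:              # too close to black or white
        return False
    if color.value == 8 and color.chroma < 6:  # stricter rule at value 8
        return False
    return True


# Hypothetical candidates, chosen only to exercise each rule.
candidates = [
    MunsellColor("5.0 R", 4, 12),  # saturated red: kept
    MunsellColor("5.0 Y", 8, 4),   # value 8 with chroma below 6: rejected
    MunsellColor("10.0 B", 3, 6),  # value below 4: rejected
    MunsellColor("5.0 G", 6, 2),   # chroma below 4: rejected
]
kept = [c for c in candidates if passes_restrictions(c)]
```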


Apparatus 


Two booths were used for both the training and 
identification phases of the experiments. In the first 
(display) booth, all the colors were displayed on a 
table together with the response designation (two- 
digit number) required for each. This, of course, 
permitted S to make direct comparisons among 
stimuli and to follow any pattern of concentration 
that he desired. Arrangement of colors was based 
upon modal order preferences of seven pilot Ss. This 
booth was illuminated by two 48-inch, 40-watt 
fluorescent tubes (one cool white and the other 
daylight), for lighting that approximated north sky 
daylight. Walls were draped with black photographic
muslin, and a shield of the same material was hung 
in front of the luminaire, so that light would not 
be reflected from the walls onto the colors and 
S would not be exposed to light coming directly 
from the source. The luminance of the gray card- 
board background of the colors ranged from 5.4 to 
8.3 foot-lamberts over the table top. 

In the other experimental booth, S viewed the 
colors singly on a “stage” mounted on the top of 
a table. The S was seated directly before the stage 
and viewed all colors through a 1.5-inch aperture 
cut in an 8.5 X 5.5-inch medium-gray mask (Federal 
Standard 36243) which was placed on the floor of 
the stage. The viewing distance was approximately 
28 inches and the angle of regard was 45 degrees 
from the horizontal. Lighting in this booth was 
restricted to the stage except for a small shielded 
lamp used by the experimenter (E). The source was
housed in a fan-cooled fixture which was hidden 
from S’s view by a shield across the top of the 
stage. The lights in this fixture varied with the 
experimental conditions. This booth also was hung 
with black muslin draperies; the stage floor was 
covered with medium-gray cardboard and black
velvet paper lined the sides and top. 


Procedure 


In all three experiments the task was composed 
of two parts: a training phase, during which two- 
digit response numbers were associated with the 
color stimuli; and an identification phase, during 
which absolute judgments were made for all colors 
using the two-digit numbers from 11 to 46 as identi- 
fying responses. Assignment of numbers to colors 
was strictly random and was carried out sepa- 
rately for each S so that any prior association or 
preferences would be balanced out. 
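
The per-subject random assignment of response numbers can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the placeholder color identifiers are hypothetical, and the source does not say how the randomization was actually carried out.

```python
import random


def assign_response_numbers(color_ids, rng):
    """Map each color to one of the two-digit numbers 11..46, shuffled per subject."""
    numbers = list(range(11, 47))  # 11 through 46 inclusive: 36 labels
    if len(color_ids) != len(numbers):
        raise ValueError("expected exactly 36 colors")
    rng.shuffle(numbers)
    return dict(zip(color_ids, numbers))


# Placeholder identifiers standing in for the 36 color chips.
color_ids = [f"color_{i:02d}" for i in range(36)]

# A separate random generator per subject gives each S an independent mapping.
mapping_s1 = assign_response_numbers(color_ids, random.Random(1))
mapping_s2 = assign_response_numbers(color_ids, random.Random(2))
```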

During the training phase, the following procedure 
was adopted. The S was first permitted to view all 
color-number combinations together in the display 
booth; he was instructed to scan them all freely 
and to learn as many of the associations as he could 
during a 10-minute period. Following this he was 
transferred immediately to the experimental booth 
where a paired-associates training trial was con- 
ducted (each of the colors was presented once in 
random order with immediate knowledge of results). 
He was then returned to the display booth for an- 
other free viewing session of up to 5-minute dura- 
tion following which another paired-associates trial 
was administered. This alternation continued until 
five paired-associates trials had been completed and 
the session was terminated for the day. The same 
procedure was followed during all succeeding ses- 
sions, and sessions continued until asymptotic per- 
formance was reached (a criterion of no improve- 
ment over a five-trial session was used to define 
the asymptote). 

On the day following the last training trials the 
identification session was undertaken. Review of all 
colors in the display booth was restricted to 5 
minutes, after which each stimulus was judged six 
times without knowledge of results in the identification
booth. Order of presentation was randomized
over two sequences (three repetitions of each color
constituted a sequence), and a brief rest period was 
inserted between the two sequences. The six judg- 
ments of each color obtained from all Ss during this 
phase constitute the major data of all experiments. 
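
One way to build the presentation order just described (two sequences, each containing three repetitions of every color in random order) is sketched below. The function name and color identifiers are hypothetical, and the sketch assumes a simple shuffle within each sequence, which the source does not specify.

```python
import random
from collections import Counter


def build_identification_order(color_ids, rng, n_sequences=2, reps_per_sequence=3):
    """Return a list of sequences; each holds every color reps_per_sequence times, shuffled."""
    sequences = []
    for _ in range(n_sequences):
        block = [color for color in color_ids for _rep in range(reps_per_sequence)]
        rng.shuffle(block)
        sequences.append(block)  # a rest period would fall between sequences
    return sequences


color_ids = list(range(1, 37))  # placeholder ids for the 36 colors
sequences = build_identification_order(color_ids, random.Random(0))

# Every color should be judged six times in all.
per_color = Counter(color for block in sequences for color in block)
```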


Subjects 


All Ss were male students enrolled at Ohio State 
University; they were paid for their participation 
at the rate of $1.25 per hour. Each S was tested for 
color vision using the Ishihara Pseudoisochromatic 
plates (Experiments I and II) or the Nagel color
anomaloscope (Experiment III). All Ss in the first 
two experiments were color normal: 10 served in 
Experiment I and 16 in Experiment II. In Experi- 
ment III, the 8 Ss were all rated deuteranopic. 


Variables and Design 


Experiment I. The first experiment was not concerned
with direct manipulation of any variables.
Instead, it sought to determine how many of the 
36 selected colors could be identified absolutely with 
few or no confusions under good, standard lighting 
conditions. Both the 10 Ss and the illumination 
conditions, therefore, were chosen to be color normal. 

One major objective of this study was the deriva- 
tion of an empirically optimum set of colors from 
the 36 originally chosen.4 The approach adopted was 
that of confusion matrix analysis. Briefly, this in- 
volved tabulation of all responses made to all stimuli 
(that is, generation of a confusion matrix) and the 
elimination of those stimuli responsible for the most 
frequent confusions. The stimulus subset derived 
using this procedure was to be validated in the 
two succeeding experiments. 
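
The idea of tabulating a confusion matrix and eliminating the stimuli responsible for the most confusions can be illustrated in code. The sketch below is not the authors' procedure (they worked from a graphic representation, described in the results); it substitutes a simple greedy heuristic, and the function names and toy data are hypothetical.

```python
from collections import Counter


def confusion_matrix(trials):
    """Tabulate (stimulus, response) pairs into a Counter keyed by pair."""
    return Counter(trials)


def greedy_subset(stimuli, matrix):
    """Drop the most-confused stimulus until no confusions remain among those kept."""
    kept = list(stimuli)
    while kept:
        errors = {s: 0 for s in kept}
        for (stim, resp), n in matrix.items():
            # Count only confusions between two stimuli that are both still kept.
            if stim != resp and stim in errors and resp in errors:
                errors[stim] += n
                errors[resp] += n
        worst = max(kept, key=lambda s: errors[s])
        if errors[worst] == 0:
            break
        kept.remove(worst)
    return kept


# Toy data: B and C are confused with each other; A and D are never confused.
trials = (
    [("A", "A")] * 6
    + [("B", "B")] * 4 + [("B", "C")] * 2
    + [("C", "C")] * 5 + [("C", "B")]
    + [("D", "D")] * 6
)
matrix = confusion_matrix(trials)
subset = greedy_subset(["A", "B", "C", "D"], matrix)
```

In the toy data B and C tie for the most confusions, so the first of the two is dropped and the other survives once its only confusion partner is gone.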

Experiment II. The stimulus subset derived in
Experiment I was studied under four different 
lighting conditions: 


1. North sky daylight of 5-foot-lambert luminance 
(identical to Experiment I). 

2. Tungsten filament bulbs with nonselective filters 
producing a luminance of 5 foot-lamberts. 

3. A single warm white fluorescent tube producing 
a luminance of 5 foot-lamberts. 

4. North sky daylight of .10-foot-lambert luminance.


The first three lighting conditions are considered to 
be representative of a variety of visual task environ- 
ments; the fourth represents the lowest practical 
level at which printed matter can be read. The 
normal daylight condition (No. 1 above) served to 
validate the selection procedure of Experiment I; 
to the extent that the colors chosen from the 
confusion matrix were identified accurately here, 
it could be concluded that the subset was indeed 
an optimum one. The other three conditions were
included to ascertain how many (and which) of
the selected colors could be identified even under
less adequate, but nonetheless common, illumination
conditions.

4 It was not only important to know how many
stimuli could be identified, but also which ones.

Fig. 1. Results of Experiment I expressed in an S-R confusion matrix. (Numbers in the cells indicate the
frequency with which specific responses were given to specific stimuli.)

It should be noted at this point that lighting 
conditions were varied only in the experimental 
booth; north daylight was always approximated in 
the display booth. Because of this difference, it was 
necessary to provide a period for visual adaptation 
when Ss were transferred from the display booth 
to the experimental booth. 

Each of four groups of Ss served under all condi- 
tions but in a different sequence so as to balance 
order effects. Only one lighting condition was 
experienced by any group in any given session. 
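
A common way to give four groups four different condition orders is a Latin square. The source does not state which balancing scheme was used, so the cyclic square below is only an assumed example; it balances the ordinal position of each condition but not carry-over from one condition to the next. The function name and condition labels are hypothetical shorthand.

```python
def cyclic_latin_square(items):
    """Row g gives the condition order for group g; each item appears once per row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


conditions = [
    "daylight, 5 ft-L",
    "tungsten, 5 ft-L",
    "warm white fluorescent, 5 ft-L",
    "daylight, reduced",
]
orders = cyclic_latin_square(conditions)

for group, order in enumerate(orders, start=1):
    print(f"Group {group}: " + " -> ".join(order))
```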

Experiment III. Only one aspect of this study
distinguished it from Experiment II: Ss, instead 
of having normal color vision, were all deuteranopes. 
Thus, the primary issue was how many, and which, 
of the selected colors would be identifiable even by 
deuteranopes under the four common lighting conditions
described in Experiment II. Again, all Ss
(eight) served under all conditions in a balanced
order.


RESULTS 


Experiment I. A stimulus-response confusion
matrix was prepared for the 36 colors
in the original set, and a reproduction of this 
matrix appears in Figure 1. A total of 2,160 
judgments (60 for each color) are sum- 
marized in this figure. The most notable 
feature of these data is the infrequency of 
confusions: 10 colors were identified per- 
fectly, 11 more were confused on fewer than 
5% of the judgments, and only 7% of all 
presentations resulted in error. Even more 
important, 24 of the colors were confused 
with no more than one other color. This
would suggest that with careful selection a
subset of over 20 colors could be chosen
which would be absolutely identifiable or very
nearly so.

TABLE 2

PROPORTION OF RESPONSES IDENTIFIED INCORRECTLY
IN EXPERIMENTS II AND III

                               Subject
Condition            Color normal   Deuteranopic

Normal daylight         .0052          .0946
Reduced daylight        .1358          .2569
Tungsten                .0321          .1412
Fluorescent             .0295          .1782

In order to observe more clearly the total- 
ity of the confusion pattern, a graphic 
representation of each color and its relation- 
ship to all other colors (that is, number and 
direction of confusions) was prepared. From 
this diagram it was possible to isolate those 
colors which demonstrated the greatest 
amount of response independence and, there- 
fore, could be expected to constitute an 
optimum subset. This apparently cumber- 
some approach was adopted because more 
traditional methods (for example, equal- 
discriminability scaling) are not readily ap- 
plied to stimuli varying in three dimensions. 
The result of this graphic analysis was a 
subset of 24 colors none of which was in- 
volved in more than one confusion, and only 
two of which were confused at all with others 
in the subset (some of them had, of course, 
been confused with the 12 colors excluded 
from the subset). The 24 colors comprising 
this empirically optimum subset are indicated 
by asterisks in Table 1. 

Experiments II and III. The proportion of 
identification responses which were in error 
under the four lighting conditions are sum- 
marized in Table 2 for both normal (Experi- 
ment II) and deuteranopic (Experiment III) 
Ss. It is apparent that performance is best 
under north sky daylight of reasonable lumi- 
nance and worst under north sky daylight of 
reduced luminance. There seems to be little 
difference between the tungsten and fluo- 
rescent conditions although both of these are 
markedly inferior to normal daylight. Separate
analyses of variance for normal and
deuteranopic Ss supported these conclusions:
both F ratios were significant at p < .001.
Furthermore, t tests applied to all pairs
of conditions revealed that only the tungsten- 
fluorescent difference failed to achieve signifi- 
cance (p> .05); all others exceeded the 
p < .01 level. 

The differences between normal and deuter- 
anopic Ss were of such great magnitude that 
statistical tests were not conducted; no useful 
purpose would have been served by such tests 
in any event, since it is obvious that deuter- 
anopes should not perform as well as normals. 

Of particular interest is the performance 
of normal Ss under normal daylight, since 
this condition provides a validation for the 
selection procedures of Experiment I. In a 
total of 2,304 identifications only 12 con- 
fusions occurred, and on only two occasions 
did confusion frequencies exceed 1 (in both 
cases the frequency was 2). Therefore, it can 
be concluded with reasonable assurance that 
color-normal Ss can identify 24 colors abso- 
lutely, even when these colors are drawn 
from a population as limited as the Federal 
Standards System. 
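
The judgment totals quoted in the results follow directly from the design (subjects x colors x six judgments per color), and the normal-daylight error proportion in Table 2 follows from the 12 confusions reported above. The short check below reproduces that arithmetic; the helper name is hypothetical.

```python
def total_judgments(n_subjects, n_colors, judgments_per_color=6):
    """Each subject judges every color six times in the identification session."""
    return n_subjects * n_colors * judgments_per_color


exp1_total = total_judgments(n_subjects=10, n_colors=36)  # Experiment I
exp2_total = total_judgments(n_subjects=16, n_colors=24)  # Experiment II, one lighting condition

# 12 confusions among the Experiment II normal-daylight identifications.
error_proportion = 12 / exp2_total

print(exp1_total, exp2_total, round(error_proportion, 4))
```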

An examination of the kinds of confusions 
made under the various suboptimum lighting 
conditions by color-normal Ss indicated that 
(a) tungsten and warm white fluorescent il- 
lumination caused a large number of con- 
fusions among reds of low saturation; (b) 
reduced daylight illumination produced con- 
fusions on greater than 5% of the judgments 
for 16 of the 24 stimulus colors; and (c) the 
most frequent confusions obtained under 
tungsten and warm white fluorescent were 
generally the same as those obtained under 
reduced daylight (there was only one color 
for which confusions were low under reduced 
daylight and high under the other two 
conditions). 

As might be expected, the deuteranopes en- 
countered greatest difficulty in judging the 
reds and yellow-greens, particularly among
colors in which brightness and saturation dif- 
ferences were slight. Under normal daylight 
conditions only 8 of the 24 stimulus colors 
were identified with less than 3% error. 

The confusion data for Experiments II and 
III were analyzed graphically as in Experiment
I to facilitate the selection of optimum
color subsets for the various conditions of the
experiment. This procedure was deemed valid
by virtue of the success with which Experi- 
ment I selections predicted identification per- 
formance in Experiment II (normal daylight 
condition). The specialized subsets of greatest 
practical significance are presented in Table 3. 
It will be noted that data from all three 
experiments are represented in this table. 

Selection of subsets in the present experi- 
ments was guided by more stringent criteria 
than any reported heretofore: the objective 
was to include only colors which were never 
confused with each other. As a result, all the 
subsets given in Table 3 with one exception 
(the 24-color subset derived from Experi- 
ment I) represent theoretically perfect identi- 
fication. Even for the 24-color subset it is 
reasonable to claim theoretically perfect 
identification, since two errors in a total of 
1,440 judgments could well be attributed to 
response or recording operations rather than 
to perceptual judgment. In all, 24 colors are 
regarded as acceptable for color-normal Ss 
under normal daylight conditions; 15 are ac- 
ceptable for color-normal Ss under all 5-foot- 
lambert conditions; 10 are acceptable for 
color-normal Ss under all lighting conditions; 
10 are acceptable for deuteranopic Ss under 
all 5-foot-lambert lighting conditions; and 8 
are acceptable for deuteranopic Ss under all 
lighting conditions. 
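
The subset sizes reported in Table 3 can be checked against the per-color marks in that table. The sketch below assumes the subsets are nested, so that a color with k marks belongs to the k least demanding subsets; this assumption is not stated in the source, but it reproduces the N row of the table (24, 15, 10, 8). The dictionary records only the number of marks printed for each color.

```python
# Number of marked columns for each Federal Standard color in Table 3.
marks = {
    32648: 4, 31433: 1, 30206: 4, 30219: 1, 30257: 2, 30111: 2,
    31136: 4, 31158: 1, 32169: 1, 32246: 2, 32356: 2, 33538: 4,
    33434: 1, 33695: 1, 34552: 3, 34558: 1, 34325: 4, 34258: 3,
    34127: 1, 34108: 4, 35189: 1, 35231: 4, 35109: 2, 37144: 4,
}


def subset_for_column(column):
    """Colors marked in the given column (1-based), assuming nested subsets."""
    return sorted(color for color, n in marks.items() if n >= column)


subset_sizes = [len(subset_for_column(col)) for col in range(1, 5)]
most_robust = subset_for_column(4)  # identifiable by all Ss under all lighting conditions

print(subset_sizes)
print(most_robust)
```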


DISCUSSION 


Absolute identification capacities. In spite
of the limited scope of the Federal Standard 
System, the number of colors that Ss were 
able to identify in these studies was sur- 
prisingly large. As noted earlier, the con- 
sensus of earlier laboratory work, using more 
extensive populations of colors, suggested that 
color-normal Ss are capable of judging abso- 
lutely no more than 15 colors. Under reason- 
able lighting conditions, such Ss were able to 
judge 24 correctly in the present work. 

TABLE 3

SUBSETS OF FEDERAL STANDARD COLORS NEVER
CONFUSED UNDER VARIOUS SPECIAL CONDITIONS

                        Normal subjects only                      All subjects
Federal     Daylight     Daylight, tungsten,    All lighting      All lighting
standard    (5 ft.-L.)   fluorescent            conditions        conditions
                         (5 ft.-L.)

32648          x              x                     x                 x
31433          x
30206          x              x                     x                 x
30219          x
30257          x              x
30111          x              x
31136          x              x                     x                 x
31158          x
32169          x
32246          x              x
32356          x              x
33538          x              x                     x                 x
33434          x
33695          x
34552          x              x                     x
34558          x
34325          x              x                     x                 x
34258          x              x                     x
34127          x
34108          x              x                     x                 x
35189          x
35231          x              x                     x                 x
35109          x              x
37144          x              x                     x                 x

N              24             15                    10                8

Two factors are believed to account for this
discrepancy. First, identification data were
not collected in the present experiments until
S had demonstrated, through his training
performance, that he was no longer learning
to associate the labels (numbers) with the
colors. It is quite possible that the traditionally
low estimate of the human’s capacity
for identifying colors reflects not only per- 
ceptual limitations but inadequacies of re- 
sponse learning as well. The study reported 
by Hanes and Rhodes (1959) certainly seems 
to support this contention, although lack of 
experimental rigor (for example, a single S, 
variable training procedures, liberal identification
criteria) seriously limits the interpretability
of their results. Using the Munsell
system, these investigators reported that im- 
provement continued over months of train- 
ing to the point that S could identify 50 
colors with “almost perfect accuracy.” Almost 
perfect, in this case, was about 5% error 
which is well above that tolerated in the 
present work. It is doubtful, of course, that
all of this improvement can be attributed to 
response learning; perceptual judgments, per 
se, have also been observed to improve with 
practice (Gibson, 1953). Still, however, it 
is likely that response learning contributed 
heavily to at least the early part of the func- 
tion in the Hanes and Rhodes study, and that 
failure to assure such learning is in large 
measure responsible for poor performance in 
other studies. 

The second explanation for the discrepancy 
in number of identifiable stimuli is that dif- 
ferences occurred along all three dimensions 
of color in the present work. Earlier studies
have been limited primarily to variations in 
hue and saturation. In view of the fact that 
the Federal Standard System is also limited 
in scope, albeit for different reasons, it would 
be difficult to attribute the discrepancy in 
results to this factor alone. 

Coding recommendations. A widely ac- 
cepted principle in human engineering prac- 
tice is that when color coding must be used, 
it should be limited to no more than eight or 
nine colors. Even this number is regarded as 
excessive if viewing conditions are likely to 
be less than ideal, the colors are not carefully 
selected, or color-deficient viewers are apt to 
be involved. In sharp contrast, the present 
data suggest that under reasonable lighting 
conditions as many as 15 Federal Standard 
colors can be identified without error, and 
probably as many as 24 if the viewers are 
trained in identification. In fact, eight colors 
can be identified even by deuteranopes under 
the minimal conditions for color vision. 
Clearly, this calls for some revision in the 
recommendations for use of color in coding 
applications. 

A final point of some practical significance 
is found in the comparison of data ob- 
tained under optimum and suboptimum light- 
ing conditions. Clearly, color rendering is 
hampered by such common illuminants as 
tungsten-filament bulbs and warm white fluo- 
rescent tubes. The fact that these effects are 
peculiar to certain colors rather than characteristic
of the entire set permits the development
of sizable, yet efficient, codes merely
through judicious selection. It is also fortu- 
nate that many of the colors which are most 
susceptible to confusion under one suboptimal 
lighting condition are also the poorest under 
others. This fact has enabled the authors to 
specify a limited number of color subsets for 
which optimum results can be expected under 
wide ranges of potential coding applications 
(see Table 3). Although the ultimate utility 
of these subsets for color coding can only be 
determined through further validation, the 
present results strongly suggest that the 
potential of color for coding has long been 
underestimated. 

MANAGERS’ ATTITUDES TOWARD HOW THEIR PAY
IS AND SHOULD BE DETERMINED


EDWARD E. LAWLER, III¹


Yale University


This questionnaire study investigated the perceptions of 563 managers toward 
how their pay is determined and their attitudes toward how their pay should 
be determined. The results showed that in general the managers’ perceptions 
of how their pay was determined reflected the way their pay was, in fact, 
determined. However, the way their pay was determined did not appear to 
influence strongly their attitudes toward how their pay should be determined. 
There was general agreement among the managers that merit should be the 
most important determinant of their pay. However, attitudes toward what 
factors should be important in determining pay were shown to be related to 
the managers’ perception of their relative standing on the various factors. 
There was a positive correlation between how well the managers felt they 
compared with other managers on each factor and how important they felt 
the factor should be. The data also showed that there was a tendency for 
lack of congruence between a manager’s attitudes toward how his pay should 
be determined and how it is determined to be associated with high dissatisfaction
with pay.


The law of effect states that behavior 
which seems to lead to a reward will tend to 
be repeated. This law forms a theoretical basis 
for tying pay to productivity in the hope of 
producing better job performance. An equally 
strong theoretical basis can be found in the 
motivational model developed by Vroom 
(1964) which stresses the importance of considering
the workers’ perception of the relationship
between pay and performance. Many
managers have assumed that by using pay 
plans that are called incentive plans and that 
purport to tie pay to productivity, the law of 
effect will operate in such a way as to increase 
the productivity of their workers. Literally 
hundreds of pay plans have been tried in the 
hope of finding one that is optimum in terms 
of producing good job performance. Ignored 
in this rush to employ different pay plans has 
been the systematic study of what determines 
employees’ perceptions of how their pay is 
determined. This seems particularly surprising
when it is noted that the evidence indicates
that rewards are maximally effective
when there is a direct connection perceived
between the behavior and the reward (McGeoch
& Irion, 1952). Thus, a key element in
determining the success or failure of any
incentive pay plan may be the perception of
the employees under the plan of how their
pay is determined. For example, Georgopoulos,
Mahoney, and Jones (1957) found
that better performers saw a stronger relationship
between pay and performance than
did the poorer performers, indicating that
seeing a connection between pay and performance
may well lead to good job performance.

1 The author would like to thank C. Argyris, L.
W. Porter, and D. W. Taylor for their helpful comments
on an earlier version of this paper.

There have been a number of case studies 
(e.g., Whyte, 1955) that have been concerned 
with the degree to which blue-collar employ- 
ees see their pay determined by their job 
performance. In general, these studies indicate 
that many of the pay plans that are called 
incentive plans by management are not seen 
as incentive plans by the workers. However,
these studies have not established the degree
to which different types of pay plans produce
the perception among workers that pay is tied 
to productivity. Almost entirely missing have 
been studies concerned with understanding 
managers’ perceptions of the pay plans to 
which they are subject. One study by Chal- 
upsky (1964) did find that in a group of re- 
search organizations only 67% of the scien- 
tists said merit salary increases existed de- 
spite the fact that management claimed they 
were present in all the organizations. This
finding would seem to indicate that the blue-collar
level is not the only level in organizations
at which employees fail to see a plan
to which they are subject as an incentive
plan even though higher management considers
it to be one. It would
seem to be particularly important to under- 
stand what influences managers’ perceptions 
of how their pay is determined. Among blue- 
collar workers, union contracts and collective 
bargaining have restricted management’s op- 
portunities to change pay plans. However, at 
the management level, organizations still have 
the opportunity to change their pay plans in 
ways that will encourage the perception that 
higher job performance will result in higher 
pay. 

Studies done at the blue-collar level on 
incentive pay programs have also shown that 
if the employees do not consider the basis 
upon which pay is determined to be legiti- 
mate, they exhibit resistance that frequently 
leads to the failure of the programs. Klein 
(1965) has suggested that an employee’s sat- 
isfaction with his pay is determined by an 
interaction between both how he feels his 
pay is determined and how he feels it should 
be determined, high satisfaction appearing 
where congruence exists between attitudes 
toward how pay is determined and how it 
should be determined. Therefore, in the pres- 
ent study it was decided to focus upon mana- 
gers’ attitudes toward how their pay should 
be determined as well as their attitudes to- 
ward how their pay is determined. By doing 
this it is hoped that some understanding can 
be gained of what factors influence managers’ 
attitudes toward how their pay should be 
determined, and of the relationship between 
attitudes toward how pay should be deter- 
mined and satisfaction with pay. 

Specifically, the present study was concerned
with the following questions:

1. Are differences in the actual degrees to
which pay is tied to productivity reflected in 
managers’ perceptions of how their pay is 
determined? In managers’ perceptions of how 
their pay should be determined? 

2. Are managers’ self-ratings of their stand- 
ing on factors that determine pay related to 
their perceptions of how their pay is deter- 
mined? To their perceptions of how their pay 
should be determined? 

3. Is there a tendency for high congruence 
between how a manager feels his pay is and 
should be determined to be associated with 
high satisfaction with pay? 


METHOD 
Questionnaire 


The questionnaire used consisted of five parts 
that are relevant for the present study; data from 
other parts of the questionnaire were reported in a 
previous article (Lawler, 1965). Part 1 asked the 
managers to rate seven factors on the basis of how 
important their organizations considered them in 
determining their pay. The ratings were made on a 
7-point scale ranging from 1 (unimportant) to 7 
(important) that followed each item. The following 
factors were presented: (a) Length of service, (b) 
Education and experience, (c) Amount of responsi- 
bility and pressure, (d) Quality of job performance, 
(e) Productivity on the job, (f) Effort expended on 
the job, (g) Scarcity of skills in the labor market. 

Part 2 of the questionnaire asked the managers to 
rate the same seven factors again. However, this time 
the managers were asked to rate them on the basis 
of how important they should be in determining 
their pay. Again the ratings were made on a 7-point 
scale ranging from 1 (unimportant) to 7 (impor- 
tant). 

Part 3 of the questionnaire asked the managers to 
consider the same factors for a third time. However, 
this time the managers were asked to rate the 
factors on the basis of how well they compared on 
each factor to other managers in their organization 
with similar management duties. These ratings were 
made on a scale running from 1 (low) to 7 (high). 

Part 4 of the questionnaire was designed to meas- 
ure the managers’ satisfaction with their pay. The 
managers were asked to make two ratings relative 
to the pay that they received for their management 
positions. First, they were asked to rate on a 1
(minimum) to 7 (maximum) scale how much pay 
they presently received for their management posi- 
tion. Second, they were asked to rate on a similar 
scale how much pay they should receive. Satisfac- 
tion was measured by comparing the manager’s 
answer to part (a) with his answer to part (b). 
When (b) exceeded (a) it was considered that the 
manager felt that he received too little pay. The
advantages and rationale for using this question and
measure were discussed previously by Porter (1961).

Part 5 of the questionnaire asked the managers to
indicate their salary, age, management level, seniority,
and education level. These answers were checked
against company records where possible, and typically
proved to be accurate.

The questionnaires were distributed individually
(either by United States or company mail) to the
members of management in each organization. Each
questionnaire was accompanied by a letter from the
chief officer of the plant or division studied that
urged the manager to complete the questionnaire.
The questionnaires were numbered in order to
identify the respondents, but each manager was
assured that his responses would be confidential.
Along with the questionnaire and letter, each respondent
received a stamped, self-addressed envelope
in which to return the completed questionnaire
directly to the university.


Ranking Form 


A ranking form was distributed to the superior of
each manager who completed the attitude questionnaire.
The superiors were asked to rank order a
group of at least three of their subordinates on two
factors, how well they were performing their job
and how much effort they put forth on their jobs.
The ranking was done by the superior and the
forms were returned directly to the university. The
ranks were then converted to standard scores for
the data analysis. This was accomplished by computing
the percentage of position of each rank
(percentage of position = 100(R - .5)/N) and
then converting this figure into its standardized
score equivalent by using a table of standardized
score equivalents of percentile ranks in a normal
distribution. This process assumes that the mean
performance levels of the groups ranked by different
managers are roughly equal. To the extent that
this assumption is not met there will be a tendency
for the relationships between performance and other
factors to be reduced. No interrater reliabilities
could be computed for these rankings, but the
correlation between the two rankings made by the
same manager was of a size (r = .59, p < .001) that
would indicate that the managers did discriminate
between the two rankings and that they did not
respond randomly to either one.
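
The rank-to-standard-score conversion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the percentage of position is taken to have the common form 100(R - 0.5)/N, rank 1 is assumed to be the best performer, and the printed table of standardized score equivalents is replaced by the inverse normal distribution function. The function name is hypothetical.

```python
from statistics import NormalDist


def rank_to_standard_score(rank: int, n_ranked: int) -> float:
    """Convert one superior's rank into a normal-deviate (z) score.

    Assumptions (not confirmed by the article):
    - percentage of position = 100 * (R - 0.5) / N
    - rank 1 is the best subordinate, so it should get the highest z
    """
    if not 1 <= rank <= n_ranked:
        raise ValueError("rank must lie between 1 and n_ranked")
    pct_position = 100.0 * (rank - 0.5) / n_ranked
    # Flip the scale so that the best rank maps to the highest percentile.
    percentile = 100.0 - pct_position
    # Inverse normal CDF stands in for the printed conversion table.
    return NormalDist().inv_cdf(percentile / 100.0)


if __name__ == "__main__":
    # A superior ranking three subordinates.
    for r in (1, 2, 3):
        print(r, round(rank_to_standard_score(r, 3), 2))
```

Because every superior's ranks are mapped onto the same standard scale, groups of different sizes can be pooled, which is why the method depends on the assumption that the mean performance of the ranked groups is roughly equal.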


Research Sites 


The present study was carried out in seven organizations,
three of which were divisions of state governments
and four of which were private companies.
The government organizations differed widely in
their functions; one ran liquor stores, another ran
unemployment offices, and the third was in charge
of conservation for the state. The four private
organizations also differed widely in their activities.
One was a food processor, another was a chemical
manufacturer, the third was in the aerospace industry,
and the fourth was a public utility.


Pay Programs of the Organizations 


The three government organizations studied were 
all under civil service compensation systems and, 
thus, had similar compensation programs. Their com- 
pensation programs had established pay ranges for 
each job and virtually all holders of similar jobs 
received the same amount of pay. Merit seemed to
be an important factor in determining pay only
because it was considered when promotions were
due. Because of the great similarity among the pay
policies of these government organizations, it was 
decided to combine the data from them and treat it 
as a single government sample. Table 1 presents the 
correlations between the amount of these govern- 
ment managers’ pay and their seniority, education 
level, job level, rated job performance, and rated 
effort expended. These correlations tend to support 
the point made above that job level and perhaps 
education rather than merit are the key determinants 
of their pay. 

The four privately owned organizations had com- 
pensation programs that were roughly comparable 
to each other. As was true with the government 
organizations, each job had a pay range. However, 
managers’ salaries covered the full range of the 
salaries possible and according to the personnel 
officers, merit was important in determining where 
an individual fell within this range. Because the four 
organizations had roughly similar compensation 
programs, programs that provide a clear contrast to 
those in the government sample, it was decided to 
combine the data from them and treat it as a 
single private industry sample. Table 1 shows that 
as expected for this sample, job performance is 
more strongly related to pay than it is in the government
sample. However, even for this group, job
level and education appear to be the most significant
determinants of the managers’ pay. The low correlations
between pay and performance can undoubtedly
be partly accounted for by the unreliability of the
performance measure. However, the correlations are
so low that it is felt that the true relationship between
pay and performance is not likely to be very
high.


TABLE 1

PEARSON PRODUCT-MOMENT CORRELATIONS BETWEEN THE MANAGERS’ PAY AND FACTORS THAT DETERMINE PAY

                                          Actual pay
Factor                          Private industry   Government
                                sample (N = 326)   sample (N = 237)

Seniority                       -.17**             [illegible]
Education level                 [illegible]        [illegible]
Management level                [illegible]        .60**
Quality of job performance (a)  .20                .01
Effort expended (a)             .14*               .00

(a) Correlations computed separately for three management levels and then averaged.
* p < .05.
** p < .01.


Sample 


All seven organizations participating in the study 
had high response rates. For the private industry 
sample the response rate was 86.5%, while the re- 
sponse rate for the government sample was 91.5%. 
Overall 563 out of 635 questionnaires (88.7%) were 
returned. Table 2 presents the demographic
characteristics of the government and private industry
samples. There is a high degree of similarity
between the characteristics of the two samples. This 
high degree of similarity has important implications 
for any attitude differences that might be found 
between the two samples. Because of the similarity, 
it is possible to rule out these variables as causal 
factors for any differences that are found between 
the two, making it more likely that the differences 
are due to the pay plans to which the managers are 
subject. 


RESULTS AND DISCUSSION 


Table 3 presents the mean ratings on im- 
portance in determining pay given each of the 
seven factors by the government sample and 
by the private industry sample. The data 
show that there is a consistent tendency for 
the managers in the private industry sample 
to rate all of these factors as more important 
in determining their pay than do the mana- 
gers in the government sample. On the basis 
of this evidence, it would appear that the 
government managers did not feel that these
seven factors represented all the important
factors that determine their pay. This supposition
is supported by the fact that a number
of the government managers wrote on their
questionnaires that the action of the state
legislature was the key factor in determining
their pay.


TABLE 2

CHARACTERISTICS OF THE SAMPLE BY TYPE OF ORGANIZATION

                                      Type of organization
Factor                             Private   Government   Total

N lower management (a)             162       112          274
N middle management (b)            164       125          289
M age (years)                      44.3      46.0         45.0
M seniority (years)                18.0      16.0         17.2
M time in position (years)         4.1       4.5          4.3
Education level: percentage
  having beyond high school
  degree                           65.0      78.1         70.5
Average annual salary (dollars)    11,900    10,535       11,325

(a) Lower management is defined as those managers who are
on the lowest level of management in the organization and who
generally were first-line supervisors, although typically not
blue-collar foremen in the case of the present sample.

(b) Middle management is defined as consisting of those positions
above the first level of supervision but below the vice-presidential,
company officer, or major departmental head level.


It is important to note that the largest
differences between the two samples came on
the three items (quality, productivity, and
effort) designed to measure the perceived importance
of job performance in determining
pay. As expected, the managers in the private
industry sample felt that their job performance
was much more important in determining
their pay than did the government managers.
On these three items, for example, about
5% of the private industry sample said
these factors were more important than did
the average government sample manager. On
the basis of the evidence presented earlier,
this would appear to be a fairly realistic
picture of the two pay programs. One implication
of this finding is that if organizations will
make an effort to base managers’ pay on
merit, it is likely that they can produce the
perception among their managers that they
are being paid on the basis of merit. In the
light of the low correlations found between
rated job performance and pay in both samples
(r = .01 for government and r = .20 for
private), it is rather surprising that the managers
rated the job-performance factors as
highly as they did. The job-performance
factors and the education-level factor were
rated as most important by both samples.
Apparently, when given some reason to
believe performance is important, managers
are willing to accept the view that they are
paid on the basis of merit.


TABLE 3

MEAN SCORES ON THE IMPORTANCE SEEN ATTACHED TO FACTORS IN DETERMINING PAY AND ON THE
IMPORTANCE THAT SHOULD BE ATTACHED TO THE FACTORS

                                         Is determined                                      Should be determined
Factor                     M private    M government  Diff.        t             M private    M government  Diff.        t
                           industry     sample                                   industry     sample
                           sample                                                sample

Seniority                  3.83         3.32          .51          [illegible]   3.94         3.90          .04          .26
Education and experience   [illegible]  [illegible]   [illegible]  [illegible]   5.70         [illegible]   [illegible]  [illegible]
Responsibility level       4.90         4.54          .36          2.40**        5.94         5.99          -.05         -.50
Quality of performance     5.59         4.55          1.04         8.00**        6.34         6.22          .12          [illegible]
Productivity               5.42         4.15          1.27         [illegible]   6.16         5.74          .42          4.10**
Effort expended            4.85         3.80          1.05         [illegible]   5.29         4.99          .30          2.00*
Scarcity of skills         3.93         3.42          .51          3.40**        4.16         4.15          .01          .07

A measure of the perceived importance of
performance factors in determining pay was
computed for each manager by summing the
importance the manager saw attached to each
of the three performance factors. In both the
government and the private industry samples
there was a significant correlation between
this measure and the superiors’ rankings of
how hard the managers worked (r = .24, p <
.01 for private, and r = .23, p < .01 for government).
As expected from Vroom’s (1964)
theory and the law of effect, working harder
was associated with seeing a close connection
between pay and performance. This
finding adds further support to the point
that organizations should encourage the
perception that good performance leads to
high pay. Future research might profitably
focus on making finer discriminations about
the kind of pay programs that lead to this
perception. The present study makes the
rather gross discrimination between civil service
practices and private industry practices;
future studies might consider differences that
result from range systems with and without
a control point, or from stock-option plans
and bonus plans.
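
The composite measure and its correlation with the effort rankings can be illustrated with a short sketch. The dictionary keys, function names, and any data passed in are hypothetical; only the procedure (summing the three performance-item ratings, then computing a Pearson product-moment correlation) follows the text.

```python
from math import sqrt

# The three items the article treats as performance factors.
PERFORMANCE_ITEMS = ("quality", "productivity", "effort")


def perceived_performance_importance(ratings: dict) -> int:
    """Sum one manager's 1-7 'is determined' ratings on the three items."""
    return sum(ratings[item] for item in PERFORMANCE_ITEMS)


def pearson_r(xs: list, ys: list) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

In use, one list would hold each manager's composite score and the other the standard score derived from the superior's effort ranking.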

Table 3 also presents the mean ratings 
given the seven factors by both samples on 
how important these factors should be in de- 
termining pay. In contrast to the managers’ 
responses to how pay is determined, there is a 
surprisingly high degree of agreement between 
the two samples on how important the differ- 
ent factors should be in determining pay. Ap- 
parently, the managers’ attitudes toward how 
pay should be determined are independent of 
the type of pay programs the managers are 
presently under. Both groups of managers re- 
ported that the quality of their performance 
should be the most important determinant of 
their pay. Undoubtedly, the social acceptabil- 
ity of this position served to inflate the mana- 
gers’ responses to this item. However, their 
responses still suggest that managers, at least 
in principle, are in favor of pay programs that 
are designed to reward merit. 

Table 4 presents the Pearson product-mo- 
ment correlations between the managers’ self- 
ratings on each of the seven factors and the 
importance they see attached to the same 
seven factors in determining their pay. In 
both samples, the correlations tend to be low. 
Apparently, the managers’ evaluations of
how they stand on these factors have little impact
upon how they perceive their pay is determined.
This finding is in agreement with
the point made above that the key determi- 
nant of how a manager feels his pay is deter- 
mined is the type of pay program he works 
under. It is interesting to note that the highest
correlation in both samples comes on the
factor of market value of skills. It may very 
well be that for those individuals whose skills 
do have a high market value, this is an im- 
portant factor, while for those who do not 
have such a skill, it is an unimportant factor. 
For example, if a manager can program com- 
puters, this may be a very important de- 
terminant of his pay because of the scarcity 
of this skill; however, lack of the skill may 
have little influence upon his pay. 

Table 4 also presents the Pearson product- 
moment correlations between the managers’ 
self-ratings on each of the seven factors and 
their ratings on the importance they feel 
should be attached to the same seven factors.
The correlations are all statistically significant
and in general tend to be of moderate
size. Thus, there is a general tendency for
these managers to feel that those factors upon
which they perceive themselves to be high
should be the most important factors in determining
their pay. This, of course, is not
surprising and can be accounted for by what
might be described as a “self-interest factor.”
The fact that these correlations are not
higher indicates that factors other than self-interest
may influence managers’ attitudes
toward how their pay should be determined.
Judging from the data presented earlier,
which showed some agreement among managers
that pay should be based upon merit, at
least one other influence may be a generally
accepted commitment to the principle of pay
based upon performance.


TABLE 4

CORRELATIONS BETWEEN MANAGERS’ SELF-RATINGS ON EACH FACTOR AND THEIR ATTITUDES TOWARD
HOW IMPORTANT THE FACTOR IS AND SHOULD BE IN DETERMINING THEIR PAY

                                   Is determined                 Should be determined
Factor                     Private industry  Government    Private industry  Government
                           sample            sample        sample            sample

Seniority                  .09               -.09          [illegible]       [illegible]
Education and experience   [illegible]       .13           [illegible]       [illegible]
Responsibility level       [illegible]       .09           [illegible]       [illegible]
Quality of performance     [illegible]       .16*          [illegible]       .40**
Productivity               .14*              [illegible]   .42**             .40**
Effort expended            [illegible]       [illegible]   .45**             .46**
Scarcity of skills         [illegible]       [illegible]   .48**             [illegible]

* p < .05.
** p < .01.

On several of the factors, it was possible to 
compute the correlations between the mana- 
gers’ actual standing on the factors and the 
importance they said should be attached to 
them in determining their pay. When this was
done, the correlations were all lower than
those found between the self-ratings and the 
ratings of how pay should be determined. 
For example, actual standing as measured by 
superiors’ rankings on quality of job per- 
formance correlated .06 in the government 
sample and .07 in the private industry sam- 
ple with the importance the managers felt 
these factors should receive in determining 
their pay. This result serves to emphasize the 
often made point that to understand attitudes 
and behavior, it is not enough to know the 
“facts of the matter”; it is necessary to know 
the individual’s perception of the facts. 

A comparison was made for each manager 
between how important he said each factor 
was and how important he felt it should be 
in determining his pay. By taking the abso- 
lute sum of these differences a congruence 
score was obtained for each manager. Both 
the government sample and the private in- 
dustry sample were then divided into two 
groups: those managers for whom there was a 
high congruence between how they said their
pay was and how they felt it should be determined,
and those managers who said there
was low congruence. In both samples there
was a significant tendency for the high congruence
group to be better satisfied with their
pay than was the low congruence group (t =
5.11, p < .01 for government, and t = 3.97,
p < .01 for private).


TABLE 5

MEAN PAY DISSATISFACTION UNDER TWO LEVELS OF FEELING THAT PAY IS BASED UPON PERFORMANCE
AND UNDER TWO LEVELS OF FEELING IT SHOULD BE

                             Private industry sample           Government sample
How pay is determined    Performance     Performance      Performance     Performance
                         should be low   should be high   should be low   should be high

Performance is high      1.23 (n = 71)   1.19 (n = 84)    2.19 (n = 49)   1.72 (n = 59)
Performance is low       0.95 (n = 82)   1.25 (n = 88)    1.36 (n = 58)   2.39 (n = 71)


TABLE 6

ANALYSIS OF VARIANCE: MEAN PAY DISSATISFACTION SCORES UNDER TWO LEVELS OF HOW
IMPORTANT PERFORMANCE IS AND SHOULD BE IN DETERMINING PAY

                    Private industry sample      Government sample
Source              df      MS      F            df      MS      F

Is determined       1       1.64    1.28         1       .58     .32
Should be           1       .82     .64          1       4.70    2.55
Is X Should be      1       4.10    3.20*        1       32.93   17.90**
Error               321     1.28                 233     1.84

* p < .07.
** p < .01.
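
The congruence score and the pay dissatisfaction measure lend themselves to a brief sketch. Several points here are assumptions rather than details from the article: the "absolute sum of these differences" is read as the sum of absolute differences across the seven factors, the two groups are formed by a median split, and all names are illustrative.

```python
from statistics import median

FACTORS = (
    "seniority", "education_experience", "responsibility",
    "quality", "productivity", "effort", "scarcity_of_skills",
)


def congruence_score(is_ratings: dict, should_ratings: dict) -> int:
    """Sum of |is - should| over the seven factors.

    A score of 0 means perfect congruence; larger scores mean less.
    """
    return sum(abs(is_ratings[f] - should_ratings[f]) for f in FACTORS)


def pay_dissatisfaction(pay_received: int, pay_should_receive: int) -> int:
    """Part 4 measure: rating (b) minus rating (a); positive = too little pay."""
    return pay_should_receive - pay_received


def split_by_congruence(scores: dict) -> tuple:
    """Median split (an assumption): scores at or below the median are 'high congruence'."""
    cut = median(scores.values())
    high = {name for name, score in scores.items() if score <= cut}
    low = set(scores) - high
    return high, low
```

The mean dissatisfaction of the two resulting groups could then be compared with a t test, as the text reports.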

Table 5 presents the mean pay dissatis- 
faction scores of the managers using a 2 X 2 
classification system. The managers were first 
divided into high and low groups based upon 
the degree to which they indicated the three 
performance items were important in deter- 
mining their pay. These two groups were 
divided in half on the basis of the degree to 
which they said pay should be based upon 
performance. Table 6 presents the results of 
a 2 X 2 analysis of variance as recommended 
by Winer (1962) for data with unequal cell 
frequencies. In neither sample are the main 
effects significant. Thus, neither the degree to 
which a manager feels his pay is related to 
his performance nor the degree to which he 
feels it should be is directly related to his 
satisfaction with his pay. However, there is 
a significant interaction effect as expected. 
Those groups where a lack of congruence 
exists have the highest dissatisfaction scores. 
Thus, the impact of seeing pay based upon 
performance seems to be positive if the 
manager feels it should be, but negative if 
he feels it should not be. 
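
As a check on the analysis of variance, each F ratio is simply the effect's mean square divided by the error mean square. The sketch below recomputes the ratios from the mean squares reported in Table 6; the dictionary layout and function names are illustrative only, and no significance levels are computed.

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F = mean square for the effect / mean square for error."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error


# Mean squares as reported in Table 6 (each effect has 1 df).
TABLE_6 = {
    "private": {
        "error": 1.28,
        "effects": {"is": 1.64, "should": 0.82, "is_x_should": 4.10},
    },
    "government": {
        "error": 1.84,
        "effects": {"is": 0.58, "should": 4.70, "is_x_should": 32.93},
    },
}


def f_table(sample: dict) -> dict:
    """Return each effect's F ratio rounded to two decimals."""
    return {
        name: round(f_ratio(ms, sample["error"]), 2)
        for name, ms in sample["effects"].items()
    }


if __name__ == "__main__":
    for label, sample in TABLE_6.items():
        print(label, f_table(sample))
```

In both samples only the interaction term yields a large ratio, which matches the conclusion that congruence, rather than either attitude alone, is what relates to satisfaction.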

The evidence gathered on managers’ atti- 
tudes toward both how their pay is and 
should be determined seems to allow the 
general conclusion that among managers 
greater emphasis than now exists may profit- 
ably be placed on merit in determining pay. 
The data suggest that when organizations tie
pay to performance the managers will see the
connection and the data suggest that this 
will operate to increase performance. Further, 
the indication is that the concept of pay 
based upon performance is acceptable to most 
managers. If, as seems likely, the rather low
correlations found in the present study be- 
tween actual pay and rated job performance 
are typical of what exists in other organiza- 
tions, then a great number of organizations are 
failing to use pay as a motivator of good job 
performance because they are not tying it to 
performance. In fact, it may be that by not 
tying pay directly to performance, organiza- 
tions may be contributing indirectly to the 
dissatisfaction of that group of managers who 
feel pay should be based on performance, a 
group that may well be in the majority. 


JOB SATISFACTION AND TURNOVER IN A FEMALE
CLERICAL POPULATION

Job-satisfaction questionnaires were administered to a sample of 350 female
clerical workers employed by a large firm located in Montreal. After a lapse of 
5 mo. 31 girls had quit, 26 of whom had completed the questionnaire. These 
26 girls reported significantly less satisfaction with their jobs than the 319 girls 
who remained on the job. An explanation of this finding in terms of the 
difficulty of finding a new job, economic pressures to remain on present job, 
and condition of the labor market in Montreal is offered. The relationship 
between satisfaction and turnover is not regarded as general. The study was 
continued for 7 more mo. The data from the subsequent 7 mo. indicate that 
job-satisfaction scores continue to exhibit a significant relationship to turn- 
over over a 12-mo period. Even after a 12-mo period the terminators had 
reported lower job satisfaction at the time of the assessment than those who
were still with the company.


The relationship between job satisfaction 
and job behavior has been a source of dispute 
and controversy for several years. Herzberg 
(Herzberg, Mausner, Peterson, & Capwell, 
1957) concluded in his review of the literature 
that high satisfaction and high productivity 
went together while Katzell (1957), using a 
more stringent criterion of statistical signifi- 
cance concluded that the published studies 
did not reveal such a relationship. The con- 
clusions of Brayfield and Crockett (1955) and 
Vroom (1964) tend to support the position of 
Katzell. 

A second aspect of job performance—job 
turnover—appears to have generated some- 
what more consensus of opinion. Brayfield 
and Crockett (1955), Herzberg, Mausner, 
Peterson, and Capwell (1957), Katzell 
(1957), and Vroom (1964) all concluded that 
the published studies support the notion that 
the dissatisfied worker is more likely to leave 
his job than a satisfied worker. This finding 
and conclusion is theoretically appealing since 
one can expect that the job factors which 
lead a worker to like his job should be the 
same factors which lead him to remain on the
job (see Vroom, 1964). The author (Hulin,
1963), however, concluded that the presence
of such a relationship might be too dependent
on situational characteristics and characteristics
of the work force to be regarded as a
general finding.

1 The author would like to express his appreciation
to Sonia Plourde for her assistance in this
study, and to the officials and workers of the company
involved for their cooperation. Further information
regarding the company may be obtained by
contacting Charles L. Hulin, Department of Psychology,
University of Illinois, Urbana, Illinois.

A careful analysis of the literature on the 
question of satisfaction and turnover indicates 
that only one of the studies deals with in- 
dustrial workers, uses individual reports of 
satisfaction, and uses individual termination 
decisions. This one study, by Weitz and 
Nuckols (1953), was done on a mailed ques- 
tionnaire basis on a sample of 1,200 life 
insurance agents. Weitz and Nuckols reported 
a 47% return of the questionnaires. The 
results indicated a .20 correlation (p < .05) 
between a direct measure of job satisfaction 
and subsequent survival as an agent and a 
.05 correlation between an indirect assessment 
of satisfaction and survival. Unfortunately, 
even in this study, there were two sources 
of bias: Significantly fewer of the terminators 
than the survivors returned the questionnaire 
and job survival and job performance are not 
independent variables in the case of insurance 
agents since they are paid on a straight 
commission basis. In this group of workers,
low production leads to termination. Weitz 
and Nuckols may have been predicting a 
combination of productivity and turnover. 

The writer does not question the validity
nor the rigor of the Weitz and Nuckols study.
However, one could raise several questions 
regarding the generality of the conclusions of 
the reviewers who have generalized to all of 
industry on the basis of one study. 

The remainder of the published studies 
bearing on this question are only tangentially 
related to the issue. Many of the investigators 
have used group analyses (Fleishman, Harris, 
& Burtt, 1955; Giese & Ruter, 1949; Kerr, 
Kopelmeir, & Sullivan, 1951) and related 
average departmental or group satisfaction 
scores and group turnover rates. There is 
always the problem that different depart- 
ments, having different types of jobs, working 
conditions, and supervisors, will attract dif- 
ferent types of workers. These other vari- 
ables may be responsible for the obtained 
relationships between average departmental 
satisfaction and turnover levels. Several 
studies have used students or members of 
discussion groups or members of voluntary 
committees as subjects (Ss). While these latter 
studies are important, they do not answer the 
question of turnover and satisfaction in an 
industrial work force. 

It should be evident that satisfaction is 
only part of the answer to the problem of 
turnover. Other factors such as the condition 
of the labor market (Behrend, 1953), the age 
of the workers, chances of obtaining another 
job, and financial responsibilities all contribute 
to a worker’s decision to leave his job. While 
one might expect a relationship between turn- 
over and satisfaction in general, it is possible 
that in certain situations this relationship 
would not be obtained because of the factors 
mentioned above. 

The present study was carried out in 
order to obtain an indication of the general- 
ity of the hypothesized relationship between 
satisfaction and turnover. 


METHOD 


Research Setting 


The company involved in the research to be re- 
ported in this article is a large manufacturing com- 
pany with its home offices located in Montreal, 
Quebec. During the past 3 years the turnover among 
the female clerical staff has been 30.3%, 30.0%, and 
30.0%. The rate of turnover appears to be stable 
and has indicated no tendencies in either direction 
over the past 10 years. The company estimates by 
the use of a cost-accounting analysis that to hire 
and train one clerical worker costs approximately
$1,000. At the present rate, turnover in the clerical 
staff alone is costing the company $130,000 per 
year. Voluntary turnover rate among 15 other large 
companies in the Montreal area averaged 18.4%, 
20.2%, and 20.0% over the past 3 years. While labor 
market conditions may be contributing to the overall
level of turnover in Montreal firms, this particular
firm seems to have more than would be
predicted by the market conditions. This would indi- 
cate that a search for factors relating to individual 
turnover rates would be successful. 


Subjects 


The entire female clerical staff was asked to 
participate in a survey of job satisfaction which was 
to be conducted by the company. The workers were 
informed of this by their supervisors and by a letter 
from the personnel department. It was stressed that 
the company simply wanted to know what their 
clerical workers as a group thought of their jobs. 
They were told that the questionnaires were com- 
pletely anonymous and their individual responses 
would never be revealed to the company. Of the 
415 members of the clerical staff, 350 (84.3%)
participated in the survey. (The largest percentage 
of the nonparticipants was either on vacation or 
ill during the testing period.) There was no ap- 
parent bias in the rate of participation between 
those who later quit and those who did not 
since 84% of those who quit had completed the 
questionnaires.

In addition to the survey of the present staff, 
questionnaires were mailed to the 129 clerical workers 
who had quit during 1963. Twenty-nine of these 
questionnaires (22.5%) were returned. These Ss 
were asked to describe in retrospect how they had 
felt about the company as a place to work. 


Variables 


The job satisfaction of these girls was assessed 
by means of the Job Description Index (JDI). 
The JDI is a cumulative-point, adjective check-list 
type of scale. It has been subjected to an extensive 
validation program and has been described elsewhere 
(Hulin, Smith, Kendall, & Locke, 1963; Vroom, 1964, 
p. 100). The JDI was constructed to measure five 
separate aspects of a worker’s satisfaction: satisfac- 
tion with work done, with the pay, with promo- 
tional opportunities and policies, with the co-workers, 
and with the supervisor. A sixth, unvalidated
scale constructed along the same general lines 
was added to the questionnaire for the purposes of 
this study. This sixth scale was an attempt to assess 
the workers’ reactions to the “atmosphere” of the 
company as a place to work. This added scale 
included such items as: friendly, everybody works 
together, helpful with personal problems, accept 
differences in cultural backgrounds, etc. This scale 
was intended to measure the workers’ reactions to 
some of the general aspects of the company. As 
such it probably will have a lower degree of discriminant
validity than the original five scales.

In addition to these six satisfaction variables,
measures were also obtained of each worker’s age, 
education level, job level (obtained by matching 
reported job title to the job-evaluation scale of 
the company), mother tongue, and marital status. 
All of the control variables were assessed by means 
of self-report. Measures of these six satisfaction 
variables and five control variables were obtained 
during the last week of June 1964 for each of the 
350 workers who participated in the survey. By 
December 15, 1964, 31 girls had quit, 26 of whom 
had participated in the original survey. The same 
questionnaire was readministered to these 31 girls 
at the time of quitting to obtain a measure of their 
attitudes toward their jobs after the decision had 
been made to quit. 


RESULTS 


The averages of the satisfaction scores and 
the control variables which had been obtained 
in July from the 26 terminators were com- 
pared to the averages of the 319 girls who 
were still on the job on December 15, 1964. 
These data are presented in Table 1. 

These data indicate that those who later 
quit their jobs reported less satisfaction in 
June than those who did not quit. They were 
also 8 years younger on the average. The 
major hypothesis of this study concerns the 
effects of satisfaction on turnover. Since age 
is positively related to certain aspects of job 
satisfaction the differences in age could have 
accounted for both the turnover and the dif- 
ferences in satisfaction levels between the two 
groups. Therefore, for every one of the 26 
terminators two control Ss were drawn who 
were matched in terms of age, years of 
education, and mother tongue. The average 
satisfaction scores of these 52 controls and 
the 26 terminators are presented in Table 2. 
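
The matching step described above can be sketched as follows. This is only an illustration: the paper does not say how closely ages had to agree or how ties were broken, so the age tolerance, the random draw, and all field and function names here are assumptions.

```python
import random


def draw_matched_controls(terminators, pool, keys, per_case=2,
                          age_tolerance=2, seed=0):
    """For each terminator, draw `per_case` unused controls from `pool`.

    A control matches when it agrees exactly on every field in `keys`
    (e.g., years of education, mother tongue) and its age lies within
    `age_tolerance` years of the terminator's age (assumed tolerance).
    """
    rng = random.Random(seed)
    available = list(pool)
    rng.shuffle(available)  # random draw order among eligible controls
    matched = {}
    for case in terminators:
        picks = []
        for candidate in available:
            same_categories = all(candidate[k] == case[k] for k in keys)
            close_age = abs(candidate["age"] - case["age"]) <= age_tolerance
            if same_categories and close_age:
                picks.append(candidate)
                if len(picks) == per_case:
                    break
        for chosen in picks:
            available.remove(chosen)  # each control is used only once
        matched[case["id"]] = picks
    return matched
```

Called with `keys=("education", "mother_tongue")`, the function returns two controls per terminator whenever enough eligible workers remain in the pool.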

No significant differences in any of the 

TABLE 1

CHARACTERISTICS AND SATISFACTION SCORES OF
TERMINATORS AND NONTERMINATORS

Variable                            Nonterminators   Terminators
                                      (n = 319)        (n = 26)

Satisfaction area
  Work                                  35.87           28.69
  Pay                                   15.00           15.15
  Promotions                            10.90            9.35
  Co-workers                            41.13           37.40
  Supervision                           41.81           38.15
  Atmosphere                            34.78           32.92
Sample characteristics
  Age                                   32.04           24.23
  Years of education                    11.93           11.94
  Job level                              5.86            5.33
  Percentage with English as
    mother tongue                       25              26
  Percentage unmarried                  25              31

control variables were observed between the 
control group and the terminators. The sig- 
nificance of the difference between the vectors 
of mean satisfaction scores obtained from the 
26 terminators and the 52 matched controls 
was tested by means of Hotelling’s (1931)
T² analysis. The difference was significant at
less than the .05 level. It should be pointed
out that the T² analysis assumes random
groups and in this case one is dealing with 
groups that have been matched on variables 
known to be associated with the dependent 
variables in question. For this reason the 
error term may be too large and the test will 
be a conservative one. 
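
For readers unfamiliar with the statistic, a two-group T² under its usual textbook definition (pooled covariance matrix, conversion to an F ratio) can be sketched as below. The function names are hypothetical and the sketch is not a reconstruction of the computation actually carried out for this study; it treats the groups as independent, which is the assumption discussed above.

```python
def mean_vector(rows):
    """Column means of a list of equal-length score vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sums of squares and cross-products about the group mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dev[i] * dev[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hotelling_t2(group_a, group_b):
    """Return (T2, F, (df1, df2)) for two independent groups."""
    n1, n2 = len(group_a), len(group_b)
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    p = len(mean_a)
    sa = scatter_matrix(group_a, mean_a)
    sb = scatter_matrix(group_b, mean_b)
    pooled = [[(sa[i][j] + sb[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    weights = solve(pooled, diff)  # pooled^-1 * diff
    t2 = (n1 * n2 / (n1 + n2)) * sum(d * w for d, w in zip(diff, weights))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)
```

With a single variable the statistic reduces to the square of the ordinary pooled-variance t ratio, which gives a convenient check on the arithmetic.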

Column 3 of Table 2 gives the average 
satisfaction scores at the time of quitting 


TABLE 2

MEAN SATISFACTION SCORES FROM ALL GROUPS OF WORKERS

                     Controls   Terminators   Terminators      1963
Variable              (June)      (June)      (At time of   Terminators
                      n = 52      n = 26       quitting)      n = 29
                                                n = 31

Satisfaction area
  Work                 35.83       28.69        30.48          28.28
  Pay                  15.07       15.15        17.94          16.07
  Promotions           17.16        9.35        10.10          10.72
  Co-workers           41.44       37.40        40.48          39.10
  Supervision          41.66       38.15        40.87          34.79
  Atmosphere           35.52       32.92        36.48          31.72

of the 31 girls who quit between June and
December 15, 1964. These scores were gath- 
ered originally for general information pur- 
poses. No predictions were made regarding 
the relative level of the scores as compared 
to the June scores. No tests of the significance
of the difference between this vector of means
and the vector of means obtained in June
were made. The Hotelling T² analysis is not
applicable since it assumes different groups
and six t ratios done on correlated variables
would seem to be inappropriate also. Col- 
umn 4 presents the average satisfaction 
scores of the 29 girls who had quit during 
1963 and who responded to the mailed ques- 
tionnaire. Again, no predictions were made 
regarding this vector of means and no 
statistical analyses were done. 


FOLLOW-UP STUDY


The company continued to record bio- 
graphical data of the workers who quit from 
the fifth to the twelfth month of the study. 
These biographical data were used to identify 
the questionnaires completed at the beginning 
of the study. Twenty-three girls quit during 
the period from December 15, 1964 to June 
15, 1965. Seventeen of these 23 had com- 
pleted the JDI the previous June. For each 
of the 17 identified workers who had termi- 
nated employment, two control Ss matched 
in terms of age, years of education, and job 
level were drawn from the total group. The 
average scores on the satisfaction and bio- 
graphical variables were computed for these 
two groups of Ss. 


RESULTS 


The results of this second analysis are 
shown in Table 3. The difference between 
the two vectors of means was tested for significance
by means of Hotelling’s (1931) T²
analysis. This test resulted in a nonsignificant 
statistic indicating that the difference be- 
tween these two vectors could reasonably be 
attributed to chance. 

As an additional analysis an unweighted 
sum of the satisfaction variables was cor- 
related with the turnover criterion. This 
Pearson correlation coefficient was —.28 
(p > .05). This latter correlational analysis 
merely supports the results of the T² analysis
but it provides us with an estimate of

TABLE 3

MEAN SATISFACTION AND BIOGRAPHICAL VARIABLE
SCORES FROM JUNE 1964

                                    Terminators
Variable                          (12/64 to 6/65)    Controls
                                      n = 17          n = 34

Satisfaction area
  Work                                31.41           35.85
  Pay                                 15.94           16.38
  Promotions                          11.12           13.76
  Co-workers                          40.65           44.15
  Supervision                         30.82           41.97
  Atmosphere (a)                      33.24           36.97
Biographical variables
  Age                                 24.8            24.6
  Years of education                  12.6            12.5
  Job level                            5.9            [illegible]
  Percentage with English as
    mother tongue                     23              23
  Percentage unmarried                35              [illegible]

(a) This last scale, “atmosphere,” was not part of the original
JDI. It was constructed by the writer especially for the purposes
of this study and was an attempt to assess the extent to which
the workers regarded the company as a friendly place to work.


the magnitude of the relation between the 
satisfaction variables and turnover. A similar 
analysis of the data from the terminators and 
their controls of the first 5 months resulted 
in a Pearson correlation coefficient of —.26 
(p < .05).

The two groups of terminators from the 
first 5 months and the last 7 months were 
combined and compared to the combined 
group of control workers (n = 129). A comparison
of the satisfaction scores of these
two groups indicated a significant T² analysis
(p < .01), a multiple correlation of .34
(p < .01), and a significant Pearson correlation
between an unweighted sum of the satisfaction
variables and the turnover criterion
of —.27 (p < .01).
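
The correlational index used here, an unweighted sum of the six satisfaction scores correlated with a quit/stay criterion, can be illustrated with a short sketch. The scale keys, the 0/1 coding of the criterion, and the function names are assumptions made for the illustration; with a dichotomous criterion the Pearson coefficient is a point-biserial correlation.

```python
from math import sqrt

# Hypothetical keys for the five JDI scales plus the added "atmosphere" scale.
SCALES = ("work", "pay", "promotions", "co_workers",
          "supervision", "atmosphere")


def unweighted_sum(scores):
    """Add the six scale scores with equal (unit) weights."""
    return sum(scores[scale] for scale in SCALES)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def satisfaction_turnover_r(workers):
    """Correlate summed satisfaction with turnover (1 = quit, 0 = stayed)."""
    sums = [unweighted_sum(w["scores"]) for w in workers]
    quit_flags = [w["quit"] for w in workers]
    return pearson_r(sums, quit_flags)
```

A negative coefficient means that workers with higher summed satisfaction were less likely to quit; squaring it gives the proportion of turnover variance associated with the satisfaction sum.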


DISCUSSION 


The results of this investigation indicate 
quite clearly that subsequent termination can 
be significantly predicted from a knowledge 
of the worker’s job satisfaction in this sample 
of female clerical workers who are working 
in an area in which there is a labor shortage 
for clerical workers. This significant difference 
between the vectors of mean satisfaction 
scores for terminators and survivors held up 
both in the original analysis and after a 
matched control group was drawn. This seems
to imply that workers on jobs whose charac- 
teristics are satisfying to them (for whatever 
reason) are likely to remain on those jobs. 
Vroom’s (1964) analysis that job character- 
istics which lead to satisfaction also lead 
workers to remain in that situation appears 
to be correct for this sample of workers. It 
should be stressed that the writer is not con- 
cluding that these relationships will be ob- 
tained from all types of workers in all situa- 
tions. To make such a conclusion on the basis 
of only two studies would seem to be prema- 
ture. The generality of the results is still to 
be determined. 

At the present time it seems best to ascribe 
a low value to the probability that these 
results would generalize to a large segment 
of the United States work force. Several 
factors in the present study undoubtedly con- 
tributed to these results. The Ss were females. 
They have fewer economic reasons for re- 
maining on any job they are dissatisfied with 
than would a comparable sample of males. 
They tend to be a young group (average age 
of 32 years) and they are relatively well- 
educated (average of 12 years of education). 
They tend to have readily marketable skills 
and live in an area in which there is a demand 
for these skills. Therefore, any decision they 
make regarding job termination can be re- 
garded as being an easily made decision. They 
know that a new job can be obtained with a 
minimum of effort. There are few pressures 
which would tend to keep them in their 
present position if they are dissatisfied with it. 

On the other hand, workers who are less 
able to find a new job and who have a num- 
ber of economic obligations would have pres- 
sures on them to remain at their present job 
even if they were decidedly dissatisfied. It 
would be possible to postulate that there is 
a dimension of “propensity to leave if dis- 
satisfied” in the work force. At one extreme 
(high propensity to leave if dissatisfied) one 
would expect to find young, highly skilled 
workers, with few economic obligations, who 
are living in an area which has a demand for 
their skills. At the other extreme (no pro- 
pensity to leave if dissatisfied) one would 
expect old, unskilled workers, who live in 
an area of substantial unemployment. It 
would seem to be unwise to conclude at this
time that satisfaction and probability of
termination are, in general, negatively related
in the work force. The relationship would 
seem to depend on a great many factors. 

The vector of mean satisfaction scores 
given in Column 3 of Table 2 while not 
tested for significance is interesting. These 
means represent the average response of the 
26 terminators at the time of quitting. These 
26 girls report more satisfaction in all areas 
at the time of quitting than they reported 
in June prior to quitting. Why this should 
be so is not immediately obvious. These 
results appear to be somewhat at odds with 
Festinger’s (1957) theory of cognitive dis- 
sonance. The theory would hold that after 
these girls had reached their decision they 
would attempt to reduce any dissonance they 
may have felt by reporting less satisfaction. 
In the case of every one of the six satisfaction 
variables the mean score increased. While 
several post hoc explanations are possible this 
should probably be regarded as a regression 
to the mean effect until further replications 
have been attempted. 

The final set of data reported in this paper
is the mean satisfaction scores obtained from 
the mailed questionnaires which were sent to 
the girls who had quit during 1963. This 
sample of girls described (in retrospect) their 
jobs with the company in somewhat negative 
terms. The retrospective descriptions of jobs 
which these girls furnished appear to resemble
the responses given by the terminators
in June. However, since this is a cross- 
sectional comparison, any interpretation is on 
an unsound basis. The data are presented in 
the interests of hypothesis formulation, not 
hypothesis testing. 

The differences in satisfaction scores 
between terminators and nonterminators ap- 
pear to be small and, while they attain sta- 
tistical significance, there may be some doubt 
about their practical significance. However, 
it should be remembered that the company 
estimates that it costs in excess of $1,000 
to hire and train a new clerical worker. At 
this rate of cost the control of only 10% of 
the variance of turnover rates would become 
a very practical matter for many companies. 

The nonsignificant T² statistic and the nonsignificant
correlation between the satisfaction
variables and the turnover measure in the 
group of workers who quit during the last 
7 months of the study could be interpreted as 
evidence that the predictive power of a set of
satisfaction variables has disappeared after a 
period of only 5 months. An explanation of 
these discouraging results would likely be 
centered around the notion that intervening 
occurrences can have substantial effects on 
the job satisfaction of the workers. Even 
though the termination decisions were related 
to the workers’ job satisfaction at the time of 
quitting, their satisfaction at the time of 
quitting might bear very little relationship to 
their satisfaction at the time it was measured. 
This would be evidence neither for the lack 
of validity nor the lack of reliability of the 
JDI. It would be evidence for the lack of 
stability of job satisfaction itself. 

It should be pointed out, however, that 
even though statistical significance was not 
achieved, the magnitude of the relationship 
between satisfaction and turnover has not 
changed from the first 5 months to the last 
7 months. A comparison of the correlations 
between an unweighted sum of the satisfac- 
tion variables and turnover indicates this 
quite clearly. These correlations are —.26 for 
the early terminators (first 5 months) and 
—.28 for the late terminators (last 7 months). 
Further, the correlation of —.27 for the en- 
tire combined group cannot be said to be 
dependent on the existence of large differences 
in satisfaction between the early terminators 
and their controls since (to repeat) the magni- 
tude of this relationship has not changed over 
the course of this study. 

This degree of stability is regarded as both 
encouraging and surprising by the writer. It 
would be reasonable to expect that those girls 
who were extremely dissatisfied at the time of 
the administration of the JDI would be likely 
to quit shortly after the administration. That 
is, those who were most dissatisfied would be 
among the first to terminate; those who quit 
during the second and third months would 
have achieved higher scores on the JDI than 
those who quit first, etc. Thus, if one com- 
puted the mean June job-satisfaction scores 
for the groups of workers who quit during 
each of the subsequent months one would 
expect to find that these June scores exhibited 
a steady increase. Likewise, one would obtain
a steadily decreasing validity for the prediction
of turnover from satisfaction scores. An
inspection of the month-by-month means
indicates that this is indeed what is happening.
The increase is not large enough to
appreciably affect the validity of the JDI, 
however. In spite of these factors, the JDI 
exhibited significant validities for the pre- 
diction of turnover over a 12-month period. 

One could infer from the results of this 
study that the JDI job-satisfaction scores 
possess a high degree of stability over time. 
Otherwise one could not achieve the long-term 
validities as was done. They would also dem- 
onstrate that the relation between attitudes 
and behavior is not a short-term transient phe- 
nomenon but can be expected to last for a 
considerable period of time (12 months in 
the case of satisfaction and turnover) and 
that valid predictions can easily be made 
during this period of time. 


REFERENCES 


Behrend, H. Absence and labour turnover in a
changing economic climate. Occupational Psychology,
1953, 27, 69-79.

Brayfield, A. H., & Crockett, W. H. Employee
attitudes and employee performance. Psychological
Bulletin, 1955, 52, 396-424.

Festinger, L. A theory of cognitive dissonance.
Evanston: Row, Peterson, 1957.

Fleishman, E. A., Harris, E. F., & Burtt, H. E.
Leadership and supervision in industry. Columbus:
Ohio State University, Bureau of Educational
Research, 1955.

Giese, W. J., & Ruter, H. W. An objective analysis
of morale. Journal of Applied Psychology, 1949,
33, 421-427.

Herzberg, F., Mausner, B., Peterson, R. O., &
Capwell, D. F. Job attitudes: Review of research
and opinion. Pittsburgh: Psychological Service of
Pittsburgh, 1957.

Hotelling, H. The generalization of Student’s ratio.
Annals of Mathematical Statistics, 1931, 2, 360-378.

Hulin, C. L. Research implications of attitude surveys
in large organizations. Paper read at the
Illinois Psychological Association, Springfield,
Illinois, 1963.

Hulin, C. L., Smith, P. C., Kendall, L. M., &
Locke, E. A. Cornell studies of job satisfaction:
II. Model and method of measuring job satisfaction.
Unpublished manuscript, 1963.

Katzell, R. A. Industrial psychology. Annual Review
of Psychology, 1957, 8, 237-268.

Kerr, W. A., Koppelmeier, G., & Sullivan, J. J.
Absenteeism, turnover, and morale in a metals
fabrication factory. Occupational Psychology, 1951,
25, 50-55.

Vroom, V. H. Work and motivation. New York:
Wiley, 1964.

Weitz, J., & Nuckols, R. C. The validity of direct
and indirect questions in measuring job satisfaction.
Personnel Psychology, 1953, 6, 487-494.


(Received March 4, 1965) 


Journal of Applied Psychology
1966, Vol. 50, No. 4, 286-291


COGNITIVE ASPECTS OF PSYCHOMOTOR PERFORMANCE: 


THE EFFECTS OF PERFORMANCE GOALS ON LEVEL 
OF PERFORMANCE ¹


EDWIN A. LOCKE AND JUDITH F. BRYAN

An experiment stemming from Mace’s work on the effects of performance
standards on level of performance is reported. It was found that Ss given 
specific (but difficult) standards performed at a higher level on a complex 
psychomotor task than Ss told to “do their best,” thus replicating Mace’s find- 
ing with a computation task. In contrast to Mace’s study where performance 
goals worked by prolonging effort during the latter part of the work periods, 
the standards intensified effort at all stages of the work periods in the present
case.


A previous experiment by Mace (1935) on 
the effects of performance standards on level 
of performance found that subjects (Ss) given 
specific scores or standards of performance to 
beat on each trial (based on their initial 
ability) improved much faster on a mathe- 
matical computation task than Ss told simply 
to “do their best.” The major purpose of the
present experiment was to replicate this find- 
ing with a complex motor task. The task was 
the Complex Coordination task described 
previously by Melton (1947) and Fleishman 
and Hempel (1954). 

However, there were a number of inten- 
tional differences in procedure between the 
Mace experiment and the present one. First, 
Mace did not report just how hard the stand- 
ards were for the groups with specific stand- 
ards. The present investigator (Locke, 1966) 
has shown that the difficulty of reaching the 
intended level of performance (i.e., the ac- 
tual level of the standard) has a significant 
effect on performance: the higher the stand- 
ard the higher the performance. The stand- 
ards in these previous experiments ranged in 
difficulty from 93% (the percentage of trials 
on which Ss were able to beat them) to 4%; 
in other words, from “very easy” to “very 
hard.” In the present experiment the stand- 
ards were set at a moderately hard difficulty 
level, in this case such that Ss were able 


1 This research was supported by Contract No. 
Nonr 4792(00) between the Office of Naval Research 
and the American Institutes for Research. The 
opinions expressed are not necessarily those of the 
Department of the Navy. 


to beat them less than 30% of the time. Sec- 
ond, as Mace (1935, p. 20) suggests, Ss 
told to “do their best” could, if given their 
scores after each trial, set standards for them- 
selves even though they are not told to do so. 
In fact, in the present investigator’s experi- 
ence it is very difficult to stop experimental 
Ss from doing this especially where the “demand
characteristics” (Orne, 1962) of the
situation are high. Usually an S not instructed 
to set goals (but given knowledge of his 
score) will set himself a goal of ‘constant 
improvement” or a specific score to beat that 
is considerably above his initial performance 
(i.e., a “long term” goal). In order to prevent 
this in the present experiment, Ss who 
were not given standards were also not given 
their total scores for each trial, though they 
were given knowledge of the correctness of 
their response sequences (see task description 
below). It is true that this group therefore 
lacked specific “knowledge of total score” 
which the group with standards had. How- 
ever, this knowledge was not knowledge about 
the correctness of individual responses or response
sequences (this was given in the task
itself), but knowledge about the number of 
correct responses made. The latter informa- 
tion could not give the Ss knowledge of how 
to perform the task better; it could only give 
them knowledge with which they could regu- 
late their level of effort. Payne and Hauty 
(1955) made this same distinction between 
these different kinds of knowledge in a pre- 
vious study. Mace (1935) has suggested that 
the motivational effects of knowledge of results
are entirely a consequence of giving the
Ss information with which to set themselves 
performance standards. Thus, it is argued in 
the present case that knowledge of total score 
and standards affect Ss cognitively in the 
same way, namely, by giving them information 
which they can use only to regulate their 
level of effort. Standards were introduced to 
insure that all Ss used their knowledge in the 
same way, thus the “confounding” of the two 
is not considered relevant to the primary pur- 
pose of the study which was to examine the 
effects of standards. 

Of secondary interest in the present study 
were the effects of a specific learning plan or 
strategy on Complex Coordination perform- 
ance. Previous research by Fleishman and 
Hempel (1954) had found Discrimination 
Reaction Time to be an important ability at 
the early stages of practice in this task and 
Simple Reaction Time to be important at the 
later stages. However, one pilot S who worked 
at the task for 5 hours indicated that he had 
eventually tried to learn to memorize and 
thus anticipate which pattern was coming. 
This S attained a very high level of perform- 
ance. It was not known, however, how soon 
Ss could begin to memorize the pattern se- 
quence successfully nor if all Ss could do it 
at all. It was thought that 1 hour’s practice 
was a minimum prerequisite for any attempt 
to memorize the patterns. The interest here 
was in whether trying to memorize the pat- 
terns would improve performance. 


METHOD 


Task. The Complex Coordination apparatus con- 
sists of two pairs of adjacent rows of horizontal 
lights separated by a pair of adjacent vertical rows 
of lights (so that the display looks like an H on its 
side). One row of each adjacent pair of rows con- 
sists of red lights and one of green lights. One red 
light in each row is illuminated at any given time 
to form a pattern (consisting of three red lights). 
The S’s task is to move a set of controls in order to 
illuminate a pattern of green lights to match the 
pattern of illuminated red lights. The controls con- 
sist of foot pedals that control the illumination of 
the bottom horizontal row of green lights and a 
“Joy stick” which moves laterally and forward and 
back to control the illumination of the top hori- 
zontal and the vertical rows of green lights, respec- 
tively. When the S “matches” the red-light pattern 
with a green-light pattern, the pattern of red lights 
automatically changes to a new pattern. There are
13 different patterns given in sequence. Actually the 
apparatus is programed so that after every 3 repe- 
titions of the 13 patterns (in the same order), Pat- 
tern No. 11 comes on before the cycle begins again. 
Thus, for all intents and purposes, it is a 13-cycle 
pattern “with complications.” Thus, feedback about 
the correctness of his movement sequences is given 
to S automatically, since the red-light pattern 
changes when it has been correctly matched. 
However, information as to the total number of
matches made during a given time period could be 
withheld from the S as necessary. The Ss without 
standards or knowledge of scores could still get 
some idea of how well they were doing by the rela- 
tive frequency with which the red-light patterns 
changed (with which they made successful matches). 
One S who was in the No Standard group actually 
counted the number of matches he made on his last 
trial. Another counted the number of matches on
two different trials. 
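
The order of stimulus patterns described above (three passes through the 13 patterns, then Pattern No. 11 once more, then the whole arrangement again) can be written out as a small generator. This is a sketch only: it assumes the patterns are numbered 1 to 13 in their order of presentation, which the description implies but does not state, and the function name is hypothetical.

```python
from itertools import islice


def pattern_sequence(n_patterns=13, repeats_before_extra=3, extra_pattern=11):
    """Yield pattern numbers in the order the apparatus presents them.

    After every `repeats_before_extra` passes through patterns
    1..n_patterns, `extra_pattern` is shown once before the cycle restarts.
    """
    while True:
        for _ in range(repeats_before_extra):
            for pattern in range(1, n_patterns + 1):
                yield pattern
        yield extra_pattern


# First 41 presentations: 39 regular patterns, the extra Pattern 11,
# and the first pattern of the next cycle.
first_41 = list(islice(pattern_sequence(), 41))
```

The full arrangement therefore repeats every 40 matches, which may be one reason the sequence was hard to memorize as a simple 13-item cycle.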

Subjects. The Ss were 29 University of Maryland, 
paid, male volunteers who responded to an adver- 
tisement in the college newspaper. (One S was 
dropped from the analysis; see Results section.) 

Conditions. The design was a 2 × 2 fixed model
with 7 Ss in each cell. (a) “Standard” condition—
half the Ss were given specific performance goals or 
standards to beat on each trial. The standards for 
each coming trial were determined by adding a 
fixed increment to the S’s best previous score after 
each trial. The increment was 15 (matches) if the 
S’s 10-minute trial score was below 100, 10 if 
it was over 100 but under 130, and 5 if the pre- 
vious best score was over 130 (the reduced incre- 
ments being due to the fact that improvement was 
more difficult as S’s score became higher). The 
Ss were told that beating these standards consti- 
tuted “what we considered to be successful perform- 
ance on the task on the basis of our experience with 
the task.” The Ss were told that the standards represented
“above the average performance for college
students.” The Ss without standards were told at 
the beginning of the first experimental trial to “do 
their best” on every trial, and were not given their 
total scores nor any standards. (b) Plan condition 
—at the end of the halfway point in the experiment 
(i.e., after 1 hour’s practice), half the Ss were told 
to “try and memorize the number and sequence of 
the red light patterns.” They were told that this 
would improve their performance since they would 
be able to anticipate which patterns were coming, 
and, therefore, to respond faster. They were re- 
minded on each subsequent trial to continue to try 
and memorize the patterns. 

Procedure. The experiment was introduced as a 
study of the way in which motor skills develop and 
relate to each other. After preliminary testing on 
the Jump and Discrimination Reaction Time tests 
found by Fleishman and Hempel (1954) to predict 
performance on this task at the late and early 
stages, respectively, the functioning of the Complex 
Coordination task was explained to the Ss, and then 
all Ss were given a 2-minute practice trial during
which they were told to “do their best.” After this 
it was decided what condition to put each S in. In 
each case, this was decided only on the basis of 
the practice score and so as to equalize the prac- 
tice scores of the 4 experimental groups as much as 
possible. The Ss were told they would have 12 
trials of 10 minutes each, separated by a 2-minute 
rest period. At this point, the method of goal set- 
ting was explained to the groups with standards (the 
goal for Trial 1 was 5 × the practice score plus 15)
and the remaining Ss were told to “do their best.” 
Nothing was said to the Plan groups at this point. 
After each trial the Ss with standards were given 
their score on that trial and their standard for the 
next trial. (Between trials all Ss made some ratings 
and described “what they were thinking about” but 
these data are not of relevance to the present ex- 
periment.) 
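
The goal-setting rule given in the Conditions and Procedure paragraphs can be summarized in a short sketch. The text leaves open what happens at a best score of exactly 100 or exactly 130; the sketch assumes those scores fall in the next higher bracket (increments of 10 and 5, respectively). Function names are hypothetical.

```python
def increment_for(best_score):
    """Increment added to the S's best previous 10-minute score.

    Assumption: scores of exactly 100 and 130 take the smaller increment
    of the next bracket; the text does not specify these boundary cases.
    """
    if best_score < 100:
        return 15
    if best_score < 130:
        return 10
    return 5


def first_standard(practice_score):
    """Goal for Trial 1: five times the 2-minute practice score, plus 15."""
    return 5 * practice_score + 15


def next_standard(scores_so_far):
    """Goal for the coming trial, based on the best score attained so far."""
    best = max(scores_so_far)
    return best + increment_for(best)
```

For example, an S whose best score so far is 95 would be given 110 as the next standard, while an S whose best score is 135 would be given 140.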

After Trial 6 all Ss were told that Trial 7 was to 
be an experimental trial during which they should 
“experiment with new ways of doing the task.” The 
Plan Ss, however, were told explicitly to use this 
trial to begin to memorize the red-light patterns. 
The No Plan Ss were told to do as they pleased. 
The Ss with standards were told no scores or stand- 
ards would be given on this trial. 

Before Trial 8 the Standard Ss were given their 
new goal based on their best performance before 
Trial 7. The Plan Ss were told to continue trying to 
memorize the patterns. 

At the end of the experiment Ss were given a 
questionnaire asking them (a) whether they tried 
to reach the goals or not (if applicable) and, if 
not, what goals they were trying for, and (b) how 
many of the patterns they had been able to mem- 
orize. This was checked by an actual recall test in 
which Ss had to reproduce as many of the red- 
light patterns (by marking Xs on a paper design) 
as they could. 


RESULTS 


Success of experimental manipulations. 
Since the true independent variables in this 
study were conceived of as “cognitive” 
rather than “situational,” it was necessary to 
determine whether or not the Ss followed in- 
structions about the goals they were asked to 
pursue. It was found that one S in the Standard-Plan
condition was not doing so at all, so
he was dropped from the condition and replaced
with another S. The decision to drop
this S was not based on this answer alone. 
There were other pieces of evidence, that is, 
his response to another written question; specific
questions put to him by the experimenter
(E) after the experiment; the S’s spontaneous
comments during the experiment (e.g., claiming
he was falling asleep and asking for stimulants);
observations of the S’s behavior during
the experiment (bowing his head and
almost falling asleep); and comments made 
by the S on a dictaphone between trials about 
his “thoughts during the trial.” It was only 
because all these pieces of evidence com- 
pletely substantiated his questionnaire re- 
sponse that it was felt justifiable to drop him 
from the analysis. In the case of other Ss who 
indicated that they were not fully following 
instructions, such “convergence of evidence” 
was lacking, thus they were retained. 

All remaining Ss in the Standard-Plan con- 
dition claimed to have tried to beat the stand- 
ards. Three of the seven Standard-No Plan Ss 
claimed they were not trying to reach the 
goals set by the E. However, one replaced
these goals with his own goals which were of 
equivalent difficulty; another said he tried 
not to beat the standards by too much since 
he did not want them to go too high and tire 
him out; a third said he followed the goals 
at first, but did not later. It was felt that 
these Ss were trying for the goals (or equiva- 
lent goals) at least to some degree so that 
removing them was not justified. 

Nine of the 11 No Standard Ss who re- 
sponded to the question about goals indicated 
they were trying to “do their best” or some- 
thing similar. One claimed he was trying for 
gradual improvement and another’s answer 
was not interpretable. Three Ss did not re- 
spond to the question. (This was probably 
due to ambiguous instructions on the ques-
tionnaire.)

All the Plan Ss indicated they had tried to 
memorize some aspect of the patterns, but 2 
indicated they emphasized recognition over 
anticipation, 4 said they tried mainly to mem- 
orize the bottom red-light (matched with the 
foot pedal) sequence and one the top red-light 
(matched with lateral stick) sequence. How- 
ever, 9 of the 14 Plan Ss said that memoriz- 
ing was too difficult and/or that it did not 
help them and (in some cases) actually hurt 
their performance. This suggests that the 
memorizing plan might have been introduced 
too soon in the learning process. 

In terms of the actual difficulty of the 
goals, the Ss in the goal condition were able 
to reach or exceed their standards on only 
29% of the trials, suggesting that the stand-
ards were quite difficult. In Locke’s (1966)
experiment Ss who were given standards
this hard or harder attained the highest out-
put of any group.

Effects of standards and plans. First, it 
should be noted that the four experimental
groups were successfully matched on the basis
of initial ability, as measured by scores on
the practice trial (F < 1). The groups
were also matched successfully on the Dis- 
crimination Reaction Time and Jump Reac- 
tion Time tests found previously by Fleish- 
man and Hempel (1954) to predict perform- 
ance at the early and late stages of practice 
on this task, respectively. 

The linear slope of the performance scores 
from the practice trial (multiplied by 5) to 
Trial 12 (omitting Trial 7) was calculated 
for each S, and the individual slope scores 
were subjected to an F test. The effect of 
standards was highly significant, F = 17.75
(p < .001). The actual performance means 
of the Standard and No Standard conditions 
are plotted by trial in Figure 1. It is evident 
that the effect of the standards was immediate 
and that the difference between the groups 
increased continually during the 12 trials. The 
mean total number of matches over all trials 
(excluding 7) was also significantly greater 
for the Standard group (t = 2.78; p < .01).
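
The slope measure described above can be illustrated with a short sketch. The scores used here are invented for illustration and are not data from the study; the function names `linear_slope` and `subject_slope`, and the coding of the practice trial as Trial 0, are assumptions. The sketch fits an ordinary least-squares line to one S's scores, scaling the 2-minute practice score by 5 and omitting Trial 7.

```python
def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def subject_slope(practice_score, trial_scores):
    """Slope for one S from the practice trial through Trial 12.

    practice_score: matches on the 2-minute practice trial, coded here as
        Trial 0 (an assumption made for illustration).
    trial_scores: dict mapping trial number (1-12) to matches on that trial.
    """
    xs = [0]
    ys = [practice_score * 5]  # scale the 2-minute practice to a 10-minute trial
    for trial in range(1, 13):
        if trial == 7:  # Trial 7 is omitted from the analysis
            continue
        xs.append(trial)
        ys.append(trial_scores[trial])
    return linear_slope(xs, ys)


# Hypothetical scores for a single S (not data from the study).
example_trials = {t: 50 + 3 * t for t in range(1, 13)}
example_slope = subject_slope(10, example_trials)
```

In the analysis reported here, one such slope per S would then serve as the score entered into the F test comparing conditions.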

There was no significant effect of Plans 
on the linear slopes (F = 1.44; p > .05)
and no interaction effect (F = 3.28; p > .05). 
Since the Plan instructions were introduced 
only after Trial 6, the linear slopes were 
recalculated on the basis of performance 
from Trial 6 to 12 only (omitting Trial 7), but 
again the Plan effect was not significant (F 
= 1.20; p > .05). The responses to the post- 
experimental questionnaire indicated that 
most Ss were not able to make much headway 
at memorizing in the 2 hours allowed. Most 
could reproduce only three patterns or less 
and these were not exact nor were they re- 
called in order. (The Ss were considered to 
have “reproduced” a pattern if they could 
reproduce its “essential shape”; they did not 
have to name which pattern they were trying 
to reproduce.)

However, as it turned out several Ss in the 
Plan condition were not able to reproduce any 
patterns. On the other hand, several Ss in 
the No Plan condition were able to reproduce
several of the patterns, indicating that they 
had memorized some of the patterns even 
though they were not instructed to do so. 
When all Ss were reclassified according to 
the number of patterns they were able to 
reproduce (dividing the Ss at the median), 
the “High Memory” group made significantly 
more matches on Trials 8-12 than the “Low 
Memory” group (t = 2.12; p < .05). How-
ever, this reclassification put 9 of the 14
Standard Ss into the “High Memory” condi- 
tion; in addition, those 5 Ss in the “High 
Memory” condition who were No Standard 
Ss had a lower mean score than that of the 9 
Standard Ss. This suggests that the memoriz- 
ing that was done may have been as much 
the result of a higher level of performance 
(which the Standard Ss achieved) as a cause 
of such performance. 

How do standards have their effect? Mace 
(1935) found that Ss with specific per- 
formance goals dropped below their best pre- 
vious performance (in terms of their trial 
scores) only 10% of the time whereas Ss 
without such goals (those told to “do their 
best”) did so 50% of the time. In this study 
the corresponding figures are 21% and 41%. 
But the mean difference in number of re-
versals is highly significant (t = 4.07; p <
.001). This suggests that one effect of goals
is to maintain performance between trials, 
that is, to prevent “lapses in effort.” 
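
The reversal measure can be made concrete with a brief sketch. It assumes that a reversal is any trial on which the score falls below the S's best score on all earlier trials, that ties do not count as reversals, and that the first trial cannot be a reversal; the function name `reversal_rate` and the scores are hypothetical and not taken from the study.

```python
def reversal_rate(scores):
    """Proportion of trials (after the first) scoring below the best earlier trial."""
    if len(scores) < 2:
        raise ValueError("need at least two trial scores")
    best_so_far = scores[0]
    reversals = 0
    for score in scores[1:]:
        if score < best_so_far:
            reversals += 1  # dropped below best previous performance
        else:
            best_so_far = score  # new (or equal) best performance
    return reversals / (len(scores) - 1)


# Hypothetical trial scores for one S (not data from the study).
hypothetical_scores = [40, 45, 43, 50, 50, 48]
rate = reversal_rate(hypothetical_scores)
```

With these invented scores the S drops below his previous best on two of the five later trials, a rate of 40%.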

However, the question still arises, just how 
is this level maintained? Mace, for instance, 
found that the between trial improvement of 
the Standard group was due entirely to greater 
output during the latter part of each trial. 
During the first 2 minutes of each 20-minute 
trial there was no difference at all between 
the Standard and No Standard groups (for 
all trials combined). Thus, Mace concluded 
that the effects of the standards were to pro- 
long effort during the work period rather 
than to intensify effort at all stages of the 
trial. 

The appropriate data from the present 
study are shown in Figure 2. These are the 
mean performance scores of the Standard and 
No Standard groups for each 2-minute seg- 
ment of the 10-minute trials summed over all 
trials (excluding Trial 7, and, of course, the 
2-minute practice trial). It is evident that in 
the present study the difference between the 
Standard and No Standard groups is substan-
tial for all the 2-minute segments. The t val-
ues for the mean differences for each 2-min-
ute segment total are, respectively, 2.44, 2.60,
2.81, 2.71, and 3.01, all of which are signifi-
cant at p < .05 or better. The difference in
the mean within trial (linear) slopes of the
Standard and No Standard groups, however,
is not significant (t = 1.42; p > .05) though
it is in the right direction. However, there is
a significant difference between the Standard-
Plan group and the No Standard-Plan group
(t = 2.24; p < .05) in mean within trial
slopes. Apparently, the major effect of the 
goals in the present case was to improve per- 
formance between trials at every stage of the 
trial, though there was some difference within 
trials as well. 

One of the reasons for the difference of the 
two studies might have been that Mace’s trials 
were 20 minutes long while they were only 10 
minutes long in the present study. The longer 
the trials, the more likely the Ss without spe- 
cific goals should be expected to lag near the 
end of the period (as fatigue increases). How-
ever, it appears that standards can intensify
effort at all points in the trial as well as
prolong effort (as in Mace’s study) during the
work period.


DISCUSSION 


The findings of Mace (1935) using an 
arithmetic computation task were clearly rep- 
licated with a psychomotor task, indicating 
that the principle that performance goals in- 
fluence level of performance has some gen- 
erality over tasks. A recent experiment by 
Church and Camp (1965) yielded a similar 
finding using a reaction time task (though 
the theoretical interpretation of these investi- 
gators’ results is at variance with our own). 
Although a previous study by Locke (1966) 
found a strong relationship between the level 
of the performance goal (level of intended 
achievement) and level of actual performance, 
the present findings (along with those of 
Mace) may have greater practical implica- 
tions as in the latter two cases, Ss with spe- 
cific goals performed better than those told to 
“do their best.” The latter is a typical in- 
struction in most industrial, military, and 
educational training situations, but the pres- 
ent results indicate that such instructions (or 
goals) may not result in the highest possible 
level of performance. 

A second finding of interest in the present 
study was that performance goals were shown
to influence the intensity of the effort per
unit time, whereas previously (Mace, 1935)
they had been shown only to prolong effort
better over the work period. The fact that
performance goals can have both effects ar-
gues for the general importance of such goals.

Finally the present findings are of theo-
retical interest in that they emphasize the ef-
fects of cognitive (intentional) aspects of
motivation. Ryan² (1958) and Ryan and
Smith (1954), for instance, have argued that
the “task” or “intention” be taken as the
fundamental unit in motivation and that in-
tentions are the direct cause of most human
behavior. The present study, in addition to
demonstrating how such notions can be put
to a test, supports the validity of this ap-
proach.


2 Unpublished mimeos, 1964. Chapter I: Explaining
behavior; Chapter II: Explanatory concepts; Chap-
ter V: Experiments on intention, task, and set;
Chapter VI: Intentional learning; Chapter VII: Un-
intentional learning. Cornell University, Department
of Psychology.


PREDICTION OF LEADERSHIP IN A SIMULATED
INDUSTRIAL TASK


EDWIN I. MEGARGEE, PATRICIA BOGART, AND BETTY J. ANDERSON


Ss high and low in dominance were selected with the California Psychological
Inventory (CPI) Dominance (Do) scale and confronted with a simulated
industrial task which could be solved best by 1 person assuming a leader role
and the other following his instructions. When the instructions emphasized
the task, the High Do Ss did not assume the leader role significantly more
often than the Low Do Ss. When leadership was emphasized, however, the
High Do Ss assumed the leader role in 90% of the pairs. It is concluded that
the CPI Do scale has predictive validity when leadership is made salient.


The Dominance (Do) scale of the Cali- 
fornia Psychological Inventory (CPI) was 
designed, according to Gough (1960), to 
“assess factors of leadership ability, domi- 
nance, persistence and social initiative [p. 
12].” It was constructed by contrasting the
test responses of college and high school stu- 
dents nominated by their fellows as being high 
and low on these qualities (Gough, McClos- 
key, & Meehl, 1951). 

Subsequent validational work has estab- 
lished that the Do scale correlates signifi- 
cantly with ratings of a person’s dominance 
(Dicken, 1963; Gough, 1960), and that stu- 
dent leaders, as determined by principal’s 
nominations or by office holding, have sig- 
nificantly higher Do scores than do nonlead- 
ers (Gough, 1960; Johnson & Frandson, 1962; 
Liddle, 1958). 

The difficulty with concurrent validation 
studies such as these is that they do not dem- 
onstrate whether a test adds useful informa- 
tion which is not already known about the 
assessee (Meehl, 1959). The high school 
counselor who wishes to determine whether or 
not a boy is a student leader would do better 
to ask the principal than to administer a 
CPI. Of course, what the counselor usually 
wishes to know is not whether a person is 
already a leader but whether he has the po- 
tential to become one. While concurrent vali- 
dation studies suggest that the Do scale might 
be able to predict this, only predictive valid- 
ity studies can answer the question with any 
degree of certainty. 

The literature on the predictive validity of 
the Do scale is scanty. Smelser (1961) has 
demonstrated that it is possible to predict the
relative achievement of two-man groups in a 
cooperative task when high dominance and 
low dominance subjects (Ss) are assigned to 
leader or follower roles. However, this study 
focused on group achievement rather than the 
amount of leadership actually displayed by 
the Ss in the task. Altrocchi (1959) used the 
Do scale to select two people high in domi- 
nance and two low in dominance so that their 
interactions could be filmed to be presented to 
Ss as a stimulus in an experiment. He indi- 
cated that their leadership behavior was con- 
sistent with their Do scores, but offered no 
data on the subject. 

Therefore, an investigation was undertaken 
to determine whether or not the Do scale 
could predict which of the two college stu- 
dents would adopt the leader role when con- 
fronted with a simulated industrial task which 
could be solved best by mutual effort in which 
one person was dominant and the other sub- 
missive. 


STUDY I
Procedure 


The items from the CPI Dominance, Communal- 
ity, and Good Impression scales were assembled into 
a 113-item test labeled the “Gough Inventory.” The 
test was administered to the male members of three 
introductory psychology classes at the University of 
Texas. From these protocols, 25 pairs of Ss were 
selected. Each pair had one man high in the Do 
scale and one low in the Do scale. The “High Do” 
partner had to have a T score no lower than 54 and 
at least 10 T score points higher than the “Low Do” 
partner; the Low Do partner, on the other hand, had 
to have a T score no higher than 54 and at least 10 
points lower than his partner. The T scores of the 


LEADERSHIP IN A SIMULATED TASK 


TABLE 1

CHARACTERISTICS OF THE SUBJECT PAIRS
IN THE TWO STUDIES

                                          Study I    Study II

Mean Do score of High Do subjects          64.00      62.00
Range of Do scores of High Do subjects     54-74      54-80
Mean Do score of Low Do subjects           43.12      40.85
Range of Do scores of Low Do subjects      31-54      27-53
Mean pair difference in Do scores          21.00      24.00
Range of pair differences in Do scores     12-27      20-31
No. of subject pairs                       25         20


High Do Ss ranged from 54 to 74 with a mean of
64.00. Those of the Low Do Ss ranged from 31 to 54
with a mean of 43.12. The mean pair difference was
21.00 with a range from 12 to 27 (see Table 1).
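
The Study I pairing rule can be written as a simple check. This is only a sketch of the stated criteria (High Do partner at T = 54 or above, Low Do partner at T = 54 or below, and at least 10 T score points between them); the function name `is_valid_pair`, its parameter names, and the example scores are hypothetical and are not the investigators' selection procedure.

```python
def is_valid_pair(high_t, low_t, cutoff=54, min_gap=10):
    """Return True if two Do T scores satisfy the Study I pairing criteria."""
    return (
        high_t >= cutoff                 # High Do partner no lower than 54
        and low_t <= cutoff              # Low Do partner no higher than 54
        and high_t - low_t >= min_gap    # at least 10 T score points apart
    )


# Hypothetical candidate pairs as (High Do T score, Low Do T score).
candidates = [(64, 43), (54, 44), (60, 54), (53, 40), (70, 58)]
accepted = [pair for pair in candidates if is_valid_pair(*pair)]
```

Of these invented candidates only the first two qualify: the third pair differs by fewer than 10 points, the fourth has a "High Do" partner below 54, and the fifth has a "Low Do" partner above 54.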

The Ss were contacted to find a mutually con-
venient time when they could participate in the
second phase of the study. At the agreed upon time
they were met at the laboratory by one of the in-
vestigators (P.B. or B.A.) and led to a small room
in which a large wooden box rested on the floor.
This box, which was 30 inches wide, 30 inches high,
and 60 inches long, was open on one end and closed
on the other. Into the closed end, 100 #-inch holes
had been drilled 2 inches apart in a 10 × 10 square
pattern. Each hole was filled with a slot-headed
bolt 1 inch long and + inch in diameter with the
slotted head and a washer inside the box, and a
square nut tightly screwed onto the bolt from
outside the box. Because of the narrowness of the
-inch bolt relative to the 3-inch hole, the only way
the nut on the outside could be unscrewed effi-
ciently was for one partner to crawl inside the box
and hold the bolt in place with a screwdriver,
while the other partner remained outside and un-
screwed the nut with a wrench or his fingers. The
size of the box precluded one person manipulating
both the wrench and the screwdriver simultaneously.

Five of the nuts were painted red, 20 were painted
yellow, 25 were painted green, and 50 were un-
painted. The colors were randomly distributed
around the grid. None of the bolts or washers were
painted. The task called for the team to remove the
five red-painted nuts in the fastest amount of time.
Since only the man on the outside of the box was
aware of the color of the nuts, it was necessary for
him to direct the man inside the box as to where
to place the screwdriver. The man on the inside,
having no information as to color, had no choice
but to follow the directions of the man outside. The
following directions were read to the Ss: 


The purpose of this experiment is to find out 
how your scores on the Gough Inventory, which 
you have both taken, relate to your ability to 
work together on a mechanical task. 

This large box represents a machine. This end 
(gesturing) is open. The other end contains a 
number of nuts and bolts. Some of the nuts are 
unpainted; others are painted yellow, green, or 
red. Your job is to remove all of the red nuts, 
leaving the others in place. Do not remove the 
bolts, washers, or any of the yellow, green, or un- 
painted nuts. You may use the wrench and screw- 
driver provided here if you wish. When you are 
finished, signify by calling out “FINISHED” and 
I will stop the clock. 

I will observe you through the one-way screen, 
and will keep track of the amount of time you 
take. The team that does the most accurate job 
in the least time will receive $5.00. 

You will have 1 minute in which you can 
look over the situation; then I shall rap on the 
mirror. This rapping will indicate that I have 
started timing your performance. If you are 
ready to start before the minute is up, indicate 
by calling “START.” 

Do you have any questions? 

All right, I will leave now and you can study 
the problem for a minute. 


The person who remained on the outside and 
directed the task was defined as the dominant per- 
son. It was predicted that he would have the high 
Do score. 


Results and Discussion 


A total of 25 pairs of Ss were observed in 
this problem-solving situation. The results 
when tabulated closely approximated what 
could be expected on the basis of chance. 
Fifty-six percent of the High Do Ss assumed 
the “dominant” role and 44% assumed the 
“submissive” role (see Table 2). 

There appeared to be three possible expla-
nations of the results: (a) The CPI Do scale


TABLE 2

NUMBER OF HIGH DO SUBJECTS ASSUMING LEADER
AND FOLLOWER ROLES IN THE TWO STUDIES

                                   Study I    Study II

Number assuming leader role          14          18
Number assuming follower role        11           2
p                                    ns        <.001
simply did not have predictive validity. (b)
The Do scores of the pairs were not far 
enough apart to produce significant differ- 
ences. (c) The experimental situation was 
not conducive to the display of leadership on 
the part of the High Do Ss. The instructions 
were task oriented and while one role ap- 
peared much more dominant than the other 
to the investigators, it was possible that to 
the Ss it was unimportant who took which 
role as long as the team won the $5.00 prize. 

In order to decide between these alterna- 
tive explanations of the results, a second study 
using different instructions and more dis- 
crepant Do scores was undertaken. 


STUDY II
Procedure 


As in the earlier study, the “Gough Inventory” 
was administered to male members of three intro- 
ductory psychology classes. High and Low Do Ss 
were formed into two-man teams. High Do Ss had 
T scores of 54 or greater; Low Do Ss had T scores 
of 53 or less. The mean T score of the High Do
group was 62.00, while the mean for the Low Do 
group was 40.85. The differences between partners 
in T scores on the Do scale ranged from 20 to 31 
with a mean of 24.00. (See Table 1.) 

The following instructions designed to focus at- 
tention on the roles rather than the tasks were read 
to each pair of Ss, with the italicized words em-
phasized.¹


This is a study of the relation between the 
Gough Inventory and Leadership under stress. 
This box represents a machine and you are a 
team of men who are to repair it in the fastest 
possible time. The repair that must be made is to 
remove all of the yellow nuts, leaving the red, 
green, and unpainted ones in place. One person, 
who is the Leader is to remain outside and the 
other, who is the Follower will crawl inside the 
box. The Leader must locate the yellow nuts, call 
out their locations to the Follower, and remove 
them using this wrench. The Follower must obey 
the Leader’s commands, and, using this screw- 
driver, hold the bolts in place while the Leader 
removes the nuts. It is up to you to decide who 
will be the Leader and who will be the Follower. 

Any questions? OK, I shall start timing you 
now. 


It was predicted that the High Do Ss would take 
the “Leader” role more often than the Low Do Ss. 


1 Since this experiment was in another room
which did not have a one-way screen, the experi- 
menter remained in the room to observe. It is possi- 
ble that the presence of an attractive young female 
experimenter added to the incentive to behave in a 
dominant fashion. 


E. I. MEGARGEE, P. BOGART, AND B. J. ANDERSON


Results and Discussion 


The results of Study II were clearly in the 
predicted direction. For 18 of the 20 teams 
the partner with the high Do score adopted
the leader role. When this result was tested
by the Binomial test (with an expectation
under the null hypothesis of P = Q = .50)
it was found to be highly significant (p <
.001). These results show that the CPI Do
scale is capable of predicting dominant be-
havior.
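
The binomial probabilities behind these results can be checked with a few lines of code. This is a sketch under the stated null hypothesis (P = Q = .50); the function name `binomial_upper_tail` is hypothetical, and because the text does not say whether a one- or two-tailed test was used, the one-tailed (upper) probability is computed here as an assumption.

```python
from math import comb


def binomial_upper_tail(successes, n, p=0.5):
    """Exact P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


# Study I: 14 of 25 High Do Ss took the leader role.
p_study_1 = binomial_upper_tail(14, 25)

# Study II: 18 of 20 High Do Ss took the leader role.
p_study_2 = binomial_upper_tail(18, 20)
```

Under these assumptions the Study I probability is roughly .35, consistent with the nonsignificant result, while the Study II probability is roughly .0002, which remains below .001 even if doubled for a two-tailed test.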

More interesting than this, however, is the 
disparity between the results of Studies I and
II. The basic task was the same in both cases 
but the results were vastly different. One rea- 
son for this could be the fact that in Study II 
the differences in Do scores between the mem- 
bers of the pairs were significantly higher 
than they were in Study I (F = 5.89; p <
.05). It is unlikely that this difference alone
is enough to account for the changed results, 
however. Analysis of the data from Study I 
shows no significant tendency for those teams 
of Ss with higher Do-score discrepancies to 
favor the hypothesis more than those with 
lower Do differences. In fact, the results for 
the four pairs of Ss in Study I whose Do- 
score differences equaled or exceeded the 
mean for the Ss of Study II exactly equaled 
the 50-50 division predicted by chance. 

It therefore seems likely that the altered 
instructions used in Study II which empha- 
sized leadership rather than the task were 
primarily responsible for the High Do Ss sud- 
denly displaying the assertiveness which had 
not been apparent in the first study. 

This suggests that dominance is a person- 
ality trait which manifests itself only under 
certain conditions in which leadership is sali- 
ent. This is not surprising since Hartshorne 
and May (1928) found essentially the same 
thing for the trait of honesty many years ago. 
The assessor is, therefore, faced with the task 
of predicting not only which people are 
“dominant” but also the conditions under 
which their dominance will be manifested.


2 In one of the two teams in which the person
with the high Do score took the “Follower” role, the 
High Do subject studied the situation, turned to his 
partner and emphatically stated, “OK, I'll go in the 
box and be the Follower. You stay here and be the 
Leader!” 




The present investigators felt that dominant 
behavior would surely be elicited by the con-
ditions in Study I, but they were wrong. This
reemphasizes the importance of job or cri-
terion analysis as well as personality analysis
if educators or personnel counselors are going
to predict behavior accurately.

This disparity also suggests that the present 
studies may furnish a useful paradigm for 
the study of the conditions under which domi-
nance will be manifested. The essential ele-
ments of this are the identification of “domi-
nant” people using the Do scale, followed by
exposure to the mechanical task under vari-
ous conditions.

For instance, in the United States it is 
generally considered appropriate for a male
to dominate or direct a female and for a 
white person to dominate a Negro. The re- 
verse situation is not consistent with the typi- 
cal social patterns and attitudes. This might 
lead to the prediction that if the High Do 
partner was also the culturally appropriate 
leader, then, under the leadership-salient in- 
structions of Study II, a similar pattern 
would be found with the High Do person as- 
suming the leader role. On the other hand, if 
a High Do female or Negro was paired with 
a Low Do male or white S, then it might be 
found that the Low Do person assumed the 
leader role more often. In fact, it might well 
be found that in this situation, the more 
salient leadership was made to the task, the 
more often the person with the lower Do score 
would take the dominant role, in direct oppo- 
sition to the results obtained in the present 
study. Reluctance on the part of either or 
both parties to assume the culturally atypical 
roles could be more salient and nullify the 
personality trait of dominance. 

The effects of various kinds of incentives 
and sets could be also studied by varying the 
instructions or rewards. Conflict could be 
introduced by altering the instructions so 
that both the role and the task were made 
salient. For instance, the Ss could be given 
the instructions of Study II, along with the
information that the winning team would win 
$5.00 of which $4.00 would be paid to the 
leader and $1.00 to the follower. This ap- 
proach could be used to study predictions 
from game theory. 

The present investigation, therefore, indi- 
cates that the CPI Do scale is capable of 
predicting leadership. However, the conditions 
under which leadership is to be exercised are 
as important as the personality trait of domi- 
nance in determining whether or not dominant 
behavior will be manifested. The CPI Do 
scale and the mechanical task used in the 
present studies could be fruitfully applied to 
the investigation of these conditions. 


AN EFFECT OF NOISE ON THE DISTRIBUTION
OF ATTENTION¹


MURIEL M. WOODHEAD 
Medical Research Council, Applied Psychology Research Unit, Cambridge, England 


A paced search of a visual display was made in auditory conditions containing 
bursts of noise at either 68 db or 105 db. Each selected visual item required 2 
types of response, crossing out and counting. The preferred activity in the 
quieter condition was counting. When the test instructions emphasized this 
aspect of the task, attention shifted further toward the preferred activity 
during loud noise. When the instructions emphasized searching, there were no 
significant differences between noise and quiet. It appears that although noise 
will not always induce a redistribution of the attention needed to respond 
equally often in 2 paced activities, when it does so, the preferred activity gains. 


Is it possible for noise to induce a shift in 
the attention paid to two concurrent activi- 
ties entailing an equal number of responses, 
so that one activity gains, one loses? 

Previously (Woodhead, 1964a) it appeared 
that differences in the attention needed to 
monitor two unequally occurring features of 
a visual display were widened following a 
loud noise. Observation of the less frequent 
feature was impaired, although this feature 
in fact had greater value for the task as 
a whole. There was no instruction that one 
feature must have priority, but it rapidly 
became obvious to anyone practicing the task 
that a succession of errors could result from 
neglect of an infrequent item, in contrast to 
a single error from a frequent item, and that 
the difference in the number of their appear- 
ances was not enough to warrant neglect 
of the infrequent item. The subjects’ (Ss’) 
choice of the opposite order of priorities ap- 
peared unreasonable, but they could have 
been influenced by the unequal frequency of 
the two features. 

In the present experiment, a visual display 
required two different responses for each item 
that S selected. Thus there were two 
activities with the same frequency of occur- 
rence, so frequency was not a test variable. 
However, in competing response situations of 
this kind, Spence (1956) has stated that the 
response having greater excitatory potential 
at the moment will be the one that will occur. 


1 The author wishes to thank Donald Broadbent
for his advice, and the Royal Navy for supplying 
the subjects. 


In order to reduce slightly the competition 
between the two types of response in the 
test, one was to be overt (crossing out), the 
other covert (counting). If the general level 
of drive is altered by the addition of noise, 
Spence’s S-R drive theory would predict a 
variation in the level of response competition 
and the size of the difference between them. 
There remained some uncertainty about the 
relative difficulty of the two activities. Noise 
may be either arousing or stressful, and where 
the former is the case, McGrath and Hatcher 
(1961) have suggested opposing effects on 
easy and difficult work. The results of a 
pilot group of four Ss showed little difference 
between error scores for the two types of 
response. 

Alternative test instructions were com- 
posed, with the intention of influencing Ss 
in their choice of priorities in order to ma- 
nipulate the direction of any induced shift. 
If effective in both directions, this would be 
strong evidence that an extraneous arousing 
stimulus can alter the balance of attention. In 
pursuit of this demonstration, four groups of 
Ss were given a visual task in which they 
counted and crossed off selected items. The 
first group (Mn) was told that this was a
memory experiment which also included oc-
casional bursts of noise. The second group
(Mq) was told the experiment was concerned
with memory and included occasional quiet
sounds. The third group (Sn) was informed
the experimental aim was to see how well 
they could find wanted letters, and that they 
would hear bursts of noise from time to time. 
EFFECT OF NOISE ON ATTENTION


The fourth group (Sq) was also guided
toward finding wanted letters, but told the 
accompaniment would be quiet sounds. The 
hope was that these directions would per-
suade M groups to pay more attention to
counting than to crossing out, and S
groups to give crossing out priority.


METHOD 
Subjects 


Seventy-two young sailors served as Ss, randomly 
assigned to four groups of 18. 


Visual Stimuli and Task 


The display was tilted 25 degrees from the 
horizontal. Four thousand five hundred letters, 
randomized by computer, were printed in rows of 
10 on paper strip. They moved under a slot which 
revealed each row for 2 seconds. Average viewing 
time was 0.2 second per letter. The task was to 
cross off and count four letters, two at a time, 
until five of each member of a pair had been 
achieved, followed by five of each of the other pair, 
and to continue alternating the sets of two, over 
a period of 15 minutes. 
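
The display figures given above are mutually consistent, as a short arithmetic check shows. The constant and variable names are hypothetical; the values are those stated in the text.

```python
TOTAL_LETTERS = 4500    # letters printed on the paper strip
LETTERS_PER_ROW = 10    # letters in each row
SECONDS_PER_ROW = 2     # time each row was visible under the slot

rows = TOTAL_LETTERS // LETTERS_PER_ROW                # number of rows shown
total_seconds = rows * SECONDS_PER_ROW                 # duration of the display
total_minutes = total_seconds / 60                     # duration in minutes
seconds_per_letter = SECONDS_PER_ROW / LETTERS_PER_ROW # average viewing time
```

The 450 rows at 2 seconds each fill exactly the 15-minute test period, with an average of 0.2 second per letter.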


Auditory Stimuli 


The same sound, lasting 1 second, was used 
for all groups. It contained frequencies in the band 
30-6,000 cycles, the major portion of energy being 
in the lower half. Presentation was through loud- 
speakers. For N groups, sound-pressure level on the 
C scale was 105 db; for Q groups it was 68 db. 


Procedure 


Each man was tested individually, some time 
between the hours of 1:30 PM and 5:00 PM, in a
sound-insulated room. The seated S was first given 
the appropriate instruction for his group. 

Group Mn was told, 


This is a memory experiment. I would like you to 
look at a list of letters, like this, and search for 
a number of them. The aim is to find out how 
well you can memorize the number of letters 
you need. The letters you are looking for are C, 
R, X, and J. Look for C and R first, and cross 
off each C and R as you come to them until 
you have crossed off 5 Cs and 5 Rs. Then leave 
C and R, and go on to X and J. Cross off 5 Xs 
and 5 Js. As soon as you have finished X and J, 
start again immediately with C and R. Go on 
like that, one pair at a time, alternately, until I 
tell you to stop, after 15 minutes. The letters 
rarely come together, for instance when you have 
5 of one, you might only have 2 of the other, 
but don’t start on the next pair until you have 
5 of each letter in a pair. Always read from 
left to right. From time to time there is some 
noise during this test. It’s in the form of occasional 
short bursts of noise, not very loud, but medium 
loud. Ignore them, don’t let them distract you, 
even for a second, because the idea is not to let
anything distract you from the test. I'll let you 
hear a burst now, before we start, so that 
you know what it is like. 


Group Mq was instructed similarly, except that
“quiet sound” was substituted for “noise,” with “very
quiet” for the qualifying description.

Group Sn was told, 


This is an experiment to see how well you can 
find particular letters. Here is a list of letters 
for you to search. Whenever you see one of the 
letters that are needed, cross it off. Search care- 
fully and try not to miss any. The letters to 
look for are C, R, X, and J. To make it less 
boring, deal with them in pairs, taking C and R 
as the first pair. After you’ve found and crossed 
off 5 Cs and 5 Rs, then start searching for Xs 
and Js. After finding 5 Xs and 5 Js, go back to 
C and R, and so on all through the list, alter- 
nately, until I tell you to stop, after 15 minutes. 
Always read from left to right. Do be extremely 
careful not to miss a letter. The way we mark 
this test is, by taking away points for every mis- 
take. Each time you miss seeing one of the letters 
you lose 10 points. Whenever you cross off the 
wrong number for a set of 5, you lose one point. 


Then followed the noise explanation given to Mn.

Group Sq was instructed in the same way as Sn,
but with the “quiet sound” substitution given to Mq.

Instructions were followed by a single demonstra- 
tion of the auditory stimulus. The S practiced un- 
paced for 3 minutes during which the sound occurred 
twice. The practice run was then marked and shown 
to S, errors being pointed out and counted— 
straightforwardly for M groups, by the points-pen- 
alty system for S groups. The 2-minute paced 
practice followed, accompanied by one burst of 
sound. Immediately afterwards S performed the test, 
during which the sound was presented at minutes: 
4, 2, 33, 54, 64, 74, 94, 102, 123, 14. 


RESULTS 


No use was made of the points-penalty 
system. The analysis of results was based on 
error scores. 


Levels of Difficulty 


Errors in counting were classed as memo- 
rizing errors. Wanted letters which went un- 
noticed were classed as searching errors. 
Some Ss in all groups said they had difficulty 
counting the two letters together, that is, in 
memorizing. Their comments on searching 
came less often, and concerned eyestrain and 
lack of time, but no mention of any difficulty 
inherent in the task. However, the highest
proportion of memory errors among the
total (memory plus search) for any group
was 29%, the lowest for search errors thus
being 71%. The number of possible corrects
was the same for both activities. Therefore
memorizing seems to have been easier than
searching, despite personal impressions
conveyed by Ss to the experimenter.


TABLE 1

EFFECTS OF NOISE: COMPARISONS BETWEEN GROUPS

               Differences in the proportion
               of errors which are
               memory errors                    Combined errors
Group          Mann-Whitney U       z           Mann-Whitney U
Mn with Mq          88.0          2.34*              156
Mn with Sn          63[?]      [illegible]**         122
Sn with Sq         153.5           ns                145.5
Sq with Mq         134.0           ns                141.5

 * p < .02.
** p < .002.
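
The z value reported alongside the first Mann-Whitney U in Table 1 can be reproduced from U and the group sizes. The following Python sketch assumes the usual large-sample normal approximation without a tie or continuity correction; the report does not state which form was used, so this is an assumption, and the function name is ours.

```python
from math import sqrt

def mann_whitney_z(u, n1, n2):
    """Normal approximation to the Mann-Whitney U statistic.

    Assumes no correction for ties or continuity (an assumption,
    not something stated in the original report).
    """
    mean_u = n1 * n2 / 2                          # expected U under the null
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)     # standard deviation of U
    return abs(u - mean_u) / sd_u

# Mn versus Mq: U = 88.0 with 18 Ss in each group
z = mann_whitney_z(88.0, 18, 18)
print(round(z, 2))  # 2.34
```

With 18 Ss per group the expected U is 162, so a U of 88.0 lies about 2.34 standard deviations from expectation, matching the tabled value.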


Noise Effects 


When the instructions emphasized search- 
ing there was little difference between the 
noise and quiet groups, either in total errors 
or the proportion of one class of error to the 
other. However, with instructions biased 
toward memorizing, a significant change
took place in the distribution of class of re- 
sponse, although exactly the same total errors 
were obtained for both groups. The direction 
of the change was for improved memorizing 
at the expense of searching. The results are 
given by the first and third comparisons of 
Table 1. It appeared that noise was effective 
in condition M, ineffective in S. The change 
in distribution of errors is illustrated in 
Figure 1, from which it will be seen that 
the proportionate difference between noise 
and quiet for S Ss was as small as 1%. 

Fig. 1. Proportionate distribution of classes of error
to combined errors.


Instructional Effects 


The instructions had no material effect on 
performance as long as only quiet sounds 
were present. But with the addition of noise 
a highly significant difference appeared between
groups Mn and Sn, mainly due to the
behavior of group Mn, as indicated in Table 1.
The direction was toward less relative 
amount of errors in the emphasized activity, 
more in the secondary activity. The raw data 
shown in Table 2 suggested that memorizing 
was the effective variable, even in S condi- 
tions. This was investigated by two one-way 
analyses of variance (Kruskal & Wallis, 
1952). Over all groups, memory errors differed 
(H = 14.59, df =3, p< .01). This was a 
progression from least likelihood in memoriz- 
ing with noise, through memorizing in quiet, 
searching in quiet, to searching in noise, the 
latter having the greatest likelihood of memory
errors. Searching errors did not differ
through the four conditions.
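
For readers unfamiliar with the Kruskal-Wallis one-way analysis of variance by ranks, the H statistic can be sketched as follows. This is a minimal Python illustration with hypothetical scores, not the original data; it uses average ranks for ties but omits the tie correction, and the function names are ours.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H without tie correction (illustrative sketch)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical error scores for three small groups (not the study's data)
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2))  # 7.2
```

H is referred to the chi-square distribution with df equal to the number of groups minus one. For df = 3 the .01 critical value is about 11.34, so the reported H of 14.59 exceeds it.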


TABLE 2

DISPERSIONS OF ERROR SCORES IN EACH GROUP OF 18 SUBJECTS

             Memory task                 Search task                  Combined
        First            Third      First            Third      First            Third
Group   quartile  Median quartile   quartile  Median quartile   quartile  Median quartile
Mn         8        9      12         46       58      71         51       68      82
Mq         9       12      20         42       50.5    76         49       67      93
Sn     [illegible] 19.5    29         49       56      71         67       76      98
Sq         6       21.5    28     [illegible]  47.5    68         49       80      96


An Additional Procedure


Two small further groups were tested with
a different version of search-emphasized
instructions. They did not differ from Sn and
Sq, but differed significantly from Mn and
Mq, confirming that there was no unfortunate
wording influencing the larger S groups.


DISCUSSION 


The results of M Ss show clearly that it is 
possible for noise to cause a shift of attention 
between two concurrent activities. Equally, 
there may be no shift, as the performance 
of S Ss demonstrates. What makes the dis- 
tinction could be important in situations 
where an operator does more than one thing 
at a time in the presence of irrelevant noise. 

In seeking reasons for the different sensi- 
tivities of the tasks, the critical factor cannot 
be that hypothesized after the previous study 
(Woodhead, 1964a), the unequal occurrence 
of two visual features. There were now, how- 
ever, differences in level of task difficulty. 
The performance of the pilot group of Ss 
(members of staff) was atypical in that they 
searched almost as efficiently as they memo- 
rized. The 72 test Ss found searching more 
than twice as hard if errors are the criterion. 

An explanation along the lines of level of 
difficulty would agree with the hypothesis of 
McGrath and Hatcher (1961), who assumed 
an interaction between task difficulty and the 
effects of arousing stimulation: “On an easy 
task arousing conditions improve performance 
and on a difficult task arousing conditions 
have a detrimental effect on performance.” In 
the present study, S was performing easy 
and difficult tasks together, believing that 
one had prior claim on his attention. If one 
supposes that the noise would arouse him, his 
additional concentration would go to the 
favored, easy activity, to the detriment of 
the difficult one. Unfortunately level of dif- 
ficulty cannot be the whole answer, since it 
does not explain the interaction between noise 
and instructions when nothing changes in the 
display, the timing, or the environment. 

Samuel (1964) found that when his visual 
addition task was presented in two spatial 
arrangements, only the more difficult version 
improved in noise. This is quite at variance 
with the prediction of McGrath and Hatcher. 
One of Samuel’s suggestions was that the 
continuous noise masked S’s voice so that
he may not have checked his response, with 
the result that performance was more fluid 
than in quiet. But lack of feedback could not 
apply to the present test in which brief bursts 
of noise were separated by intervals of at 
least 1 minute. It seems most probable that 
difficulty and feedback are not the reasons 
for activities being unaffected or affected by 
noise, nor for the direction of changes which 
do take place. 

A more uniform result was provided by 
an experiment using the same bursts of noise 
and a mental arithmetic task (Woodhead, 
1964b). Each arithmetic problem was in 
two parts: paced but slow perceiving and 
memorizing, followed by unpaced calculation. 
Adding noise to the first part led to errors in 
the result of the calculation. In view of the 
present experiment, it would, however, be un- 
wise to assume that noise while perceiving 
and memorizing visual material will always 
cause a decrement. The direction of effect is 
unreliable. 

Obviously variations of this kind indicate 
the need for further inquiry, but they do not 
vitiate the original question. It is possible, 
though not always possible, for noise to induce 
a shift in the attention needed to respond 
equally often in two paced activities. It seems 
that although noise does not always induce a 
redistribution of the attention given to a 
visual display, when it does so, attention is 
likely to shift further toward the preferred 
activity. 


ORGANIZATIONAL CONDITIONS AND BEHAVIOR IN 234
INDUSTRIAL MANUFACTURING ORGANIZATIONS 1


GEORGE H. DUNTEMAN 


Regional Rehabilitation Research Institute, College of Health Related Professions, 
University of Florida 


This study explored some of the interrelationships among 84 variables
pertaining to company and formal organization characteristics, management 
attributes, incentive conditions, worker characteristics, personnel performance, 
and organizational functions in a sample of 234 manufacturing firms. The data 
were obtained by an 84-item multiple-choice questionnaire sent to a repre- 
sentative sample of 2,938 manufacturing firms located throughout the United 
States. The correlations among the 84 variables were factor analyzed and the 
factors rotated to a simple structure. 14 dimensions of organizational attributes 
and behavior were isolated and interpreted. Among the significant findings 
was the relatively high independence of organizational attributes and behavior 
as evidenced by their being defined by separate sets of factors. 


In recent years much theory and conse- 
quent research on organizational behavior has 
been evident. However, March and Simon 
(1958) point out that the writings about 
organizations are scattered and diverse, and 
that the literature discloses large discrepan- 
cies between hypotheses and evidence. The 
literature contains many assertions, often with 
little data to back them up. Research on this 
topic has traditionally been carried out 
through laboratory investigations, field experi- 
ments, and the intraorganizational approach. 
Although little laboratory research has been 
directed toward the investigation of industrial 
organizations per se, much laboratory re- 
search which has been conducted on small 
groups may be considered to have relevance 
to the process and perhaps particularly the 
unprogramed activities of groups that occur 
in formal organizations (Bass, 1960; Cart- 
wright & Zander, 1960). 

Field experiments are characterized by the 
actual manipulation of variables rather than 
just by survey and correlational analysis. 
Coch and French (1948) have carried out
field experiments involving the effect of various
types of supervision on worker performance.
In general, field experiments involving
organizations, especially those involving the
manipulation of major variables, are quite
scarce for the obvious reason of interference
in organizational procedure.


1 This article is based on the author’s doctoral
dissertation while at Louisiana State University,
prepared under the direction of George J. Palmer,
Jr. This study was partially supported by the Office
of Naval Research, Contract Nonr 1575(05), Project
NR 170-478.

The intraorganizational approach has been 
utilized by Shartle (1956), McGregor (1960), 
Rubenstein (1960), Argyris (1960), and 
others. These researchers attempt to support 
their hypotheses by survey and correlational 
analysis of personnel variables within organi- 
zations, analysis of unit (e.g., departmental) 
operations, participant observation, and 
interview findings. 

Most of the current literature of research 
on real-life organizations has been provided 
by the intraorganizational approach. Such re- 
search typically involves the investigation of 
one or a small number of firms. The possibil- 
ity of generalizing from such studies has been 
necessarily curtailed because there has been 
no sampling of organizations, of time periods, 
or control over the relevant organizational 
variables which would explain the circum- 
stances under which relationships do or do 
not occur. 

An approach which is considered more 
appropriate for the objectives of organiza- 
tional study and which overcomes many of 
the mentioned limitations involves the use of 
sampling surveys whereby data can be gathered
from a large number of organizations.
Palmer (1961) examined 35 organizational 
survey variables pertaining to organizational 
conditions and personnel performance for a
sample of 188 manufacturing firms in a
southern metropolitan region. An analysis
resulted in eight orthogonal rotated factors.
The factors were identified as follows: Retire- 
ment Welfare, Cooperation with Survey, Size 
of Work Force, Thrift Benefits, Cost of Sick- 
ness versus Use of Machinery, Job Aversion 
(e.g., lates, turnover, grievances, and com- 
plaints), Insurance Benefits, and finally 
Product Theft versus Discounts on Product. 

Examination of the rotated factors indi- 
cated no support for Revans’ (1958) notion 
that less favorable performance is associated 
with larger firms. Palmer’s (1961) analysis
also disclosed that productivity, job aversion,
and theft were mutually independent behaviors
and further that job-aversion behavior
was unrelated to any positive incentive condi- 
tions investigated. Each of these independent 
behaviors was found to be related to different 
organizational conditions. 

The purposes of the present study were 
as follows: 

1. To isolate various independent
dimensions of organizational behavior and
attributes (that is, to identify the major
sources of variance among companies);

2. To examine the relationships between
organizational effectiveness, personnel
performance, and the extracted factors, and to
determine the characteristics of the factors
related to personnel performance and
organizational effectiveness.


METHOD 


The present investigation was concerned with a 
factor analysis of a limited number of variables. 
The areas investigated were as follows: company and 
formal organization characteristics (e.g., size of firm), 
management attributes (e.g., average management
tenure), incentive conditions (e.g., presence of pen- 
sion programs), worker characteristics (e.g., per- 
centage of workers who are high school graduates), 
personnel performance (e.g., turnover), and organi- 
zational effectiveness (e.g., number of jobs eliminated 
by laborsaving devices). 

In general, the criteria for selecting these vari- 
ables were as follows: (a) relevance or importance 
as indicated by current theory and research, (b) 
facilitating factor identification by including Palmer’s 
(1961) factor markers, (c) accessibility of the in- 
formation, and (d) objectivity of recording the data. 
It is important to realize that not all the variables 
considered in this study were selected to define 
factors and provide factor saturations, but that
some of them were included as controls so that 
specified influences such as size and age of firm 
could be recognized and partialed out. 


Questionnaire 


A multiple-choice questionnaire containing 84 
items pertaining to the six areas of organizational 
attributes and behavior discussed previously was 
developed and pretested on a sample of local 
manufacturers. 


Sample Survey 


A representative list of 2,938 manufacturing firms 
located throughout the continental United States
was developed. The number of firms selected from 
each state was based on the proportion of manu- 
facturing in the respective states according to the 
1961 United States Statistical Abstracts. However, 
within each state the firms were drawn at random 
from the appropriate state directory of manufactur- 
ers. The questionnaires were addressed to the person- 
nel managers of the various organizations. It is pre- 
sumed that either the personnel manager or someone 
“qualified” in the personnel department answered the 
questionnaire. It could be that some of the answers 
are based on estimates. However, the options for 
most of the items are written so that an exact 
answer is not necessary, for example, 10-20%. With 
each questionnaire was sent an explanatory covering 
letter, a postage-paid return envelope, and two IBM 
mark-sense cards for recording answers. Each com- 
pany was offered a free summary report if they 
desired one. The participating firms were assured 
that all information would be kept confidential. 


Data Analysis 


Pearson linear product-moment correlations were 
utilized to compute the intercorrelations between 
the 84 variables. Where some respondents reported 
no information for some questions, correlations 
among the variables were based upon the number 
of cases common to each pair of variables. The N
for the correlations ranged from 66 to 232. However,
the majority of the correlations were based on an
N of 180 or more and in only four instances were
the correlations based on an N of less than 100.
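
Computing each correlation from the cases common to a pair of variables is what is now usually called pairwise deletion. The Python sketch below illustrates the idea with hypothetical scores (None marks a missing answer); the function and variable names are ours, and nothing here reproduces the study's data. It also checks how many distinct correlations 84 variables produce.

```python
from math import comb, sqrt

def pairwise_pearson(x, y):
    """Pearson r over the cases where both variables were answered.

    Returns (r, n), where n is the number of cases common to the pair.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy), n

# Hypothetical questionnaire answers from five firms (None = no answer)
item_a = [1, 2, None, 4, 5]
item_b = [2, 4, 6, None, 10]
r, n = pairwise_pearson(item_a, item_b)

# Distinct pairs among 84 variables
n_correlations = comb(84, 2)
print(n, n_correlations)  # 3 3486
```

Because each pair of items keeps its own set of complete cases, N varies from one correlation to another, as the range of 66 to 232 reported above shows.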

The 3,486 correlations were subjected to a prin- 
cipal components analysis (unities were placed in 
the principal diagonal) and rotated to nearly or- 
thogonal, simple structure by the varimax method 
(Kaiser, 1958).2 The decision to stop factoring was
based upon the diminishing contributions to the total
variance of the successively extracted factors.


2 A list of the 84 items comprising the questionnaire
and the rotated factor loadings of these items
has been deposited with the American Documentation
Institute. Order Document No. 8896 from ADI
Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington, D. C.
20540. Remit in advance $1.75 for microfilm or
$2.50 for photocopies and make checks payable
to: Chief, Photoduplication Service, Library of
Congress.


RESULTS 


Returns 


Eight percent (234) of the 2,938 question- 
naires were returned. The low percentage of 
returns is attributed in part to the length of 
the questionnaire, the reluctance to reveal 
certain information, and the low pulling 
power of a general appeal by letter. Because 
of the small number of returns and the conse- 
quent possibility of bias, the results of this 
study must be interpreted with caution. 
However, the bias, while probably affecting 
the item means of the questionnaire, might 
not have appreciably affected the pattern of 
intercorrelations. 

The first 14 unrotated factors extracted 
accounted for 54% of the total variance. In 
general, the examination of the 84 lambdas 
suggested much specificity in the original 
intercorrelation matrix. After these 14 factors 
were rotated, it was found that Factor I 
accounted for 26.7% of the common vari- 
ance while the remaining 13 factors contrib- 
uted from 7.9 to 3.8% of the common vari- 
ance. The rotated factors were nearly orthog- 
onal, the correlations between factors ranging 
from .00 to .35. However, only 2 of the 91 
intercorrelations were above .20. The major- 
ity of the items were defined in terms of 
either absolute numbers or percentages.
Therefore, in most instances, only measures 
based on percentages can be meaningfully re- 
lated to size and other variables associated 
with size. It was expected, for example, that 
larger firms would have a greater number 
of absentees. 
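
Two of the counts reported above can be verified directly: the return rate and the number of correlations among 14 rotated factors. The short Python check below is illustrative only; the variable names are ours.

```python
from math import comb

returned, mailed = 234, 2938
return_rate = 100 * returned / mailed     # percentage of questionnaires returned

factor_pairs = comb(14, 2)                # distinct pairs among 14 factors
share_above_20 = 100 * 2 / factor_pairs   # percent of factor correlations above .20

print(round(return_rate), factor_pairs)   # 8 91
```

Only about 2% of the 91 factor intercorrelations exceeded .20, which is the basis for describing the rotated factors as nearly orthogonal.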

Factor I: Size of Organization. The vari- 
ables that load significantly on Factor I 
seem to be indicative of the size of the 
organization. Larger firms are able to offer 
more in the way of recreation, retirement, 
and insurance programs than smaller firms. 
Furthermore, larger firms offer more oppor- 
tunities for promotions and pay increases. 

The volume of personnel behaviors such as 
absenteeism, accidents, discharges, behavior 
problems, and theft is highly related to size 
as would be expected on the basis of item 
definitions. However, there is no evidence to
indicate that these performance measures are 
disproportionately related to size, lending no 
support to Revans’ (1958) conclusion that 
generally less favorable performance is as- 
sociated with size. Job-aversion behaviors 
such as lates and turnover were completely 
independent of size. 

Larger firms experience more strikes, etc., 
and a higher percentage of worker grievances 
than do smaller firms. A partial explanation 
for this relationship might be found in the 
increased union activity in the larger firms. 

The structure of this factor also suggests 
that larger firms tend to be more attractive 
to job seekers as indicated by the percentage 
increase in applicants loading on this factor. 
This observation lends support to March and 
Simon (1958) who contend that larger firms 
are more visible and therefore more likely to 
attract applicants. 

Factor II: Economic Growth. This factor 
is characterized by growth primarily in the 
economic sphere of industrial functioning. It 
is important to note that organizational 
growth is independent of size, management 
education and experience, employee skills and 
performance, and all the other organizational 
variables considered. It would seem highly 
probable, because of the many internal 
conditions examined here, that conditions 
external to the organization play a substantial 
role in the organizational growth of a business 
enterprise. 

Factor III: Tardiness versus Family
Responsibility. The positive pole of Factor III
is substantially related to the incidence of 
morning and afternoon lates and the negative 
pole moderately related to the percentage of 
employees who are married with two or more 
children. This observation suggests that work- 
ers with family responsibilities are less likely 
to exhibit irresponsibility on the job at least 
in respect to reporting late. 

Factor IV: Pay-Skill Level. Various incentive
conditions (primarily pay level) and
worker characteristics (skill, sex, and educa- 
tion) covary together, contributing essentially 
all of the factorial variance to this factor. 
This factor exemplifies the common observa- 
tion that jobs requiring education and skill 
are more highly rewarded than those that do 
not. An implication here is that money buys
skills, but not necessarily better performance. 

Factor V: Personnel Tenure. Both manage- 
ment and worker tenure are highly interde- 
pendent. This supports the notion that, in 
general, firms have attractions which trans- 
cend all levels of employment. Besides the 
expected inverse relationship with turnover, 
tenure is conspicuously independent of all 
other considerations. Of particular interest is 
the fact that tenure is independent of incen- 
tive conditions, particularly those involving 
retirement benefits which would be expected 
to be especially effective in prolonging tenure. 

Factor VI: Ownership and Concern for 
Organizational Interests. The negative pole of 
this factor reflects dissatisfaction with pay, 
lack of concern for plant and equipment, and 
customer dissatisfaction, while the positive 
pole reflects personal involvement of both 
management and employees as evidenced by 
stock ownership. The significant result here 
is that stock ownership is related to those 
performance measures that involve direct cost 
reduction or maintaining the goodwill of the 
clientele. A possible conclusion to be drawn 
from this factor is that employees with owner- 
ship roles cooperate toward some important 
goals by essentially protecting their own 
investments. 

Factor VII: Work-Force Reduction and 
Job Mechanization. This factor suggests that 
firms manufacturing for inventory are more 
likely to reduce their work force and number 
of jobs as a consequence of automation or 
mechanization. 

Factor VIII: Technical Personnel and Con- 
trols versus Protection against Human Lia- 
bilities. One pole of this factor emphasizes 
the use of technical specialists and procedures 
while the other pole indicates the presence of 
group insurance programs. The interpretation
of this factor is not clear-cut. One interpretation
might be that organizations attempt
to insure themselves against contingencies
that cannot be coped with by technological
specialists.

Factor IX: Minority-Group Composition. 
This factor primarily reflects the presence of
minority groups in the working force.

Factor X: Improvement of Working Conditions.
Improvement of working conditions
loads substantially on this factor while
substandard production and the number of
people involved in research and development
load slightly in the same direction. This factor
pattern is not clear-cut, but it is possible
that the existence of substandard production
has in part prompted research and development
and the improvement of working
conditions.

Factor XI: Retail Sales Personnel and 
Authority-Conflict Behaviors. The incidence 
of theft and superior-subordinate conflict is 
more frequent in those firms characterized by 
larger retail sales forces and local product 
distribution. It is quite possible that a 
large number of superior-subordinate conflicts 
revolve around theft situations. Palmer’s 
(1961) finding that amount of product dis- 
count was inversely related to the incidence 
of theft was not disclosed in the present 
analysis, but his finding that theft was 
an independent dimension of behavior was 
confirmed. 

Factor XII: Community and Employee 
Support versus Work-Output Restriction. The 
general picture is that of an organization 
which offers benefits to employees through 
recreation and savings-investment programs, 
and to the local community through mone- 
tary contributions and management partici- 
pation in charitable and civic organizations. 
Organizations fulfilling this description ex- 
perience fewer work stoppages and less sub- 
standard production relative to other firms. 
Such organizations are also more likely to 
experience an increased percentage of appli- 
cants. This factor is interesting because it 
is one of the few factors to suggest a di- 
rect relation between personnel benefits and 
productivity, although the magnitude of the 
relationship is low. 

Factor XIII: Employee Selectivity. The 
implication here is that firms paying better 
wages enjoy a more favorable selection ratio. 

Factor XIV: Allocations to Labor versus 
Product Development. Firms that are com- 
posed of hourly workers are more likely to 
grant pay increases and less likely to 
introduce new products in their product line. 
Although the meaning of this factor is not 
clear, one suggestion is that investment is 
made in worker wages rather than in the
development of new products. 


DISCUSSION 


Neither size nor age was appreciably re- 
lated to measures of personnel performance 
and organizational functioning. Personnel 
performance and organizational functioning 
varied with factors that could be controlled 
(e.g., recreation and savings-investment pro- 
grams) rather than with the enduring and 
unalterable conditions of an organization such 
as size and age. 

Management tenure and experience were 
completely independent of other management 
characteristics such as age, education, pay 
level, and incidence of promotions. Most im- 
portant, management tenure and experience 
were conspicuously independent of other vari- 
ables, especially those involving personnel 
performance and organizational functioning. 

One would not suspect that incentive 
conditions, benefits, and programs are so far 
removed from being unidimensional in nature. 
That they are not unidimensional is attested 
by their breaking up and scattering over nine 
different factors. 

Pay level tends to be related to worker 
characteristics pertaining to skill and educa- 
tion, and is independent of personnel perform- 
ance and organizational functioning. Quite 
the opposite holds true for recreation and 
savings-investment benefits which are essen- 
tially independent of worker characteristics, 
but related to a number of personnel perform- 
ance (substandard production, incidence of 
strikes, work stoppages, etc.) and organiza- 
tional effectiveness (monetary civic support, 
management office holders, and increase in 
applicants) variables in the expected manner. 
Consequently, the importance of fringe bene- 
fits in the maintenance of performance is 
given some support. 

Following the pattern of management at- 
tributes, employee tenure was independent of 
other worker characteristics, especially those 
involving skill and education. In both cases, 
tenure was unrelated to any identifiable 
organization attributes or conditions. 

The present finding that productivity, job 
aversion, and theft are mutually independent 
behaviors is in agreement with Palmer 
(1961). However, Palmer (1961) found
job-aversion behaviors to be rather unitary while
in the present study aversive behaviors split 
into a number of independent components. 
For example, strikes, lates, and turnover were 
found to be independent of each other. These 
are distinct ways of avoiding the job and 
each seems to be related to unique conditions 
within the organization. Strikes are related to 
the presence of unions; turnover is associated 
with employee age and tenure; and tardiness 
covaries with family responsibility. 

Productivity loads slightly on three dif- 
ferent factors and has no simple relationships 
with other variables. Theft, on the other 
hand, seems simply to be related to the 
presence of a retail sales force where oppor- 
tunities for theft would seem to best present 
themselves. 

Like personnel performance, organizational 
functioning was multidimensional in nature. 
The interdependence of personnel perform- 
ance and organizational functioning is notice- 
ably absent in many respects. 

Probably the most significant aspect of this 
investigation is its demonstration of the 
complex relationships that can be expected to 
exist between various organizational attributes 
and behaviors. Typically, investigations have 
been concerned with relatively few variables,
such that complex relationships
were automatically ruled out. As more vari- 
ables are taken into consideration, relation- 
ships among the original variables become 
altered and take on new significance. This is 
partially exemplified by comparing the results 
of the present investigation with those of 
Palmer (1961).
EFFECTS OF PRACTICE ON APTITUDE SCORES


ROBERT C. DROEGE 


United States Employment Service, Washington, D. C. 


This study investigated long-range effects of practice on the General Aptitude 
Test Battery (GATB). The design involved testing a sample of employees of 
State Employment Security agencies with the GATB and dividing this sample 
into 3 subsamples, subsequently retested with an alternate form after 1 yr 
(N = 302), 2 yr (N = 288), and 3 yr (N = 306). Major findings were (a)
significant practice effects for all aptitudes for each subsample, (b) evidence 
that initial level is a factor in the size of increase for 2 aptitudes, and (c) no 
deterioration in size of relationship between initial testing and retesting for 
any aptitude over the time span of the study. 


It is well known that the experience of 
taking aptitude tests often leads to an in- 
crease in test scores upon retesting. However, 
there has been little systematic study of the 
relationship between practice effects and (a) 
length of interval between initial testing and 
retesting, (b) type of aptitude measured, 
and (c) other variables, such as initial level 
of performance on the specific test and tests 
of other aptitudes. The purpose of this article 
is to describe research by the United States 
Employment Service to investigate these rela- 
tionships. Specifically, the study was con- 
ducted to determine the effects of a previous 
administration of the General Aptitude Test 
Battery (GATB) upon scores based on a 
subsequent administration of an alternate 
form of the GATB when the interval between 
initial testing and retesting is 1, 2, and 3 
years, respectively. 


PROCEDURE 


Eighteen State Employment Services participated 
in the study by collecting data in accordance with 
an experimental design developed by the United 
States Employment Service. The sample consisted 
of individuals between 25 and 34 years of age at 
the time of initial testing. The age range 25-34 was 
chosen because it represents the interval during 
which the effects of maturation and aging upon 
GATB aptitude scores appear to be minimal
(Droege, 1963). Most of the individuals in the 
sample were employees in local or state employment 
security offices. No person who had taken the GATB 
or was familiar with it was included in the sample. 
At each testing location those initially tested were 
divided randomly into Subsamples A, B, and C at 
the time of testing. The three subsamples were 
tested initially with the GATB, B-1002B during the 
same 1-month period before June 30, 1959, and then 
retested with an alternate form of the GATB 


(B-1002A) after intervals of 1, 2, and 3 years, 
respectively. Of the 1,309 initially tested 896 were 
available for retest and were included in the final 
sample. Table 1 shows data on age, education, and 
sex for each of the three subsamples. Note that they 
are quite comparable with regard to these basic 
characteristics. 


RESULTS 


Table 2 shows the means and standard 
deviations of GATB aptitude scores for the 
three subsamples. Table 3 shows the increases 
in aptitude mean scores between initial test- 
ing and retesting and the aptitude reliability 
(test-retest) coefficients for the three sub- 
samples. Tests of differences between cor- 
related means show that all of the increases 
in mean scores shown in Table 3 are signifi- 
cant at the .01 level. Thus, effects of practice 
appear to be operating for all aptitudes even 
when the interval between initial testing and 
retesting is as long as 3 years. 
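
The test of a difference between correlated means can be approximated from the summary statistics in Tables 2 and 3. The sketch below is illustrative only: the function name is hypothetical, and it assumes the printed standard deviations and the test-retest coefficient can be combined in the usual way to obtain the variance of the individual gains. It is not a reconstruction of the computations actually performed in the study.

```python
import math


def paired_t_from_summary(mean_gain, sd_first, sd_second, r, n):
    """t for a difference between correlated means, from summary statistics.

    Assumption: the variance of the individual gains is recovered from the
    two standard deviations and the test-retest correlation.
    """
    var_gain = sd_first**2 + sd_second**2 - 2 * r * sd_first * sd_second
    se_mean_gain = math.sqrt(var_gain / n)
    return mean_gain / se_mean_gain


# Aptitude G, Subsample A (1-year interval), values as printed in Tables 2 and 3.
t_g = paired_t_from_summary(mean_gain=3.8, sd_first=17.7, sd_second=16.1,
                            r=0.90, n=302)
print(round(t_g, 2))
```

Under these assumptions the ratio for Aptitude G is far beyond the two-tailed .01 critical value (about 2.6 for samples of this size), which is consistent with the statement above.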

Is there a deterioration in reliability or 
stability of measurement between 1 and 3 
years after initial testing? For each aptitude 
an F test was made of homogeneity of within- 
class regression. Significant differences among 
the three regression coefficients were not 
found for any of the nine aptitudes, indi- 
cating no evidence for heterogeneity of re- 
gression for the three samples. Another point 
of interest is that the reliability coefficients 
shown in Table 3 compare quite favorably 
with reliability coefficients for similar samples 
for which the interval between  testings 
was considerably less than 1 year (U. S. 
Department of Labor, 1962). 

A significant increase in mean scores was 
obtained for each aptitude for each subsample. 


TABLE 1


AGE, EDUCATION, AND SEX CHARACTERISTICS
OF THE SUBSAMPLES


                        Age                   Education           Percent
Subsample            M          σ           M            σ          Male
A (N = 302)        29.8        3.0      [illegible]  [illegible]     39
B (N = 288)     [illegible] [illegible]    13.3         2.0      [illegible]
C (N = 306)        30.4        2.8         13.0         1.8          40





But is the size of the increase a function of 
the interval between testings? That is, do the 
effects of practice decrease with time, as 
might be expected, or do they remain con- 
stant between the 1- and 3-year intervals 
covered by this study? This is a question for 
an analysis of covariance. Table 4 shows the 
results of the covariance analysis, the F ratio 
providing the test of the hypothesis that there 
is no difference in retest means after the data 
have been adjusted for the initial level. 
Significant differences were obtained for 
Verbal Aptitude (V), Clerical Perception 
(Q), Motor Coordination (K), Finger Dex- 
terity (F), and Manual Dexterity (M). 
Thus, there is evidence that the length of 
interval between initial testing and retesting 
is a factor in the amount of increase in mean 
score for these aptitudes. For the remaining 
aptitudes, there is no evidence that, within 
the time span for this study, length of
interval between testings is a factor influencing
practice effects. 
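
The F ratios in Table 4 follow from the printed sums of squares and degrees of freedom. The sketch below recomputes three of them as a check on the arithmetic; the function name is hypothetical, and only rows whose printed entries are fully legible are used.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F ratio as mean square for the effect over mean square for error."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


# (SS interval, SS error) as printed in Table 4; df = 2 and 892.
rows = {
    "N": (193, 52413),    # Numerical Aptitude
    "Q": (1188, 83022),   # Clerical Perception
    "M": (2915, 200608),  # Manual Dexterity
}
f_values = {apt: f_ratio(ss_i, 2, ss_e, 892) for apt, (ss_i, ss_e) in rows.items()}

for apt, f in f_values.items():
    print(apt, round(f, 2))
```

The recomputed ratios agree with the printed F values for these three aptitudes (1.64, 6.38, and 6.48).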

Another question of interest is concerned 
with the possibility of a relationship between 
level of ability and size of practice effect. 
Specifically, is the amount of score increase
dependent on initial aptitude level? The
correlation between gross gain (difference
between final and initial scores) and initial
score would not provide the relevant information
on this point. Such correlations
tend to be negative, not because of an
intrinsic negative relationship, but because of
a statistical artifact (Lord, 1958; Mayo &
DuBois, 1963): errors of measurement in the
initial score enter the gain with opposite sign.
A comparison of the variances
of initial and retest scores does provide
relevant information. If the variance tends to
be lower for the retest scores than for the
initial scores, there is an indication that
individuals with low ability may gain relatively
more than those with high ability. On the
other hand, higher variability in the retest
scores than in the initial scores would indicate
that the opposite may be the case. Results of
significance tests applied to the difference between
correlated variances show that significant
differences for all three subsamples were found
for Numerical and Spatial Aptitudes. In the
case of Numerical Aptitude, the retest variance
is consistently lower than the initial
test variance. The opposite is true of Spatial


TABLE 2


MEANS AND STANDARD DEVIATIONS OF GATB APTITUDES AT INITIAL TESTING AND RETESTING
FOR THE SUBSAMPLES A (N = 302), B (N = 288), AND C (N = 306)


                               Subsample A                  Subsample B                  Subsample C
                             1-Year Interval              2-Year Interval              3-Year Interval
                         1st Testing   2nd Testing    1st Testing   2nd Testing    1st Testing   2nd Testing
Aptitude                   M      σ      M      σ       M      σ      M      σ       M      σ      M      σ
G—Intelligence           110.2  17.7   114.0  16.1    109.2  16.9   112.8  16.8    106.4  18.0   110.7  17.5
V—Verbal Aptitude        110.8  17.6   114.1  16.6    109.2  16.9   112.2  16.8    106.8  15.9   111.6  16.8
N—Numerical Aptitude     109.4  16.2   112.8  14.9    110.0  16.3   113.3  14.7    106.2  17.3   109.3  15.5
S—Spatial Aptitude       104.8  19.0   109.1  21.3    103.0  19.4   107.0  22.1    102.2  19.6   106.0  21.0
P—Form Perception        109.1  18.8   111.5  17.5    109.3  17.5   112.2  16.8    107.1  16.4   110.4  16.0
Q—Clerical Perception    119.1  16.0   125.1  16.2    119.1  16.7   124.0  17.3    116.4  15.4   120.0  15.2
K—Motor Coordination     116.8  17.8   124.1  17.5    115.7  16.8   122.0  18.0    114.9  18.4   119.2  18.3
F—Finger Dexterity       102.4  21.2   110.9  21.3    100.3  21.6   109.7  22.5    101.6  20.2   107.0  21.0
M—Manual Dexterity       104.3  21.8   110.5  22.8    102.7  21.2   113.3  23.6    102.9  22.3   109.7  22.1
TABLE 3


INCREASES IN GATB APTITUDE MEAN SCORES AND RELIABILITY COEFFICIENTS FOR SUBSAMPLES A
(1-YEAR INTERVAL), B (2-YEAR INTERVAL), AND C (3-YEAR INTERVAL)


                                  Increase in means                     Reliability coefficients
Aptitude                   1 year       2 years      3 years        1 year       2 years      3 years
                          (N = 302)    (N = 288)    (N = 306)      (N = 302)    (N = 288)    (N = 306)
G—Intelligence               3.8          3.6          4.4            .90          .85          .90
V—Verbal Aptitude        [illegible]      3.0          4.8            .85          .87          .85
N—Numerical Aptitude         3.4      [illegible]  [illegible]        .83          .87          .88
S—Spatial Aptitude           4.2          4.1          3.8            .83          .79          .84
P—Form Perception            2.4          3.0      [illegible]        .78          .74          .76
Q—Clerical Perception        6.0          4.9          3.6            .74      [illegible]      .74
K—Motor Coordination         7.3          6.4          4.2            .85          .85          .84
F—Finger Dexterity           8.5          9.4      [illegible]        .76          .69      [illegible]
M—Manual Dexterity           6.2         10.6          6.9            .76      [illegible]      .79

















Aptitude, where the retest variance is
consistently higher than the initial test variance.
Thus, there is evidence that the relationship 
between initial aptitude level and size of 
practice effect is negative for Numerical Apti- 
tude and positive for Spatial Aptitude. There 
is little or no evidence of either a positive 
or a negative relationship between initial level 
and gain for the other aptitudes. 
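
The article does not name the significance test applied to the correlated variances. One standard choice for two variances measured on the same individuals is the Pitman-Morgan t test with n - 2 degrees of freedom; the sketch below applies it, purely as an illustration and under that assumption, to the Subsample A values printed in Tables 2 and 3. The function name is hypothetical.

```python
import math


def pitman_morgan_t(sd_first, sd_second, r, n):
    """Pitman-Morgan t for the difference between two correlated variances.

    Positive values mean the first (initial) variance is the larger one.
    Degrees of freedom are n - 2.
    """
    var_first, var_second = sd_first**2, sd_second**2
    numerator = (var_first - var_second) * math.sqrt(n - 2)
    denominator = 2 * sd_first * sd_second * math.sqrt(1 - r**2)
    return numerator / denominator


# Subsample A (N = 302): initial SD, retest SD, and test-retest coefficient.
t_numerical = pitman_morgan_t(16.2, 14.9, 0.83, 302)  # retest variance lower
t_spatial = pitman_morgan_t(19.0, 21.3, 0.83, 302)    # retest variance higher

print(round(t_numerical, 2), round(t_spatial, 2))
```

Under this assumed test the two statistics have opposite signs, matching the direction of the variance changes described for Numerical and Spatial Aptitudes.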

The gain in score on a particular aptitude 
may be related to variables such as age, edu- 
cation, or scores on other aptitudes. For 
example, in a study reported by Abbey 
(1962) it was found that age was a factor in 
the effects of practice on the Toronto Com- 


plex Coordinator, a perceptual-motor task. 
On the other hand, Vernon (1954), sum- 
marizing research done on effects of practice 
on intelligence test scores, reported that there 
was evidence that gain in test scores upon 
retesting was substantially the same for 
different age groups. 

Analyses were made on Subsample A to 
determine the extent to which the amount of
gain in scores on two of the GATB aptitudes
was related to other variables. In obtaining
these relationships, the technique of part 
correlation (DuBois, 1957) was used. Appli- 
cation of this technique results in the cor- 
relation between an unmodified outside vari- 


TABLE 4


                                       SS                              MS
Aptitude                   Total        Error      Interval      Error      Interval        F
                         (df = 894)   (df = 892)   (df = 2)

G—Intelligence             56,218       56,189         29         62.99        14.50    [illegible]
V—Verbal Aptitude          20,862       17,783      3,079         19.94     1,539.50    [illegible]
N—Numerical Aptitude       52,606       52,413        193         58.76        96.50       1.64
S—Spatial Aptitude         12,667       12,661          6         14.19         3.00    [illegible]
P—Form Perception         106,691      106,630         61        119.54        30.50        .26
Q—Clerical Perception      84,210       83,022      1,188         93.07       594.00       6.38*
K—Motor Coordination       19,727       19,284        443         21.62       221.50      10.24**
F—Finger Dexterity        199,676      197,145      2,531        221.01     1,265.50    [illegible]
M—Manual Dexterity        203,523      200,608      2,915        224.90     1,457.50       6.48*


Note.—For convenience, analysis was based on raw scores for Aptitudes (V, S, Q, K) measured by only one test.

 *p < .05.
**p < .01.




able, such as age, with the residual gain in 
scores on the aptitude under consideration.
The residual gain represents the difference 
between actual final aptitude score and final 
score predicted from initial score. Age, edu- 
cation, and initial scores on Aptitude G 
(General Learning Ability) were correlated 
with residual gain on Aptitude M (Manual 
Dexterity). None of these correlations was 
significant. Age, education, initial scores on 
Aptitude M (Manual Dexterity) and Clerical 
Perception (Aptitude Q) were correlated with 
residual gain on Aptitude G (General Learn- 
ing Ability). None of these correlations was 
significant. Thus, although only limited 
analyses were made, there was no evidence 
that residual gain is related to other vari- 
ables. Further analyses of this type are 
required before making more definite con- 
clusions and generalizing to aptitudes other 
than Aptitudes G and M. 
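
The residual-gain computation described above can be sketched as follows. The scores and ages below are invented solely to make the example runnable, and the helper names are hypothetical; only the logic (predict the final score from the initial score, take the residual, then correlate an unmodified outside variable with that residual) follows the description of part correlation in the text.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residual_gain(initial, final):
    """Actual final score minus the final score predicted from the initial score."""
    mx, my = mean(initial), mean(final)
    slope = (sum((x - mx) * (y - my) for x, y in zip(initial, final))
             / sum((x - mx) ** 2 for x in initial))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(initial, final)]


# Hypothetical data for six examinees (not taken from the study).
initial = [95, 102, 110, 118, 125, 131]
final = [101, 104, 116, 119, 131, 133]
age = [25, 34, 28, 31, 26, 33]

gains = residual_gain(initial, final)
part_r = pearson_r(age, gains)  # outside variable vs. residual gain
print(round(part_r, 3))
```

By construction the residual gains are uncorrelated with initial score, which is what removes the artifact that affects gross gain.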

The results of this study may be sum- 
marized as follows: 

1. There are significant increases in mean 
scores for all aptitudes even when the interval 
between testings is as long as 3 years. 

2. The size of the increases is a function 
of the interval between testings for Verbal 
Aptitude, Clerical Perception, Motor Co- 
ordination, Finger Dexterity, and Manual 
Dexterity. 

3. There is no evidence of deterioration 
in the size of the relationship between initial 
testing and retesting for any of the aptitudes 
over the time period covered by this study. 

4. There is evidence that level of ability
is a factor in the size of increase for Numeri- 
cal and Spatial aptitudes. 

5. “Residual gain” appears not to be re- 
lated to variables such as age, years of edu- 
cation, or other aptitudes but further study 
is needed. 


DISCUSSION 


Although there are significant increases in 
scores upon retesting after 1, 2, or 3 years 
for all aptitudes, there is no evidence of any 
deterioration in reliability of measurement 
over the time span studied. This finding, 
together with the fact that the obtained reli- 
ability coefficients compare favorably with 
those in similar studies with a shorter interval 
between testings, has a practical implication.
That is, retesting an individual with aptitude 
tests will generally be unnecessary unless he 
is exposed to training or experience that 
would be likely to affect his aptitudes. In 
such cases, of course, it is desirable to retest, 
making appropriate adjustments for effects 
of practice. 

The results show that practice effect is a 
function of length of interval between testing 
for certain of the aptitudes. For example, the 
size of the practice effect for Clerical Percep- 
tion decreases as the interval between test- 
ing increases. Verbal Aptitude also shows 
significant variation in size of increase in 
mean score for the three intervals. But for 
Verbal Aptitude the tendency is for the gain 
to increase as the interval increases. A possible
explanation is that there is a slight, but
real, increase in vocabulary level over the
time span of the study, quite apart from 
practice effects. This would be in line with 
the finding (Droege, 1963) that vocabulary 
level tends to increase slightly with age for 
adults. 

The influence of factors such as training, 
experience, etc., in the results cannot be ruled 
out entirely even though an attempt was 
made to restrict the sample to those not 
receiving formal training. An unpublished 
study by Jerome Moss, Jr. of the University 
of Minnesota indicates that experience and 
training may be more important than prac- 
tice in samples exposed to significant training 
or new experience related to the aptitude 
test tasks.


FURTHER RESEARCH 


Employment Service research is in progress 
to determine the effects of previous exposure 
to the GATB upon retest scores when the 
interval between testings is less than 1 year. 
There are three reasons for this research: 

1. In recent test research with individuals 
enrolled in training programs there have been 
situations where all or part of the GATB is 
readministered within a year’s time. It is
important in this research to be able to
distinguish the effects that previous
exposure to the GATB has upon retest
scores from the effects of basic education
and other training to which an individual
may be exposed during the interval between
testings.

2. Since an individual’s aptitude scores 
may be significantly affected by substantial 
academic or vocational training, in some in- 
stances it may be desirable for individuals 
initially tested with the GATB before train- 
ing to be retested with an alternate form 
after training. This is especially needed for 
the educationally disadvantaged individuals 
who receive basic education and/or voca- 
tional training at some time after initial test- 
ing on the GATB. The information obtained 
from this study on the extent to which pre- 
vious exposure to the GATB may result in 
increases in aptitude scores upon subsequent 
retest will be used in making the appropriate 
adjustments for such retest scores. 

3. The two reasons above are the main 
ones for conducting this study. A third reason 
is to be able to determine the appropriate 
adjustments for retest scores for those indi- 
viduals who need to be readministered the 
GATB for other reasons. 

Hopefully, in addition to the specific kinds 
of information outlined above, it will be pos- 
sible to gain some insight into the underlying 
reasons for increases in GATB retest aptitude
scores: the extent to which these increases
are the result of practice and therefore
tend to decrease in relation to the length
of the time interval elapsing between initial 
testing and retesting, and the extent to which 
they are the result of familiarization with 
the testing situation, with a particular type 
of test, or with techniques for taking a test, 
and therefore less dependent on the time 
variable. 
CARTOON REACTION SCALE WITH SPECIAL
REFERENCE TO DRIVING BEHAVIOR¹


THEODORE KOLE AND HAROLD L. HENDERSON
Drivers Safety Service, Inc., New York City 


Research suggests that hostility, aggression, and other personality traits may 
be indicative of problem drivers. Other evidence implies that certain types of 
humor are related to these personality dimensions. A Cartoon Reaction Scale
(with “funniness” response choices) was developed to test the hypothesis that 
problem and nonproblem drivers would respond differentially and to a sig- 
nificant degree. Out of an original pool of 150 cartoons, 34 cartoons achieved 
discriminatory ability. These cartoons were subsequently administered to 
new groups of drivers. Reliability coefficients ranged from .77 to .80. Valida- 
tion and cross-validation achieved significance beyond the .01 level. The 
test’s success in separating controls from problem drivers was also
demonstrated by a cut-off score. Its predictive ability has not yet been shown.


Over the years many different approaches 
have been developed to meet the problem of 
death and injury on our highways. These 
attempts to develop methods for preventing 
highway accidents relate to one of three 
major areas: the host, the agent, or the
environment.

The driver or the host has received a good 
deal of attention in recent years. Goldstein 
(1961) compiled a list of research projects 
which summarized measures or tools, the 
criterion measures, and the validity indices 
for each predictive measure. Predictors in- 
cluded paper and pencil attitude scales, stand- 
ardized personality instruments, and a variety 
of psychophysical measures. The results, al- 
though far from conclusive, show that per- 
sonality and attitude undoubtedly affect driv- 
ing behavior. 

Therefore, the investigation now reported 
was undertaken to develop a valid and reli- 
able test for discriminating between problem 
and nonproblem drivers; a test which can be 
easily and objectively administered and 
scored. 


Personality and Humor 


Personality attributes that seem to be in- 
dicative of problem drivers, or at least those 


¹ This paper represents a portion of a doctoral
dissertation by the senior author submitted to New 
York University, 1964. This manuscript is only 
slightly changed from the paper presented at the 
Eastern Psychological Association, Atlantic City, 
April 1965. 
results most frequently reported, appear to be
those associated with anxiety (Brody, 1957; 
Henderson & Kole, 1963); impulsiveness 
(Heath, 1957; Wisely, 1947); frustration 
(IPAT, 1960; McGuire, 1955); intolerance 
(McFarland et al., 1955; Schuster & Guilford, 
1962); and aggressive behavior (Cowley, 
1946; Goldstein & Mosel, 1958; Rommel, 
1959). Although a variety of psychological 
tools and techniques have been used, the 
major emphasis has been on “paper and 
pencil” inventories and scales. Although ex- 
pressive or projective techniques have been 
rarely used (Hakkinen, 1958) attempts have 
been made to attain greater objectivity of 
different projective tools (Cattell, 1957; 
Kole, Henderson, & Roland, 1965). 

Humor imbedded in projective techniques
has been given more attention recently and
has been successfully used in personality as- 
sessment (Cattell & Luborsky, 1952; Elbert, 
1961; LaGaipa, 1960). A natural extension of 
the use of jokes in personality testing is 
through the utilization of cartoons. Several 
studies reported that cartoon tests have
advantages over “traditional” humor tests
(Byrne, 1958; Elbert, 1961; Grziwok &
Scodel, 1956; More & Roberts, 1957;
Redlich et al., 1951), which include simple and
objective scoring procedures (Doris &
Fierman, 1956; Malpass & Fitzpatrick, 1959).

Aggression and aggressive tendencies seem 
to be the most easily (or most often) tapped 
personality correlates (Byrne, 1956; Levine, 
1956; Murray, 1934). Since it has been
reported that this personality trait is often
found in the problem driver, it seemed possi- 
ble that a cartoon test of humor might better 
predict poor driving behavior than other 
forms of paper and pencil testing. 


METHOD 


Cartoon Reaction Scale 


Cartoons were collected from a great variety of 
sources, but the great majority of the cartoons
were taken from the Travelers Insurance
Company Handbooks, for the years 1949 to 1961, 
inclusive. These booklets offered a wide range of 
motoring-type cartoons by some of the more popu- 
lar cartoonists. 

Each cartoon was selected because it met six
criteria: it (a) had an accompanying commentary;
(b) contained no foreign words or phrases; (c) was
reproducible in black and white; (d) focused on
caricatures of people, rather than animals or
fictional or whimsical subjects; (e) depicted adults
of driving age, rather than small children; and
(f) included only situations where the automobile
was present or implicit. These
cartoons thus included driver and/or pedestrian 
behaviors, law enforcement situations, accidents, 
driving skills, the effects of alcohol, etc. 

The original cartoons were individually repro- 
duced and became Cartoon Reaction Scale, Form 
A. It included 150 cartoons, an instruction sheet, 
and an answer sheet. In a scale which uses cartoons 
(such as the Cartoon Reaction Scale), statements 
regarding the extent of agreement with the cartoon 
(Likert-like) rather than statements concerned with 
the degree of “funniness” as perceived by the re- 
spondent, would be, in effect, an entirely different 
test. Thus, on the basis of the reported works of 
several other researchers (Byrne et al., 1961; Doris
& Fierman, 1956; Elbert, 1961; Roberts & Johnson, 
1957; Strother et al., 1954), the scale used was: VF; 
F; N; NF; and UF for I think it’s very funny; I 
think it’s funny; It’s neither funny nor unfunny; I 
don’t think it’s funny; and I don’t think it’s funny 
at all. 


Design and Procedures 


The experimentals were “point system violators.” 
These samples were drawn from that population of 
legally licensed drivers of private motor vehicles in 
the State of New Jersey who had been called in to 
show cause why their driving privilege should not be 
suspended. Generally, these problem drivers had 
accumulated a minimum of 12 points over a 3-year 
period.

Control groups consisted of legally licensed motor 
vehicle operators of the State of New Jersey who 
had driving records free of traffic violations for 3 
years prior to the date of testing on the Cartoon 
Reaction Scale. 

All subjects (Ss) were males (the driving exposures 
of males and females are not comparable), between
the ages of 18 and 34, who had at least a high
school education; driving exposure was limited to
21,000 to 39,000 miles over a 3-year period. Six
groups of 50 Ss each were used: Experimental I
and Control I were utilized to determine
discrimination cartoon by cartoon (item analysis);
Experimental II and Control II were used to estimate
reliabilities and test validity; cross-validation was
performed with Experimental III and Control III.

All Experimental Ss were tested in Trenton at the 
Driver Improvement Bureau of the Division of Mo- 
tor Vehicles. The Control Ss were tested in small 
groups varying in size, with a maximum of 50 uti- 
lized at any one time. The Ss were obtained from 
colleges, private corporations, various state agencies, 
and civic associations. 


RESULTS 


The t test of the significance of the difference
between means was the statistic utilized
to test the hypothesis that the sample groups
were drawn from the same population. Since
the obtained t values on the variables of age,
education, and driving mileage were, in no
instance, as great as 1.96, the null hypothesis
that there is no difference between the groups
is retained. The groups may be assumed to
have been drawn from the same population 
with respect to these three variables and the 
slight differences obtained may be attributed 
to chance. 
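
The t values for these comparisons can be approximated from the means and standard deviations in Table 1. The sketch below is illustrative: the function name is hypothetical, and it assumes two independent groups of equal size whose standard deviations were computed with N (not N - 1) in the denominator. That convention is an assumption, although it does give values close to the printed age comparisons for the second and third pairs of groups (1.718 and 1.584).

```python
import math


def t_equal_n(mean_1, sd_1, mean_2, sd_2, n):
    """t for the difference between two independent means, equal group sizes.

    Assumes each sd was computed with n in the denominator, so the standard
    error of the difference is sqrt((sd_1**2 + sd_2**2) / (n - 1)).
    """
    se_diff = math.sqrt((sd_1**2 + sd_2**2) / (n - 1))
    return (mean_1 - mean_2) / se_diff


# Age, Control II vs. Experimental II and Control III vs. Experimental III.
t_age_ii = t_equal_n(25.08, 4.19, 23.58, 4.45, n=50)
t_age_iii = t_equal_n(25.16, 5.01, 23.48, 5.48, n=50)

print(round(t_age_ii, 3), round(t_age_iii, 3))
```

Both values fall below the two-tailed .05 criterion of 1.96 used in the text.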

The cartoons were item analyzed by a 2 ×
5 chi-square design; of the original pool of
150 cartoon items, 34 cartoons attained
significance at the 5% level or beyond. These
became Form B of the Cartoon Reaction 
Scale. 


Scoring the Scale 


Using a Likert-type approach to scoring 
the cartoons, responses were assigned integers 
of five, four, three, two, and one for the re- 
sponses of Very Funny, Funny, Neither 
Funny nor Unfunny, Not Funny, and Un- 
funny. However, in two instances, this scor- 
ing procedure was reversed, since 75% of the 
Control group had selected the negative side 
of the scale. Although there was a theoretical 
range of scores between 34 and 170, the Ex- 
perimental I group ranged between 48 and 
131 and the Control I group ranged between 
67 and 148.
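
The scoring rule just described can be sketched as follows. The response codes are those of the scale (VF, F, N, NF, UF); the function name and the positions of the two reverse-scored cartoons are hypothetical, since the article does not identify which two items were reversed.

```python
# Integer weights for the five "funniness" responses.
RESPONSE_VALUES = {"VF": 5, "F": 4, "N": 3, "NF": 2, "UF": 1}


def score_scale(responses, reversed_items=frozenset()):
    """Sum the item weights, reversing the weights of the listed item positions."""
    total = 0
    for position, code in enumerate(responses):
        value = RESPONSE_VALUES[code]
        if position in reversed_items:
            value = 6 - value  # 5 <-> 1, 4 <-> 2, 3 stays 3
        total += value
    return total


# Hypothetical choice of the two reverse-scored items (positions 0 and 1).
REVERSED = frozenset({0, 1})

lowest = score_scale(["UF"] * 34)             # every item at its minimum weight
highest = score_scale(["VF"] * 34)            # every item at its maximum weight
mixed = score_scale(["VF"] * 34, REVERSED)    # two items reversed
print(lowest, highest, mixed)
```

With 34 items the unreversed totals span the theoretical range of 34 to 170 stated above.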




TABLE 1


STATISTICAL RESULTS WITH THREE DIFFERENT EXPERIMENTAL AND CONTROL GROUPS


                   Age                 Years of education             Mileage driven a
Group      Mean     sd      t         Mean       sd      t         Mean         sd       t
C I        24.78   4.48               13.10     1.14               29.72       5.69
                           1.035                         .636                           .639
E I        28.33   4.75               12.96     1.08               30.50       6.38
C II       25.08   4.19               13.36     1.67               29.22       5.23
                           1.718                        1.930                        [illegible]
E II       23.58   4.45            [illegible]  1.41               31.14       4.75
C III      25.16   5.01               13.38     1.63               30.02       5.14
                           1.584                        1.170                           1.18
E III      23.48   5.48               13.00     1.54            [illegible]    4.24


Note.—N = 50 for each group.
a Thousands of miles for 3 years.


Reliability 


Two measures of reliability were obtained 
for the Cartoon Reaction Scale. The first was 
a split-half internal consistency coefficient 
which rose to .77 when treated by the Spear- 
man-Brown prophecy formula. The coefficient 
of stability on a retest administration after 
2 weeks was .80. 
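
The step from the half-test correlation to the full-length estimate is the Spearman-Brown prophecy formula for a test of doubled length. The sketch below applies it to the split-half correlation of .62 reported in Table 2; the function name is hypothetical.

```python
def spearman_brown(r_half, length_factor=2):
    """Estimated reliability when test length is multiplied by length_factor."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


r_full = spearman_brown(0.62)
print(round(r_full, 2))
```

The result rounds to the .77 reported in the text.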


Validity 


The research hypothesis that problem and 
nonproblem drivers will respond differentially 
and to a significant degree to a cartoon-type 
instrument was tested by demonstrating the 
concurrent validity of the Cartoon Reaction 
Scale, by application of the Median test. 


This method was selected because it is
insensitive to dispersion of scores (Smith, 1953),
and it is easy to apply. It conveniently
separates the two groups at the median score for
the combined frequency distribution. Since it
is actually a 2 × 2 chi-square test, its
functions are easily understood (Moses, 1952).

The Ss used for validity testing were the
Control II and Experimental II groups. The
median for the two groups was 111.21. The
results of the Median test indicated
significance at the .01 level of probability. Classical
descriptive statistics are also of interest here.
The mean score for Control II was 116.10,
with a standard deviation of 16.35; the
Experimental II group had a mean of 104.70,
and a standard deviation of 14.40.


TABLE 2


RELIABILITY INDICES FOR FORM B OF THE
CARTOON REACTION SCALE


               Split-half a                  Test-retest b
          First       Second           First             Second
          half        half         administration    administration
M         51.48       53.04            114.65            107.80
SD         7.32        8.07             16.70             19.25
r               .62                             .80
r_tt            .77


a Experimental II, N = 50.
b Most of Control II, N = 44.


TABLE 3


DRIVER ASSIGNMENT TO GROUPS BASED ON SCORES
ON THE CARTOON REACTION SCALE


Group               Above 112    111 or below    All scores
Control II              34            16             50
Experimental II         14            36             50
Both groups             48            52            100

Score           Group            Correct        Percent     Percent in
                                 predictions    correct     total group
Above 112       Control              34           70.8         50.0
111 or below    Experimental         36           69.2         50.0
Total                                70           70.0        100.0
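
Because the Median test reduces to a 2 × 2 chi-square, the frequencies in Table 3 are sufficient to recompute it. The sketch below is illustrative; the function name is hypothetical, and since the article does not say whether a continuity correction was applied, both versions are computed.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)  # Yates continuity correction
    return n * diff**2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: Control II, Experimental II. Columns: above 112, 111 or below.
chi2_plain = chi_square_2x2(34, 16, 14, 36)
chi2_yates = chi_square_2x2(34, 16, 14, 36, yates=True)

CRITICAL_01_DF1 = 6.635  # chi-square critical value, .01 level, 1 df
print(round(chi2_plain, 2), round(chi2_yates, 2))
```

With or without the correction the statistic exceeds the .01 critical value for 1 degree of freedom, in line with the significance level reported for the Median test.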





TABLE 4


A COMPARISON OF STATISTICAL MEASURES OBTAINED AS
RELATED TO SCORES ON THE CARTOON REACTION
SCALE FOR THE CONTROL III AND
EXPERIMENTAL III GROUPS


Group                  M         SD        t        p

Control III          117.20     17.25

Experimental III     104.20     14.95





Cross Validation 


Further evidence in support of the hy- 
pothesis was provided by Experimental III 
and Control III. The level of significance at 
which the Cartoon Reaction Scale
discriminated between these two groups was
determined by using the t test. The mean and
standard deviation of Control III were 117.40
and 17.25, respectively; the Experimental III
group obtained a mean and a standard
deviation of 104.20 and 14.95. The t value
obtained was significant at the .006 level of
probability. This provides further evidence of 
the scale’s concurrent validity. 

A cutting score of 110 (all persons 109 
and below are rejected) is suggested on the 
basis of an inspection of the data thus far 
collected. With this score, 75 of the 100 
“good drivers” are properly identified, and 
64 of the 100 “poor drivers” are appropri- 
ately rejected. Thus there are 64 “hits” and 
36 “misses.” Thirty-one of the validation 
experimental group and 33 of the
cross-validation experimental group are “hits” (they
have the weakness indicated); thus, only 19
and 17 experimentals, respectively, are
“misses.” Only 25 Ss from both control groups 
would be labeled “false-positives”; that is, 
being incorrectly identified as “poor drivers.” 
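
The counts reported for the cutting score of 110 can be summarized as classification rates. The sketch below is illustrative; the function and key names are hypothetical, and the four counts are those stated in the paragraph above for the combined validation and cross-validation groups.

```python
def classification_rates(correct_accept, false_positive, hits, misses):
    """Proportions correctly classified among good drivers, poor drivers, and overall."""
    good_total = correct_accept + false_positive
    poor_total = hits + misses
    return {
        "good_drivers_identified": correct_accept / good_total,
        "poor_drivers_rejected": hits / poor_total,
        "overall_correct": (correct_accept + hits) / (good_total + poor_total),
    }


# 75 of 100 controls accepted; 64 of 100 problem drivers rejected.
rates = classification_rates(correct_accept=75, false_positive=25, hits=64, misses=36)
for name, value in rates.items():
    print(name, round(value, 3))
```

On these counts 139 of the 200 Ss are classified correctly with the suggested cutting score.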


DISCUSSION 


This research demonstrates that a
disguised projective test of humor, utilizing
cartoon driving situations, was able to
distinguish between errant motorists and drivers
with good records.

A word of caution is warranted. This is a
preliminary study where positive results may
have been obtained because (a) the
tests were given to experimental Ss as part
of a driver improvement course or (b) the
experimental Ss’ several driver failures
may have depressed their sense of
humor insofar as cartoons related to driving
are concerned. Furthermore, the ability of
this instrument to predict in advance the
manifestation of good or bad driving
behaviors is yet to be demonstrated.


FURTHER RESEARCH 


A number of basic and applied research 
projects could develop from this study. 

1. The scale could be used in high schools 
and in driver-education classes so as to obtain 
measures for these students before and follow- 
ing training and in the absence of driver edu- 
cation. The comparability of student groups 
could be demonstrated or denied and the 
strength and direction of attitude changes 
may be useful in identifying students who 
may be in need of special attention. 

2. The Cartoon Reaction Scale might use- 
fully be included among the devices used in 
screening applicants for positions as drivers 
with business, commercial, industrial, and/or 
military establishments with a view to helping 
select operators with a greater potential for 
safe driving. 

3. The scale could be pilot tested as part 
of current traffic safety programs in state and 
local driver improvement programs for assess- 
ing change after various counter-measures 
have been tried. 

4. The scale should be administered to fe- 
male populations to determine whether a new 
scale is needed or whether only a standard 
correction factor or special norms are neces- 
sary. 

5. The scale, in combination with generally 
accepted measures of hostility and aggression, 
should be administered to a variety of of- 
fender and nonoffender populations. Correla- 
tions and factor analysis should be carried 
out to determine the relationships between 
traffic-safety humor generally, or specific car- 
toons, and aggression and hostility. 


SUMMARY 


Recent research into the causes of accidents has followed an 
epidemiological model and the emphasis has been upon the driver. 
There is some evidence that his personal and emotional make-up is 
responsible, rather than the environment (roadways) or the agent 
(the vehicle). Accordingly, this investigation was undertaken to 
determine whether problem and nonproblem drivers will respond 
differentially and to a significant degree on a specially 
constructed cartoon-type instrument. 

Initially, 150 cartoons dealing with traffic 
safety were collected, duplicated, and bound 
into booklet form. A scoring system based on 
a Likert-type procedure was developed with 
a theoretical score range of 34 to 170. This 
form was administered to 50 male problem 
drivers (who had accumulated 12 or more 
points in the past 3 years under the point 
system) and 50 male nonproblem drivers (no 
violations or accidents during this same pe- 
riod) in New Jersey who were comparable as 
to age, education, and driving exposure. 
Thirty-four cartoons discriminated between 
the two groups at the .05 level of confidence. 

These functional cartoons and six buffer 
items were then collated as the 40-cartoon 
research instrument. The Cartoon Reaction 
Scale was administered to four new groups 
of New Jersey drivers; two control and two 
experimental groups. A split-half coefficient 
and a test-retest index were the two measures 
of reliability which were obtained. The co- 
efficient of consistency for an experimental 
group was .77, and the coefficient of stability 
was .80 for a control group. 

The psychological hypothesis, that problem and nonproblem 
drivers will respond differentially to a significant degree on 
the Cartoon Reaction Scale, was tested by application of the 
Median test. Significance beyond the .01 level of probability was 
obtained. The t test was applied as cross-validation utilizing a 
new group. 

The Cartoon Reaction Scale may prove of value in the following 
situations: (a) the screening of new license applicants, (b) the 
identification of drivers who may be in need of remedial or 
rehabilitative attitudinal training, and (c) the measurement of 
change in traffic safety attitudes under educational programs. 
However, present results must prove to be replicable. 


HYGIENES UNIDIMENSIONAL?1 


The purpose of this investigation was to test the assumption that Herzberg’s 
2 classes of factors affecting job satisfaction and dissatisfaction (motivators 
and hygienes) represent unidimensional constructs. 187 female and male 
college students ranked the importance of 5 motivators and 5 hygienes. The 
unfolding technique in 1 dimension developed by Coombs (1964) was then 
applied to these preference orders. The results clearly indicated the absence of 
a unidimensional attribute underlying both the motivators and the hygienes 
and suggested that Herzberg’s 2-factor theory may be an oversimplified 
representation of job satisfaction. A literature review also showed that these 2 
factors may not be independent. Nevertheless, the basic distinction between 
intrinsic job characteristics and environmental job characteristics seems to be a 
useful one for purposes of research. 


A theoretical framework suggesting a dual 
approach to job motivation has been pro- 
posed by Herzberg, Mausner, and Snyderman 
(1959). Focusing specifically on the motiva- 
tion of accountants and engineers, they uti- 
lized a semistructured interview method in 
which the interviewee recalled two incidents, 
one satisfying and the other dissatisfying, 
from his employment experiences. An a pos- 
teriori content analysis indicated that certain 
job characteristics were important for and 
led to job satisfaction (but not to job dis- 
satisfaction), while other job characteristics 
were important for and led to job dissatis- 
faction (but not to job satisfaction). Herz- 
berg and his colleagues referred to the 
characteristics which produced satisfaction as 
motivators and to the characteristics which 
produced dissatisfaction as hygienes. In gen- 
eral, motivators were those characteristics 
which satisfied the individual’s needs for self- 
actualization and self-realization in his work. 
These revolved around the need to develop 
in one’s occupation as a source of personal 
growth. Hygienes, on the other hand, tended 
to represent environmental factors descriptive 
of the job context. This group was associated 
with fair treatment in supervision, wages, 
and working conditions. 


1 This research was supported by USPHS Grant 
No. MH-02704, United States Public Health Service, 
Norman R. F. Maier, Principal Investigator. The 
author would like to thank Edward Smith for his 
assistance in analyzing the data. 

A number of investigators have attempted to replicate and extend 
the generality of Herzberg’s theory with varying degrees of 
success. The results of these studies are summarized in Table 1. 

Sources of job satisfaction and job dis- 
satisfaction were usually measured by content 
analyses of interviews or written stories, 
questionnaires rating the importance of given 
job characteristics as sources of satisfaction 
and dissatisfaction, or factor analyses of this 
interview, written story, or questionnaire 
data. Although most of the individual studies 
investigated a narrow range of jobs, they 
represented attempts to replicate Herzberg’s 
findings with different worker populations in 
different job situations. 

The following conclusions about the nature 
of motivators and hygienes seem warranted. 

1. In many cases, factors causing job satis- 
faction (motivators) are different from, and 
not merely opposite to, factors causing job 
dissatisfaction (hygienes). This conclusion 
is supported by the original Herzberg et al. 
(1959) investigation and investigations of 
Friedlander and Walton (1964), Myers 
(1964), Saleh (1964), and Schwartz, Jenu- 
saitis, and Stark (1963). 

2. A given factor can cause job satisfaction in one sample and 
job dissatisfaction in another sample, and vice versa. It appears 
that job or occupational level (Dunnette, 1965; Friedlander, 
1965; Myers, 1964; Rosen, 1963), age of respondents (Friedlander, 
1963; Saleh, 1964; Wernimont, 1966), sex of respondents (Myers, 
1964), and perhaps a time-dimension variable (Ewen, 1964; 
Wernimont, 1966) partially determine whether a given factor will 
be a source of satisfaction or dissatisfaction on the job. 

3. In some cases a given factor was found 
to cause job satisfaction and job dissatisfac- 
tion in the same sample (Dunnette, 1965; 
Ewen, 1964; Gordon, 1965). 

The distinction between motivators and hygienes rests on the 
assumption that these two factors are independent. These factors 
should, in addition, represent unidimensional attributes. 
Although the Herzberg study does not claim to have empirically 
established these facts, the evidence available suggests that the 
two factors may not be completely independent (Dunnette, 1965; 
Ewen, 1964; Friedlander, 1963, 1964; Wernimont, 1966). The 
purpose of this study was to subject the assumption that 
motivators and hygienes represent unidimensional attributes to 
quantitative analysis using a newly developed scaling criterion. 


TABLE 1 

SUMMARY OF INVESTIGATIONS ATTEMPTING TO REPLICATE OR EXTEND HERZBERG’S THEORY 


Friedlander (1963) 
Subjects: Engineers, supervisors, and salaried employees of a large 
manufacturing firm (200 of each). 
Procedure: Factor analysis of a 17-item questionnaire measuring the 
importance of various job characteristics to employee satisfaction. 
Findings: Three meaningful factors emerged. Two corresponded, in part, 
with motivators and hygienes, while the third seemed to draw from both 
motivators and hygienes. 

Rosen (1963) 
Subjects: 94 research and development personnel of varying specialties, 
educational levels, and organizational levels. 
Procedure: Respondents rated the importance of the absence of 118 items 
to their desiring to leave their present position. 
Findings: Many of the most important items which if not present would 
cause the individual to seek other employment were similar to Herzberg’s 
motivators. 

Schwartz, Jenusaitis, and Stark (1963) 
Subjects: 111 male supervisors employed by 21 public utility companies. 
Procedure: Content analysis of written stories describing pleasant and 
unpleasant job experiences. 
Findings: Motivators were generally associated with pleasant experiences 
and hygienes with unpleasant experiences. One Herzberg motivator acted 
like a hygiene in this sample. 

Ewen (1964) 
Subjects: 1,021 full-time life insurance agents divided into an 
experimental sample (541) and a cross-validation sample (480). 
Procedure: Factor analysis of a 58-item attitude scale completed by the 
experimental sample. 
Findings: Six interpretable factors emerged, of which three were hygienes 
and two motivators. Two of the three hygienes acted like motivators in 
both samples; the other hygiene acted like a motivator in the 
cross-validation sample, and like both a motivator and a hygiene in the 
experimental sample. One motivator acted both as a motivator and a 
hygiene. 

Friedlander (1964) 
Subjects: 80 students in an evening course in industrial or child 
psychology (part were full-time employees in various occupations and part 
were members of a cooperative work-study program). 
Procedure: Respondents rated the importance of 18 variables to job 
satisfaction and job dissatisfaction. 
Findings: The results indicated that motivators and hygienes are not 
opposite ends of a common set of dimensions. The majority of these job 
characteristics seemed to be significant contributors to both 
satisfaction and dissatisfaction on the job. 

Friedlander and Walton (1964) 
Subjects: 82 scientists and engineers in various specialties. 
Procedure: Semistructured interviews in which respondents were asked for 
the most important factors keeping them in the organization and factors 
that might cause them to leave the organization. 
Findings: Reasons for remaining in an organization (primarily motivators) 
were different from, and not merely opposite to, the reasons for which 
one might leave an organization (primarily hygienes). 

Lodahl (1964) 
Subjects: 50 male auto-assembly workers, and 29 female 
electronics-assembly workers. 
Procedure: Factor analysis of data obtained from a content analysis of 
interviews. 
Findings: Two technological and three attitude factors emerged. The 
technological factors were different for the two samples, but the 
attitude factors corresponded rather well. Two of the three attitude 
factors resembled motivators and hygienes. 

Myers (1964) 
Subjects: 282 male scientists, engineers, manufacturing supervisors, and 
hourly technicians, and 52 female hourly assemblers. 
Procedure: Content analysis of Herzberg-like interviews. 
Findings: Job characteristics grouped naturally into motivator-hygiene 
dichotomies. However, one Herzberg motivator acted like a hygiene and 
other Herzberg motivators acted both as motivators and hygienes. 
Different job levels had different job characteristic configurations. The 
female configuration was different from the four male configurations, 
suggesting a sex factor. Common Herzberg motivators were absent from the 
hourly technician and hourly female assembler configurations, suggesting 
a job-level factor. 

Saleh (1964) 
Subjects: 85 male employees at managerial levels in 12 companies. 
Procedure: Herzberg-like interview, and a 16-item job-attitude scale (6 
motivators and 10 hygienes) presented in a paired-comparison format. 
Findings: Preretirees looking backward in their careers indicated 
motivators as sources of satisfaction and hygienes as sources of 
dissatisfaction; preretirees looking at the time left before retirement 
indicated hygienes as sources of satisfaction. 

Dunnette (1965) 
Subjects: 114 store executives, 74 sales clerks, 43 secretaries, 128 
engineers and research scientists, 46 salesmen, 91 army reserve personnel 
and employed adults enrolled in a supervision course. 
Procedure: Factor analysis of Q sorts of two sets of 36 statements 
(equated for social desirability) for highly satisfying and highly 
dissatisfying job situations. 
Findings: Some Herzberg motivators were related to satisfying job 
situations but Herzberg hygienes were not related to dissatisfying job 
situations. One Herzberg motivator acted like a hygiene. There was also a 
positive relationship between the importance of a factor as both a 
motivator and a hygiene, contrary to the negative relationship expected 
under Herzberg’s theory. Thus the same factors were contributors to both 
satisfaction and dissatisfaction. 

Friedlander (1965) 
Subjects: 1,468 civil service workers from three status levels (Low, 
Middle, and High GS rankings) and two occupational levels (blue collar 
and white collar). 
Procedure: Factor analysis of a 14-item questionnaire measuring the 
importance of various job characteristics to satisfaction and 
dissatisfaction. 
Findings: White-collar workers derived greatest satisfaction from 
motivators while blue-collar workers derived greatest satisfaction from 
hygienes, suggesting that subgroups may have different work-value 
systems. 

Gordon (1965) 
Subjects: 683 full-time agents of a large national life insurance 
company. 
Procedure: Respondents rated their degree of satisfaction and 
dissatisfaction with 54 items comprising 4 scales (motivators, hygienes, 
both, hygienes minus both). A measure of overall job satisfaction, 
self-reported production figures, and survival data were also available. 
Findings: Contrary to expectations, individuals highly satisfied with 
motivators did not have greater overall job satisfaction than individuals 
highly satisfied with hygienes; and individuals highly dissatisfied with 
hygienes were not less satisfied than individuals dissatisfied with 
motivators. A positive relationship was found between satisfaction with 
motivators and self-reported production, but no relationship between 
hygienes and production. This study offered no support to the theory that 
specific job factors affect attitudes in only one direction. Support is 
offered that primarily the motivators bring about superior performance. 

Halpern (1965) 
Subjects: 93 male college graduates working in various occupations. 
Procedure: Rating of satisfaction with 4 motivators, 4 hygienes, and 
overall job satisfaction on respondent’s best-liked job. 
Findings: Although the respondents were equally satisfied with both the 
motivator and hygiene aspects of their jobs, the motivators contributed 
significantly more to overall job satisfaction than did the hygienes. 

Wernimont (1966) 
Subjects: 50 accountants and 82 engineers. 
Procedure: Self-description of past satisfying and dissatisfying job 
situations using both forced-choice and free-choice items. 
Findings: More motivators than hygienes were used to describe both job 
situations. Concludes that both motivators and hygienes can be sources of 
job satisfaction and job dissatisfaction. 





METHOD 


One hundred and eighty-seven college students 
(48 females and 139 males) enrolled in an intro- 
ductory industrial psychology course served as 
subjects (Ss).2 They were asked to rank 10 job 
characteristics in order of importance for self. Each 
S was given enough time to complete the task to 
his satisfaction. The 10 job characteristics were 
taken from Herzberg et al. (1959) and included 
five motivators and five hygienes. The motivators 
were: Challenges Ability, High Responsibility, Im- 
portance of the Job, Opportunities for Advancement, 
and Voice in Decisions. The hygienes were: Good 
Boss, Good Physical Working Conditions, Good 
Salary, Job Security, and Liberal Fringe Benefits. 
The 10 characteristics were placed in a random 
order and each S was given the same list to rank. 


2 The data were collected in two semesters but 
have been combined for this analysis. The correlations 
for the male rank orders, and the female rank orders, 
across semesters were .94 and .80, respectively. 


RESULTS AND DISCUSSION 


Relative Importance of Motivators and Hy- 
gienes 


A mean importance ranking was computed for each job 
characteristic and a mean rank order determined for female and 
male rankings. These data are presented in Table 2. Measures of 
the degree of agreement within a rank order are also shown in 
Table 2. Although only moderately high, both coefficients are 
significantly different from zero and indicate that, within a 
rank order, most individuals were applying essentially the same 
standard in ranking the 10 job characteristics. One interesting 
aspect of this table is the surprising degree of agreement in the 
female and male preferences. Only one job characteristic appears 
in the top five choices for males which is not in the top five 
choices for females. 


TABLE 2 

MEAN RANKINGS AND RANK ORDERS OF THE 
10 JOB CHARACTERISTICS 

                                   Females (N = 48)        Males (N = 139) 
                                   Mean        Rank        Mean        Rank 
Characteristic                     ranking     order       ranking     order 

Challenges Ability                 2.67        1           3.70        2 
Good Salary                        4.12        2           3.91        3 
Opportunities for Advancement      4.17        3           3.50        1 
Good Boss                          [illegible] 4           6.40        8 
High Responsibility                5.46        5           5.42        5 
Voice in Decisions                 6.15        6.5         6.06        7 
Job Security                       6.15        6.5         [illegible] 4 
Importance of the Job              6.23        8           [illegible] 6 
Good Physical Working Conditions   6.67        9           6.96        9 
Liberal Fringe Benefits            8.02        10          8.16        10 

Coefficient of Concordance         .24a                    .24a 

Note. Correlation between female and male ranking is .80, 
significantly different from zero at the .01 level of confidence. 

a Coefficient of Concordance, W, is significantly different 
from zero at the .01 level of confidence. 

An analysis was then made to determine 
whether there was any tendency for females 
and males to rank motivators generally more 
important than hygienes. The number of 
motivators ranked more important than each 
hygiene in the 25 possible paired comparisons 
was computed for each individual. If the 
number of motivators ranked more important 
than each hygiene equaled the number of 
hygienes ranked more important than each 
motivator, the number of motivators ranked 
more important than each hygiene would be 
12.5 on the average, and vice versa. There- 
fore the greater the deviation from 12.5, the 
more important one type of job character- 
istic and the less important the other type. 
The mean number of motivators ranked above each hygiene by 
females and males is shown in Table 3. Both sexes ranked a 
significant number of motivators more important than hygienes, 
indicating the relatively greater importance of motivators over 
hygienes. This is consistent with other investigations (e.g., 
Gordon, 1965; Halpern, 1965; Wernimont, 1966) reporting 
motivators more important than hygienes as contributors to both 
job satisfaction and job dissatisfaction. 
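
The paired-comparison count described above can be sketched in a few lines. The example below is a minimal illustration, assuming a ranking is stored as a mapping from characteristic to rank (1 = most important); the function name and the sample ranking are hypothetical and do not reproduce any respondent's data.

```python
MOTIVATORS = [
    "Challenges Ability", "High Responsibility", "Importance of the Job",
    "Opportunities for Advancement", "Voice in Decisions",
]
HYGIENES = [
    "Good Boss", "Good Physical Working Conditions", "Good Salary",
    "Job Security", "Liberal Fringe Benefits",
]


def motivators_over_hygienes(ranking):
    """Count the motivator-hygiene pairs (out of 25) in which the
    motivator holds the more important (numerically lower) rank."""
    return sum(
        1
        for m in MOTIVATORS
        for h in HYGIENES
        if ranking[m] < ranking[h]
    )


# Hypothetical ranking by one respondent (not taken from the study).
example_ranking = {
    "Challenges Ability": 1,
    "Good Salary": 2,
    "Opportunities for Advancement": 3,
    "Good Boss": 4,
    "High Responsibility": 5,
    "Job Security": 6,
    "Voice in Decisions": 7,
    "Importance of the Job": 8,
    "Good Physical Working Conditions": 9,
    "Liberal Fringe Benefits": 10,
}

count = motivators_over_hygienes(example_ranking)
deviation = count - 12.5  # 12.5 is the value expected under no preference
print(count, deviation)
```

For this hypothetical respondent the count is 16 of 25, a deviation of 3.5 above the expected value of 12.5, which is the same direction as the group means reported in Table 3.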


Dimensionality of Motivators and Hygienes 


The rank-order preferences for the moti- 
vators and the hygienes separately were re- 
covered for each individual. The unfolding 
technique in one dimension (Coombs, 1964, 
pp. 80-121) was then applied to these pref- 
erence orders. The 187 Ss generated 72 and 
60 different preference orders for motivators 
and hygienes, respectively. According to this 
model, the occurrence of more than one pair 
of mirror-image preference orders among a 
set of observed preference orders is sufficient 
to permit rejection of a hypothesized uni- 
dimensional attribute common to a set of Ss 
with respect to their preferences for a set of 
stimuli. Since there were 22 mirror-image 
pairs in the preference orders for motivators, 
and 9 mirror-image pairs in the preference 
orders for hygienes, it is clear that the moti- 
vators and the hygienes used in this investi- 
gation did not represent unidimensional 
attributes for a sample of college students. 
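
The mirror-image criterion can be illustrated with a short sketch. It assumes each preference order is a sequence of stimulus labels and applies only the rejection rule stated above (more than one mirror-image pair); it is not an implementation of the full unfolding technique, and the stimulus labels and orders are hypothetical.

```python
def count_mirror_image_pairs(orders):
    """Count distinct pairs of observed preference orders in which one
    order is the exact reverse of the other."""
    distinct = {tuple(order) for order in orders}
    pairs = {
        frozenset((order, order[::-1]))
        for order in distinct
        if order[::-1] in distinct and order[::-1] != order
    }
    return len(pairs)


def unidimensionality_rejected(orders):
    """Apply the rule from the text: more than one mirror-image pair is
    sufficient to reject a common unidimensional attribute."""
    return count_mirror_image_pairs(orders) > 1


# Hypothetical preference orders over five stimuli A to E.
observed = [
    ("A", "B", "C", "D", "E"),
    ("E", "D", "C", "B", "A"),  # mirror image of the first order
    ("B", "A", "C", "D", "E"),
    ("E", "D", "C", "A", "B"),  # mirror image of the third order
    ("A", "B", "C", "D", "E"),  # duplicate of the first order
    ("C", "A", "B", "D", "E"),
]

print(count_mirror_image_pairs(observed))
print(unidimensionality_rejected(observed))
```

Duplicate orders are collapsed before counting, so the tally reflects distinct preference orders, as in the 72 and 60 distinct orders reported above.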

Thirty-one unidimensional solutions (22 for 
the motivator preference orders and 9 for the 
hygiene preference orders) were then at- 
tempted to see if any one solution might 
satisfy a majority of the obtained individual 
preference orders. However, in no case did 
any one of these solutions satisfy as much 
as 20% of the individual preference orders. 

TABLE 3 

MEAN NUMBER OF TIMES EACH MOTIVATOR RANKED 
MORE IMPORTANT THAN EACH HYGIENE 

Females (N = 48)        Males (N = 139) 
M        SD             M        SD 
15.35a   5.60           15.76a   5.83 

a The t value for difference between the obtained means and 
an expected value of 12.5 is significant at the .01 level of 
confidence. 


The results of this and other studies suggest that Herzberg’s 
motivators and hygienes are neither unidimensional nor 
independent constructs. However, this does not mean that the 
distinction between factors revolving around opportunities for 
self-actualization on the job and factors revolving around the 
social and technical environment of the job is not an important 
one. Other investigators have found this distinction useful in 
accounting for stereotyped perceptions of members of the same sex 
and the opposite sex (Burke, two studies in press), and for 
differential attitudes toward job retirement (Saleh & Otis, 
1963). New perspectives for personnel and manpower administration 
programs (Herzberg, 1964) and mental health (Herzberg & Hamlin, 
1961) based on motivation-hygiene concepts have also been 
proposed. 


INCREASING TEST VALIDITY BY CONSIDERING 
INTERITEM CORRELATIONS1 


RICHARD B. DARLINGTON AND CAROL H. BISHOP2 


Cornell University 


Several investigators have proposed item-selection methods which construct a 1st- 
stage test consisting of the most valid items, then a 2nd-stage test by adding to the 
1st-stage test items which are moderately valid yet which correlate low with the 
1st-stage test. Several proposed indices for selecting 2nd-stage items were 
compared, and some found noticeably better than others. A 3rd-stage test was found 
noticeably better than a 2nd-stage test, but a 4th-stage test was found no better 
than the 3rd-stage test. A method which adds several items to form each new stage 
was found superior to a method which adds only 1 item. The best method 
constructed tests substantially better on cross-validation than methods which ignore 
interitem correlations. 


One of the most fundamental problems in 
psychometric practice is the problem of weight- 
ing the items in an item pool to form a test 
which correlates maximally with a given 
criterion variable. It is well-known that this 
correlation is maximized in the test-construc- 
tion sample by a test constructed by weighting 
each item in the item pool by the multiple- 
regression technique. However, the multiple- 
regression method has two principal dis- 
advantages in relation to other weighting 
methods. First, when the number of items is 
large relative to the number of people in the 
test-construction sample, then the validity of 
the resulting test in a second, or cross-valida- 
tion, sample is often found to be extremely low. 
This is true despite the fact that the validity 
in the original sample is necessarily higher than 
with any other linear technique. Second, when 
the number of items is large, the computations 
become extremely complex, even by the 
standards of today’s most modern computers. 

The multiple-regression method uses three 
sets of statistics to assign item weights: the 


1The authors are grateful to Paul E. Meehl, the 
senior author’s thesis director in much of the work re- 
ported in this paper, to the USPHS for fellowship 
support for both authors, and to the computing centers 
at the University of Minnesota and Cornell University 
for free consultation and computer time. 

2 The work reported in the last section of this paper 
was done partly at the University of Minnesota by the 
senior author and partly at Cornell University by the 
junior author. The earlier sections are an abridgment 
of the doctoral dissertation of the senior author at the 
University of Minnesota (1963). Both authors partici- 
pated in the preparation of the manuscript. 




standard deviation of the criterion and each 
item, the correlation between each item and the 
criterion, and the correlation between each 
pair of items. Because of the disadvantages of 
the multiple-regression method, tests are 
often constructed ignoring correlations between 
items, and assigning item weights solely as a 
function of the validity (correlation with the 
criterion) and standard deviation of each item. 

The present paper considers methods of test 
construction which make some use of interitem 
correlations and yet which are designed to 
avoid the disadvantages of the multiple-re- 
gression method. Methods which fit this de- 
scription have been proposed by Flanagan 
(1936), Gleser and DuBois (1951), Gulliksen 
(1950), Horst (1934, 1936), Richardson and 
Adkins (1938), Thorndike (1949), Toops 
(1941), Webster (1956), and Wherry and Gay- 
lord (1946). All these methods are fairly 
similar to each other. Each method starts by 
constructing a preliminary test consisting of 
the item or items with the highest validity, 
ignoring interitem correlations. This prelimin- 
ary test is then improved by computing for 
each item in the item pool an index of the 
item’s ability to raise the validity of the 
preliminary test, and then adding to the test 
the item or items for which this index is highest. 
The “second-stage” test thus resulting is then 
further improved by computing for each item 
in the item pool an index of the item’s ability 
to raise this test’s validity. The item or items 
with the highest index values are added to the 
second-stage test to form a third-stage test. 
This process can be repeated any number of 
times to form a fourth-stage test, a fifth-stage 
test, etc. 

We shall refer to the index of an item’s ability to raise the 
validity of a preliminary test as the index of the item’s 
“usefulness.” An item capable of raising a test’s validity is a 
“useful” item; others are “useless.” The index of usefulness is a 
function of the item’s validity and its correlation with the 
preliminary test. Generally, an item’s index of usefulness 
increases with increasing item validity and decreases as the 
item-test correlation increases. Subsequent stages of the 
test-construction process will be termed “iterations.” Each 
iteration consists of calculating indices of item usefulness, 
selecting the most useful item or items, and adding them to the 
previous test. 
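
The iterative procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not any of the methods cited above: the index of usefulness used here is simply the observed gain in validity when an item is added with unit weight, one item is added per iteration, negative scoring is ignored, and the function names and simulated data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def total_score(items, selected):
    """Unit-weighted test score: sum of the selected items for each person."""
    return [sum(items[j][p] for j in selected) for p in range(len(items[0]))]


def build_test(items, criterion, first_stage_size=1, n_iterations=2):
    """Return the indices of the items in the final-stage test.

    items is a list of item-score lists (one list per item).
    """
    validities = [pearson(col, criterion) for col in items]
    ranked = sorted(range(len(items)), key=lambda j: validities[j], reverse=True)
    selected = ranked[:first_stage_size]  # first-stage test: most valid items
    for _ in range(n_iterations):
        test = total_score(items, selected)
        base = pearson(test, criterion)
        # Assumed usefulness index: gain in validity from adding the item.
        gains = {}
        for j in range(len(items)):
            if j not in selected:
                enlarged = [t + v for t, v in zip(test, items[j])]
                gains[j] = pearson(enlarged, criterion) - base
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= 0:  # no remaining item is useful
            break
        selected.append(best)
    return selected


# Simulated data: the criterion depends on two traits, a and b.
random.seed(0)
n = 300
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]
criterion = [x + y for x, y in zip(a, b)]
items = [
    [x + random.gauss(0, 1) for x in a],  # measures trait a
    [x + random.gauss(0, 1) for x in a],  # also measures trait a
    [y + random.gauss(0, 1) for y in b],  # measures trait b
    [random.gauss(0, 1) for _ in range(n)],  # unrelated to the criterion
]
chosen = build_test(items, criterion)
print(chosen, round(pearson(total_score(items, chosen), criterion), 3))
```

Because an item is added only when it raises validity in the construction sample, the validity of each stage is at least that of the previous stage in that sample; whether the gain survives cross-validation is a separate empirical question.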

The methods proposed by the investigators 
listed above differ primarily in four respects: 
(a) in the specific index of item usefulness 
recommended, (b) in whether unit weights or 
differential weights are used, (c) in whether 
items are ever removed from a test when once 
added, and (d) in whether one or several items 
are added or removed at each iteration. The 
present investigation dealt with two of these 
four questions and with two others. Specifi- 
cally, the purposes of the investigation were 
(a) to compare the accuracy of the various 
indices of item usefulness (both those in the 
literature and a new index), (b) to compare a 
method which adds several items per iteration 
to one which adds one item per iteration, 
(c) to determine how many iterations are 
useful, and (d) to evaluate the technique of 
considering interitem correlations as a whole 
in comparison to methods of test construction 
ignoring interitem correlations. The compari- 
son of indices of item usefulness was primarily 
mathematical, with incidental empirical sup- 
port; the remaining problems were attacked 
empirically. 

The investigation was confined to the use of 
unit item weights, which are highly preferable 
to differential weights in economy of test 
scoring. Although the problem is thus one of 
item selection, it will often be referred to as 
item weighting, since item selection is merely 
a special case of item weighting in which the 
only permissible weights are zero and unity. 


THE INDEX OF ITEM USEFULNESS 

The Sign of the Optimum Item Weight 


Any item can be weighted either positively 
or negatively. Unless the item validity is 
exactly zero, one of these weights will result 
in a negative validity. This fact is significant 
because it has long been known that the opti- 
mum item weight is not always that which 
results in a positive validity for the item. It 
sometimes happens that items with positive 
validities should be given negative weights, 
and vice versa. (Horst, 1951, discusses this 
point.) Although attempts to give a reason- 
able psychological interpretation to this mathe- 
matical fact usually center about the concept 
of suppressor variables, a purely mathematical 
demonstration of the point follows most simply 
from multiple-regression theory. Consider the two-variable 
multiple-regression equation in which a test t and an item i are 
used to predict a criterion c. It will be recalled that the 
multiple-regression weights for t and i are the weights which 
will maximize the correlation between c and a weighted composite 
of t and i in the test-construction sample. As usual, let r 
designate a correlation coefficient and s a standard deviation. 
From Hays (1963) or other standard statistical works, the 
multiple-regression weight of i is 

\[ \frac{r_{ci} - r_{ct}\, r_{it}}{1 - r_{it}^{2}} \cdot \frac{s_c}{s_i} \qquad [1] \]


Except in trivial cases, a standard deviation is always positive and a correlation coefficient is always less than unity in absolute value. Therefore, the right-hand factor of [1], $s_c / s_i$, is always positive, as is the denominator of the left-hand factor. Hence, whether [1] is positive, negative, or zero depends on the quantity 

$$r_{ci} - r_{ct} r_{it}$$ [2] 

which always has the same sign as [1] as a whole. We thus conclude that the sign of [2] is the proper sign of the item in question. Inspection of [2] shows that it can be negative when $r_{ci}$ is positive, and positive when $r_{ci}$ is negative. This proves the point we sought to establish. 
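
The sign argument can be checked numerically. The following is a minimal Python sketch, not part of the original analysis; the function names and the correlation values are hypothetical and chosen only to show an item with positive validity receiving a negative optimum weight.

```python
def regression_weight_of_item(r_ci, r_ct, r_it, s_c=1.0, s_i=1.0):
    """Formula [1]: regression weight of item i when test t and item i
    jointly predict criterion c."""
    if abs(r_it) >= 1.0:
        raise ValueError("r_it must lie strictly between -1 and 1")
    return (r_ci - r_ct * r_it) / (1.0 - r_it ** 2) * (s_c / s_i)


def index_2(r_ci, r_ct, r_it):
    """Quantity [2]: its sign is the proper sign of the item weight."""
    return r_ci - r_ct * r_it


# Hypothetical item: positive validity (.20) but a substantial correlation
# with the existing test (.50), which itself has validity .60.
weight = regression_weight_of_item(r_ci=0.20, r_ct=0.60, r_it=0.50)
print(index_2(0.20, 0.60, 0.50) < 0, weight < 0)   # both negative

# Hypothetical item with negative validity whose proper weight is positive.
print(index_2(-0.10, 0.60, -0.50) > 0)
```

In the first case [2] equals .20 - (.60)(.50) = -.10, so the item is best scored in the direction that gives it negative validity.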

We thus conclude that the proper sign for an item weight is the sign of [2]. This fact becomes important in a discussion of indices of item usefulness, since an item which appears 
useless when given a positive weight often is 
found to be useful when given a negative 
weight. Of the investigators listed in the in- 
troduction to this paper, only Wherry and 
Gaylord (1946) are entirely explicit on this 
point; the others considered only one direction 
of item scoring. 

There is an alternate way of stating the rule just mentioned, that items for which [2] is negative should receive a negative weight. Assigning a negative weight to an item is equivalent to reversing the direction in which the item is scored, for example, considering “No” instead of “Yes” as a “positive” response. Reversing the scoring direction reverses the sign of any correlation involving the item. Reversal thus changes the signs of $r_{it}$ and $r_{ci}$, and therefore changes the sign of [2]. Thus, assigning a negative weight to an item for which [2] is negative is equivalent to reversing the scoring direction and thereby making [2] positive. Hence, the rule states that an item should be scored in the direction which makes [2] positive. 
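
The scoring rule can be sketched in a few lines of Python. This is an illustration only; the function name `orient_item` and the example correlations are hypothetical.

```python
def orient_item(r_ci, r_ct, r_it):
    """Return (direction, r_ci, r_it) with the item scored so that
    quantity [2], r_ci - r_ct * r_it, is not negative.

    direction is +1 if the item keeps its scoring direction and -1 if it
    is reversed; reversal changes the sign of both item correlations.
    """
    if r_ci - r_ct * r_it < 0:
        return -1, -r_ci, -r_it
    return 1, r_ci, r_it


direction, r_ci, r_it = orient_item(r_ci=0.20, r_ct=0.60, r_it=0.50)
print(direction, r_ci - 0.60 * r_it > 0)
```

Note that $r_{ct}$ is unaffected by the reversal, since it does not involve the item.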


The Number of Useful Items 


Inspection of [2] shows that it is not 
normally exactly zero, so optimum item 
weights are not normally exactly zero in 
multiple-regression equations, which allow 
fractional item weights. When only unit 
weights are used, however, Gleser and DuBois 
(1951) showed that there may be a number of 
items whose optimum fractional weights are 
so close to zero that the items are best as- 
signed zero weights when the only alternative 
is a unit weight. They derived an index for 
identifying these items. 

One would expect that the relatively minor 
change from fractional weights to unit weights 
would not greatly lower the number of useful 
items. A preliminary empirical study showed 
this to be the case, when both possible direc- 
tions of item scoring are considered. In each of 
three instances, well over half of the items in 
the item pool were found useful by the Gleser- 
DuBois criterion. Unfortunately, this fact 
destroys the value of the criterion for this type 
of test construction. For example, in an item 
pool of 550 items, use of every item which 
passes the Gleser-DuBois criterion might 
involve adding 400 items to a 50-item preliminary test. An item which would be useful 
when added alone might very well not be 
useful when added along with 399 other items. 

It therefore becomes apparent that the index 
of item usefulness should not only discriminate 
useful items from useless items, as the Gleser- 
DuBois index does, but also should permit 
selection of the few most useful items. Using 
such an index, a small number of the most 
useful items can be added to the preliminary 
test. 


A Comparison of Indices of Item Usefulness 


In this section a standard for evaluating an 
index of item usefulness will be suggested, and 
various previous indices and a new index will 
be compared to this standard. 

We begin with two formulas basic to much work in statistical theory. Although they can be expressed in terms of any variables, they are here expressed in terms of $c$, $t$, and $i$, since these are the three variables of present interest. Letting cov stand for covariance and var for variance, we write the formulas as: 

$$\mathrm{cov}[c(t + i)] = \mathrm{cov}(ct) + \mathrm{cov}(ci)$$ [3] 

$$\mathrm{var}(t + i) = \mathrm{var}(t) + \mathrm{var}(i) + 2\,\mathrm{cov}(it).$$ [4] 

The test $(t + i)$ is test $t$ after item $i$ has been added to it. The standard formula for the correlation between the two variables $c$ and $(t + i)$ is 

$$r_{c(t+i)} = \frac{\mathrm{cov}[c(t + i)]}{[\mathrm{var}(c)]^{1/2}\,[\mathrm{var}(t + i)]^{1/2}}.$$ [5] 

Substituting [3] and [4] into [5], we have 

$$r_{c(t+i)} = \frac{\mathrm{cov}(ct) + \mathrm{cov}(ci)}{[\mathrm{var}(c)]^{1/2}\,[\mathrm{var}(t) + \mathrm{var}(i) + 2\,\mathrm{cov}(it)]^{1/2}}.$$ [6] 

Using the general relation 

$$\mathrm{cov}(xy) = r_{xy} s_x s_y,$$ [7] 

[6] can be rewritten as 

$$r_{c(t+i)} = \frac{r_{ct} s_t + r_{ci} s_i}{(s_t^{2} + s_i^{2} + 2 r_{it} s_t s_i)^{1/2}}.$$ [8] 

By assuming values for $s_t$, $s_i$, and $r_{ct}$, [8] can be used to study the effect of $r_{ci}$ and $r_{it}$ on $r_{c(t+i)}$. 

Let us assume that $t$ is a test with nine independent items of unit variance so that $s_t^{2} = 9$. Assume also that $i$ has unit variance, and that $r_{ct} = .6$. For these assumptions, the upper two lines in Figure 1 show combinations of $r_{ci}$ and $r_{it}$ which result in given values of $r_{c(t+i)}$. Although seven points were plotted for each curve in Figure 1, it is apparent that the curves are very nearly straight lines. 
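
Curves of this kind can be traced with a short computation. The sketch below is a hedged illustration in Python, not a reconstruction of Figure 1: the function names are hypothetical, the target validity of .65 is arbitrary, and $s_t = 3$ is an assumption taken from the variance of a sum of nine independent unit-variance items.

```python
import math


def validity_after_adding(r_ct, r_ci, r_it, s_t, s_i):
    """Formula [8]: correlation of criterion c with the lengthened test (t + i)."""
    return (r_ct * s_t + r_ci * s_i) / math.sqrt(
        s_t ** 2 + s_i ** 2 + 2.0 * r_it * s_t * s_i)


def r_ci_for_target(target, r_ct, r_it, s_t, s_i):
    """Solve [8] for the item validity r_ci that yields a target validity."""
    return (target * math.sqrt(s_t ** 2 + s_i ** 2 + 2.0 * r_it * s_t * s_i)
            - r_ct * s_t) / s_i


S_T, S_I, R_CT = 3.0, 1.0, 0.6                 # assumed illustrative values
grid = [x / 10.0 for x in range(-9, 10, 3)]    # seven values of r_it
curve = [r_ci_for_target(0.65, R_CT, r_it, S_T, S_I) for r_it in grid]

# Successive differences show how far the curve departs from a straight line.
steps = [b - a for a, b in zip(curve, curve[1:])]
for r_it, r_ci in zip(grid, curve):
    print(f"r_it = {r_it:+.1f}   r_ci = {r_ci:+.4f}")
print([round(step, 4) for step in steps])
```

Each printed pair is a combination of $r_{it}$ and $r_{ci}$ that leaves the lengthened test with the same validity; equal successive differences would indicate an exactly straight line.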

The curve for $r_{ci} - r_{ct} r_{it} = 0$ has also been drawn in Figure 1. When scored in the proper direction, all items will fall on or above this line, as described earlier. 

The two $r_{c(t+i)}$ curves shown are members of an entire family of curves, which lie one above the other. The accuracy of an index of item usefulness is measured by the degree to which items with the same index value fall on the same $r_{c(t+i)}$ curve. 

As we have seen, most of the methods of test construction mentioned in the introduction do not consider the possibility that an item may be most useful when scored in the direction giving it negative validity. From what we have said, however, there is still the possibility that the indices used in these methods will be accurate if computed twice for each item, once giving the item a positive sign and once a negative sign. A comparison with Figure 1 nevertheless shows that several indices are substantially inaccurate, even when items are scored in both directions. Horst (1936), for example, scoring all items in the direction giving positive validity, suggested selecting all items in the upper left quadrant of Figure 1, plus all those in the upper right quadrant for which 

$$r_{ci} / r_{it}$$ [9] 

is above some arbitrary value. Gulliksen (1950) arrived at the same recommendation, although the similarity was obscured by the fact that Horst assumed that all items were dichotomous and therefore introduced certain algebraic simplifications possible with the use of the point-biserial correlation coefficient. The methods also differed in that Gulliksen scored all items in the direction making $r_{it}$, rather than $r_{ci}$, positive. Flanagan’s (1936) method is identical to Horst’s, with the exception that he recommended unity as the cutoff point for [9], so that items for which $r_{ci}/r_{it} < 1$ would be rejected. The Gleser-DuBois (1951) index is highly similar. 

[Figure 1 not reproduced: plot with abscissa running from -1.0 to 1.0, showing the curves $r_{c(t+i)} = .6767$ and $r_{c(t+i)} = .6397$ and the line $r_{ci} - r_{ct} r_{it} = 0$.] 

FIG. 1. Combinations of $r_{ci}$ and $r_{it}$ leading to given test validities, when $r_{ct} = .6$. 

If [9] is equal for two items, in Figure 1 they will fall on a straight line passing through the origin. Within the upper right quadrant, the higher [9], the greater the slope of this line. A typical member of this family of curves is drawn in Figure 1. It is readily apparent that the agreement between these curves and the “ideal” set (illustrated by the $r_{c(t+i)}$ curves in Figure 1) is far from perfect. A much better approximation would be given by a series of straight lines parallel to the curve for $r_{ci} - r_{ct} r_{it} = 0$. Such a set of curves would have the formula $r_{ci} - r_{ct} r_{it} = k$, with the higher curves having higher values of $k$. Therefore, [2] can be used not only to determine the proper scoring direction for an item; it is also a highly accurate index of item usefulness. 
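
The comparison of the two families of lines can also be made numerically. The sketch below is an illustration under stated assumptions, not the author's procedure: it takes a hypothetical grid of items in the upper right quadrant, assumes $r_{ct} = .6$, $s_t = 3$, and $s_i = 1$, and counts how often each index orders a pair of items the same way as the validity given by [8]. All function names are hypothetical.

```python
import math
from itertools import combinations

R_CT, S_T, S_I = 0.6, 3.0, 1.0    # assumed illustrative values


def new_validity(r_ci, r_it):
    """Formula [8] for the validity of the test after the item is added."""
    return (R_CT * S_T + r_ci * S_I) / math.sqrt(
        S_T ** 2 + S_I ** 2 + 2.0 * r_it * S_T * S_I)


def concordance(items, index):
    """Proportion of item pairs that the index ranks in the same order as
    the resulting test validity (ties count as disagreements)."""
    pairs = list(combinations(items, 2))
    agree = sum(
        1 for a, b in pairs
        if (index(*a) - index(*b)) * (new_validity(*a) - new_validity(*b)) > 0)
    return agree / len(pairs)


# Hypothetical items (r_ci, r_it), both correlations from .1 to .5.
items = [(ci / 10, it / 10) for ci in range(1, 6) for it in range(1, 6)]

c2 = concordance(items, lambda r_ci, r_it: r_ci - R_CT * r_it)   # index [2]
c9 = concordance(items, lambda r_ci, r_it: r_ci / r_it)          # index [9]
print(round(c2, 3), round(c9, 3))
```

On this grid index [2] agrees with the ordering by resulting validity for a larger share of item pairs than the ratio [9] does, which is the pattern the argument above leads one to expect.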

Drawings like Figure 1 were constructed for values of $r_{ct}$ other than .6, and it was found that [2] is always an accurate approximation to the “ideal” curves. 

By a similar procedure, it can be shown that [2] is highly similar to the indices proposed by all of the investigators listed in the introduction to this paper except Flanagan, Gulliksen, Horst (1936), and Gleser and DuBois, when these methods are modified by considering both directions of item scoring. The differences among [2] and the others seem to be small enough so that a choice among them can be made largely on the basis of computational convenience or other practical considerations. Index [2] is simpler than most of the other indices. However, among these indices, the index proposed by Thorndike (1949) has the advantage of having a known sampling distribution. Thorndike suggested using $r_{ci \cdot t}$, the partial correlation between the item and criterion holding the test constant. According to Hays (1963) or other standard works on multiple regression, $r_{ci \cdot t}$ is also an exact index of the usefulness of $i$ when multiple-regression weights are used. It therefore appears that [2] is the index of choice when computational simplicity is a major consideration, and $r_{ci \cdot t}$ is best otherwise. 
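
The relation between the two recommended indices can be made concrete. The following Python sketch uses the standard first-order partial correlation formula and the standard two-predictor squared multiple correlation; the function names and the example values are hypothetical.

```python
import math


def partial_r_ci_t(r_ci, r_ct, r_it):
    """Partial correlation between item and criterion with the test held
    constant. Its numerator is quantity [2], so the two share a sign."""
    return (r_ci - r_ct * r_it) / math.sqrt(
        (1.0 - r_ct ** 2) * (1.0 - r_it ** 2))


def squared_multiple_r(r_ci, r_ct, r_it):
    """Squared multiple correlation of c on t and i together."""
    return (r_ct ** 2 + r_ci ** 2 - 2.0 * r_ct * r_ci * r_it) / (1.0 - r_it ** 2)


r_ci, r_ct, r_it = 0.20, 0.60, 0.50          # hypothetical values
p = partial_r_ci_t(r_ci, r_ct, r_it)
gain = squared_multiple_r(r_ci, r_ct, r_it) - r_ct ** 2
print(round(p, 4), round(gain, 4), round((1.0 - r_ct ** 2) * p ** 2, 4))
```

The last two printed figures coincide: with regression weights, the gain in squared validity from adding the item equals $(1 - r_{ct}^{2})\, r_{ci \cdot t}^{2}$, which is the sense in which the partial correlation is an exact index in that case.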


An Empirical Comparison of Indices 


Some data collected by Clark (1961, pp. 
31-37) demonstrate that there is enough differ- 
ence among indices of item usefulness to 
produce an observable difference in cross- 
validity of the resulting tests. Tests con- 
structed by the Gulliksen method, using index 
[9], were compared to tests constructed by 
what Clark called the “modified Gulliksen” 
method. The “modified Gulliksen” method consisted of adding to the test constructed by the Gulliksen method several items with just-under-zero validities but with negative values of $r_{it}$. Cross-validation showed the “modified Gulliksen” method to be superior to the Gulliksen method in the selection of naval aviation machinist’s mates; the Tilton overlap measure (on which lower values indicate better separation of the groups) for the former was .46, while for the latter it was .51. Comparing the two methods in the selection of electricians, again he found the modified Gulliksen method superior on cross-validation; the Tilton overlap measures for the two methods were .29 and .31. 

Clark’s “modified Gulliksen” method is more 
similar to the use of [2] than is the Gulliksen 
method. In Figure 1, the “modified Gulliksen” 
method selected, in addition to the items 
selected by the Gulliksen method, items just 
below the abscissa in the lower left quadrant. 
Index [2] or the others which are highly 
similar to it would have selected these same 
items. 

In both studies, Clark found a test con- 
structed ignoring interitem correlations to be 
markedly inferior to both other methods, with 
Tilton overlap cross-validities of .58 in the 
identification of machinist’s mates and .41 in 
the identification of electricians. 


A COMPARISON OF FOUR TEST-CONSTRUCTION 
METHODS 


A study was done to compare the validity in the 
prediction of IQ of tests constructed by adding only 
one item per iteration with the validity of tests con- 
structed adding several items per iteration. In addition, 
both methods were compared to two methods of item 
analysis which ignore interitem correlations. 


Subjects 


Robert Wirt made available to the investigators the 
responses of each of 227 mothers to a 600-item inventory 
concerning the behavior of one of her children between 
the ages of 6 and 16. He also made available IQ esti- 
mates for each of the 227 children. Of these children, 
115 had been diagnosed as retarded by a clinical 
psychologist or psychiatrist, and most were attending 
special schools for retarded children. In most cases, 
exact IQ data were not available except for a statement 
that the child’s IQ was below 70. The other 112 children 
were attending Minneapolis schools and had received 
a superior score on an IQ test, usually one of the group 
IQ tests. Most of these children had rated IQs between 
120 and 130. 


Procedure 


Forty of the retarded children and 40 superior 
children were set aside as a cross-validation sample, 
the remaining 147 being assigned to the test-construc- 
tion sample. 

In all test-construction methods, only unit weights 
were used. Validity was measured by the product- 
moment test-criterion correlation. Retarded children 
whose IQs were not known exactly were arbitrarily 
assigned an IQ of 60. 

The test-construction methods. Two experimental 
methods and two control methods were used. 


1. The “One item per iteration” method selected first the most valid item in the item pool, then the one item which, according to its $r_{ci \cdot t}$ value, would most improve this one-item test. (As mentioned above, $r_{ci \cdot t}$ is a partial correlation, and was proposed by Thorndike.) The method then selected the one item which would most improve the resulting two-item test, then the one which would most improve the resulting three-item test, and so on. This process was repeated until the mean value of $r_{ci \cdot t}^{2}$ for the unused items in the item pool dropped below $1/(N - 1)$, where $N$ is the number of people in the test-construction sample. This value was chosen because it can be shown that this is the expected value of $r_{ci \cdot t}^{2}$ under the null hypothesis that $r_{ci \cdot t}$ is zero for all remaining items in the item pool. 

2. A test was also constructed by selecting the 18 most valid items from the item pool. This technique was called the “Control method-short test.” 

3. The “Control method-optimum length test” was similar to the “Control method-short test,” with the exception that the test length was chosen post hoc to maximize cross-validity, rather than being determined in advance as was done with the “Control method-short test.” Starting with the most valid item, items were added one at a time in the order of their validity and the cross-validity of the resulting test was computed. This was repeated until the test contained the 12 most valid items. The optimum length of the 12 lengths tested was then chosen. 

4. The test constructed by the “Control method-optimum length test” was then used as a “first-stage” test for the “Several items per iteration” method. Using the first-stage test as $t$, $r_{ci \cdot t}$ was computed for all 600 items in the item pool. Since the investigators had no idea how many items to add to the first-stage test to form a second-stage test, 42 items were added one at a time in the order of their $r_{ci \cdot t}$ values (without recomputing $r_{ci \cdot t}$ values after the addition of each item), and the cross-validity of each of the resulting tests was calculated. Of these 42 tests, the one with the highest cross-validity was chosen as the second-stage test. This second-stage test was the final test produced; no third-stage test was constructed. 
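
The stopping rule of the “One item per iteration” method can be illustrated numerically. The sketch below is a minimal Python illustration rather than the investigators' program; the function name `should_stop` and the lists of partial correlations are hypothetical. With the 147 people of the test-construction sample, the threshold $1/(N - 1)$ is about .0068.

```python
def should_stop(partials, n_people):
    """Return True when the mean squared partial correlation of the unused
    items falls below 1 / (N - 1), the null-hypothesis expectation cited
    in the text."""
    if not partials:
        return True                       # no unused items remain
    mean_sq = sum(p * p for p in partials) / len(partials)
    return mean_sq < 1.0 / (n_people - 1)


threshold = 1.0 / (147 - 1)               # test-construction sample of 147
print(round(threshold, 4))
print(should_stop([0.05, -0.08, 0.10], 147))   # hypothetical weak items
print(should_stop([0.20, -0.15, 0.10], 147))   # hypothetical stronger items
```

The first hypothetical pool has a mean squared partial correlation of .0063, just under the threshold, so selection would stop; the second, at about .024, would continue.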

Removing items from the test. When an item is already 
in a test and is scored in a certain direction, then adding 
the same item scored in the opposite direction is 
equivalent to removing the original item from the 
test. Therefore, removing an item from a test can 
artificially be viewed as adding a new item which is the 
reverse of the item in the test. Whether such “new” 
items should be “added” can be decided by the same 
criterion used for determining whether any other new 
item should be added, that is, by the index of item 
usefulness associated with the item. This practice was 
adopted in the use of both iterative methods. 
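
One pass of an iterative method with unit weights, including the convention that a removal is the “addition” of the reversed item, might be sketched as follows. This is a hedged Python illustration, not the investigators' program: the function names and the toy data are hypothetical, weights are restricted to -1, 0, and +1, and an item whose indicated change would push its weight beyond unit magnitude is simply skipped, which is an assumption made here.

```python
import math


def pearson(x, y):
    """Product-moment correlation, or None if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def partial_index(r_ci, r_ct, r_it):
    """Partial correlation r_ci.t, or None where it is undefined."""
    denom = (1.0 - r_ct ** 2) * (1.0 - r_it ** 2)
    if denom <= 1e-12:
        return None
    return (r_ci - r_ct * r_it) / math.sqrt(denom)


def one_iteration(responses, criterion, weights, n_add):
    """Change the n_add most useful items by one unit each, computing all
    indices against the current test (no recomputation within the pass)."""
    test = [sum(w * x for w, x in zip(weights, row)) for row in responses]
    r_ct = pearson(criterion, test)
    if r_ct is None:
        raise ValueError("the starting test must have nonzero variance")
    candidates = []
    for j in range(len(weights)):
        item = [row[j] for row in responses]
        r_ci, r_it = pearson(criterion, item), pearson(item, test)
        if r_ci is None or r_it is None:
            continue
        k = partial_index(r_ci, r_ct, r_it)
        if k is None or k == 0:
            continue
        direction = 1 if k > 0 else -1
        # A change that cancels an existing weight is a removal.
        if abs(weights[j] + direction) <= 1:
            candidates.append((abs(k), j, direction))
    candidates.sort(reverse=True)
    new_weights = list(weights)
    for _, j, direction in candidates[:n_add]:
        new_weights[j] += direction
    return new_weights


# Hypothetical data: 8 people, 4 dichotomous items, a numeric criterion.
criterion = [1, 2, 3, 4, 5, 6, 7, 8]
responses = [
    [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0],
    [0, 0, 1, 1], [1, 1, 1, 1], [1, 0, 0, 1], [1, 1, 0, 1],
]
print(one_iteration(responses, criterion, [1, 0, 0, 0], n_add=1))
print(one_iteration(responses, criterion, [1, 0, 0, 0], n_add=2))
```

Starting from a one-item test, the first call adds the fourth item with a positive weight; the second call also adds the second item scored in the reversed direction, since its partial correlation with the criterion is negative.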

Replication. Although several “replications” were performed, it was found practical to vary only one particular aspect of the test-construction situation, the item pool. This was done by first performing the comparison in an item pool, then removing from that item pool a number of items to form a new, smaller item pool. A number of items were then removed from this second pool to form a third, still smaller, item pool. Further item pools were formed in the same way. Although successive item pools were clearly not totally independent of each other, a large measure of independence was achieved because the items which were removed were always the most useful or valid items, so that the great majority of the items which entered the tests in one comparison between test-construction methods were removed from the item pool for the next comparison. 

The “One item per iteration” method was compared with the “Control method-short test” in 18 different item pools. The first item pool consisted of the 600 items of the Children’s Personality Inventory. The second item pool consisted of all the items in the first item pool except those previously selected by the “One item per iteration” method. The third item pool consisted of all the items in the second item pool except those selected by the “One item per iteration” method from that pool. The fourth item pool consisted of all the items in the third item pool except those selected by the “One item per iteration” method from that pool. The remaining item pools were defined similarly, so that no item was in more than one of the 18 different tests produced by the “One item per iteration” method. 

The “Several items per iteration” method was compared with the “Control method-optimum length test” method in three different item pools. Because the large number of highly valid items made the prediction too easy for a powerful comparison of methods, the first item pool was formed by removing the 72 most valid items from the original 600-item pool. The second was formed by removing the 90 most valid items from the first pool, and the third pool was formed by removing the 90 most valid items from the second pool. These numbers were chosen arbitrarily. 


Results and Discussion 


The test lengths and validities of the tests 
produced by the four different methods are 
shown in Table 1. 

Examination of Table 1 shows that in every 
one of the 18 comparisons with the “Control 
method-short test,” the “One item per itera- 
tion” method was found inferior, usually by 
a substantial margin. This poor showing was 
probably caused in part by the extreme short- 
ness of the tests. Another probable cause is the 
fact that the validity of a preliminary test is 
used in the subsequent calculation of indices 
of item usefulness. Since the validity of this 
preliminary test is exaggerated in the test- 
construction sample, the index of item useful- 
ness is thereby distorted. This can result in a 
nonoptimal selection of items. Perhaps modi- 
fications of the method which eliminate these 
shortcomings would prove valuable. 

In contrast to this poor showing, in each of the three comparisons of the “Several items per iteration” method with the “Control method-optimum length test,” the former came out ahead. Since the “Control method-optimum length test” is by definition at least as good as the “Control method-short test,” it is clear that the present version of the “One item per iteration” method is not very promising, while the “Several items per iteration” method appears to merit wider use and further investigation. 

One should consider the possibility that the 
observed superiority of the “Several items per 
iteration” method was due partly or wholly to 
the fact that cross-validation sample data were 
used in choosing the number of items to add 
to the first-stage test to form the second- 
stage test. The hypothesis that the superiority 
was due wholly to this fact can be ruled out 
almost completely by the fact that the 42 
cross-validity figures from which each of the 
entries in the last column of Table 1 was 
selected were higher over a large range of 
items than the validity of the control method. 
In the first of the three comparisons between 
the two methods, if the investigators had 
picked a priori the number of items to add to 
the first-stage test, and had picked any number 
of items from 1 to 42, they would have found 
the experimental method superior to the 
control method. 

TABLE 1 
VALIDITY OF TESTS CONSTRUCTED BY FOUR DIFFERENT TEST CONSTRUCTION TECHNIQUES, USING 21 DIFFERENT ITEM POOLS ON A CROSS-VALIDATION SAMPLE OF 80 PEOPLE 

            Control method-        One item               Control method-        Several items
            short test             per iteration          optimum length test    per iteration
            Number                 Number                 Number                 Number
Item pool   of items  Validity     of items  Validity     of items  Validity     of items  Validity

1           18        .84          1         .58
2           18        .83          3         [illegible]
3           18        .83          3         .67
4           18        .82          3         .58
5           18        .80          3         [illegible]
6           18        [illegible]  3         [illegible]
7           18        .69          4         .63
8           18        .69          3         .56
9           18        [illegible]  4         .50
10          18        [illegible]  5         [illegible]
11          18        [illegible]  4         .42
12          18        [illegible]  6         .58
13          18        .68          5         .56
14          18        .68          5         .54
15          18        .65          5         .46
16          18        .64          6         .62
17          18        .59          5         .40
18          18        .59          6         .42
19                                                        59        .65          86        .74
20                                                        89        .60          112       .62
21                                                        76        [illegible]  86        [illegible]

The corresponding range for the second 
comparison between methods was 10-38 items; 
for the third comparison it was 1-23 items. 

Although we can thus reject the hypothesis 
that the observed superiority of the experi- 
mental method over the control method was 
due entirely to this artifact, it is nevertheless 
presumably true that the size of the superiority 
was exaggerated somewhat by the post hoc 
selection of test lengths; this was one of the 
reasons for the study reported in the next 
section. 


A FURTHER STUDY ON THE “SEVERAL ITEMS 
PER ITERATION” METHOD 


A further study was undertaken for the following 
reasons. 


It was deemed desirable to test the “Several items per 
iteration” method using an entirely different test- 
construction problem, with different subjects, a dif- 
ferent criterion dimension, and a different item pool. 

Due to the unusually large number of highly valid items in the item pool in the previous study, the “Several items per iteration” method had been studied only in item pools formed by artificially removing the best items from the original pool. A new study with a criterion more difficult to predict would enable an evaluation of the technique using an item pool in which items had not been selectively removed on the basis of validity. 

Only two iterations had been performed with the “Several items per iteration” method (counting the original selection of the most valid items as the first iteration). The present study investigated the value of further iterations. 

The fact that test lengths had been chosen post hoc on the basis of data from the cross-validation sample tended to exaggerate the test validities reported in the previous study. In the present study, test lengths were chosen without reference to cross-validation sample data for the second iteration, which was adequate to establish the superiority of the experimental method. In subsequent iterations, because the investigators had no guideposts to suggest the number of items to add at each iteration, they reverted to choosing these post hoc. 


Subjects and Background 


Albert Rosen very kindly made available to the in- 
vestigators the MMPI responses of 96 diagnosed schizo- 
phrenics and 250 nonschizophrenic psychiatric patients 
whom he called collectively “general abnormals.” 
These data had been used in an earlier study by Rosen (1958) in which he had constructed a 64-item MMPI scale to separate the schizophrenics from the general abnormals. In this study, he had used 67 schizophrenics and 167 general abnormals as a test-construction sample, while the remaining 112 patients were used for cross-validation. Using the sample described, 
Rosen had selected the 64 most differentiating items 
(those with differentiating power significantly different 
from 0 at the .05 level) and used them as a test which 
he labeled the Pz scale. He then tested several weighted 
values of the K scale for use in conjunction with the Pz 
scale, and concluded that maximum differentiation was 
achieved when the raw scores of the Pz and K scales 
were simply added, so that every item in both scales 
received a unit weight in the combined scale, which he 
labeled the Pz + 1K scale. 


Procedure and Results 


As in the previous study, only unit weights were used, and validity was measured by a product-moment correlation. In the present cross-validation sample, the 64-item Pz scale had a validity of .30 and the 94-item Pz + 1K had a validity of .35. The Pz + 1K scale was used as the standard to which the “Several items per iteration” method was compared, since requiring an improvement on this more valid test constituted a more severe test of the method. Accordingly, $r_{ci \cdot t}$ was calculated for all items, using the Pz + 1K scale as $t$. The IQ prediction study had suggested that in the second iteration the previous test should be lengthened by roughly 30% of its original length, so it was arbitrarily decided to add 25 items to the 94-item Pz + 1K scale. (As explained earlier, “addition” of an item can also refer to deletion of an item already in the test. Of the 25 items “added,” 24 turned out to be genuine additions and one a deletion.) The resulting 117-item second-stage test had a cross-validity of .40. The wisdom of the choice of the number of items added was checked by also adding items one at a time as in the previous application of the “Several items per iteration” method. The highest cross-validity figure obtained in this way also rounded to .40; the test with this validity was 123 items in length. 

A third iteration was done by recalculating $r_{ci \cdot t}$ for all items, using the 123-item second-stage test as $t$, and adding the most useful items. Choosing the optimum number of these items post hoc, a cross-validity of .43 was found after the addition of 13 items (which actually turned out to be the addition of 6 items and removal of 7 items), resulting in a 122-item third-stage test. A fourth iteration resulted in the addition of 11 items (which turned out to be the addition of 5 new items and the deletion of 6 old items), yielding a 121-item fourth-stage test with a cross-validity of .44. 


Discussion 


The improvement from .40 to .43 resulting 
from the third iteration did not appear to be 
due primarily to the post hoc selection of test 
lengths using cross-validation sample data, 
since it was observed that adding any number 
of items from 1 to 19 would have produced an 
improvement over .40. However, no further 
iterations were performed beyond the fourth 
because the validities associated with the 
different test lengths of the fourth-stage test 
did not show a consistent tendency to be 
higher than the figure of .43 observed for the 
third-stage test. It thus appeared highly prob- 
able that the improvement from .43 to .44 
observed with the fourth iteration was due 
almost entirely to the post hoc selection of test 
lengths in the fourth iteration. 

As more and more iterations are performed, 
the item weights should come to approximate 
the weights which would be found by calculat- 
ing a multiple-regression equation using all 
the items. This is particularly true if fractional 
item weights are used. Previous investigators 
in this field apparently believed that the only 
reason for stopping short of this point is the 
necessity for economizing in test construction 
and test scoring. Gulliksen (1950, p. 330) 
states explicitly that multiple regression is 
always the best linear weighting technique. 
However, Horst (1941, p. 360) and others have 
found that multiple-regression equations often 
do not perform nearly as well on cross-valida- 
tion as the standard methods of test-construc- 
tion which ignore interitem correlations, de- 
spite the fact that the original-sample validity 
is always higher for the multiple-regression 
equation. In addition, Lord (1950) and Nichol- 
son (1960) have shown mathematically that a 
large drop in validity from the original sample 
to the cross-validation sample is to be ex- 
pected when the number of variables in a 
multiple-regression equation is large relative 
to the number of people in the original sample, 
as is usually true when there are several 
hundred predictor variables as in the present 
case. These considerations imply that as more and more iterations are performed, cross-validity should eventually drop. This conclusion agrees with the finding of the present study that there is no real improvement in 
validity beyond a certain number of iterations; 
had the number of iterations been increased, 
a drop in cross-validity presumably would have 
been observed. 

There is reason to believe that the value of 
the “Several items per iteration” method in 
most test-construction projects should be 
somewhat greater than would be suggested by 
the amount of improvement observed over the 
.35 validity of the Pz + 1K scale. This is true 
because of the particular nature of the K scale 
of the MMPI. The K scale was designed as a 
suppressor scale. The purpose of a suppressor 
scale is to correlate negatively with the test 
it is intended to supplement while maintaining 
a near-zero correlation with the criterion. In 
other words, a successful suppressor scale is 
one which falls near the abscissa on the left 
side of Figure 1. It is thus apparent that a 
successful suppressor scale contains many of 
the same items as would be added to the 
original test in the second iteration of the 
“Several items per iteration” method. To some 
extent the K scale had “stolen the thunder” 
of the “Several items per iteration” method 
before the latter had a chance to operate. 
It is highly likely that approximately the same 
final validity would have been found using the 
“Several items per iteration” method even if 
the K scale had never been invented. Hence, 
comparing this validity of approximately .43 
with the .30 validity of the Pz scale should 
give a more realistic picture of the value of the 
method in situations in which there is no 
previously constructed suppressor scale. 


CONCLUSIONS 


Most, but not all, previously proposed indices of item usefulness are substantially accurate, as is the index $r_{ci} - r_{ct} r_{it}$. In using any of these indices, it should be recognized that the optimum direction of item scoring is not necessarily that which results in positive validity for the item. 

Tests constructed by the “Several items per iteration” method are distinctly superior to those constructed by methods ignoring interitem correlations, which in turn were better than tests constructed by the “One item per iteration” method used in the present study. 

In using the “Several items per iteration” method, no more than three iterations were found useful. 


COMPULSIVITY AS A MODERATOR VARIABLE: 
A REPLICATION AND EXTENSION 2 


LAWRENCE J. STRICKER 


Educational Testing Service 


This study replicated and extended earlier ones which found that 2 indirect 
measures of compulsivity (the Strong Accountant scale and a ratio score of 
reading speed to vocabulary) moderated the correlations of other Strong 
interest scales with grade-point average (GPA) for male engineering freshmen; 
the correlations were higher for the less compulsive students. In the present 
study, the 2 compulsivity variables did not moderate the correlations of the 
Strong scales with freshman-year GPA for liberal arts students of either sex, 
although they did for men in the engineering program. The compulsivity 
variables were not significantly correlated, they did not moderate the same 
interest scales, and their joint use did not enhance the moderator effect. 


The notion that some variables affect or 
moderate (Saunders, 1956) the relationships 
among others has existed for a long time, but 
interest in these moderator variables has 
grown considerably during the last decade. 
This interest has been stimulated by studies, 
many of which have been reviewed by 
Ghiselli (1963), that demonstrate the ex- 
istence and importance of moderators in a 
variety of situations. 

Three of these studies have been con- 
cerned with the role of compulsivity as a 
moderator variable in the prediction of course 
grades by interest scales. The hypothesis 
underlying these studies was that the more 
compulsive students would put a great deal 
of effort into their course work, regardless 
of their interest in the courses, but the effort 
of the less compulsive students would depend 
upon their interest. It was presumed that the 


1 Thanks are due Harold A. Korn of Stanford 
University for furnishing the raw data used in the 
study reported in this article, and Henrietta Gallagher 
for supervising the computations. 

Tables reporting the compulsivity variables’ cor- 
relations with the occupational interest scales, and 
all the occupational interest scales’ correlations with 
GPA and their variances, as well as the variances of 
GPA, in each of the various student groups and 
subgroups have been deposited with the American 
Documentation Institute. Order Document No. 8786 
from ADI Auxiliary Publications Project, Photo- 
duplication Service, Library of Congress, Washington, 
D. C. 20540. Remit in advance $1.75 for micro- 
film or $2.50 for photocopies and make checks 
payable to: Chief, Photoduplication Service, Library 
of Congress. 
amount of effort would be reflected in the 
grades that were obtained. Hence, a higher 
correlation between interest measures and 
grades was expected for the less compulsive 
students. Two indirect measures of compul- 
sivity were used: the Accountant scale of the 
Strong Vocational Interest Blank (SVIB; 
Strong, 1959)—those with high scores, re- 
sembling accountants in their interest, were 
presumed to be compulsive—and a ratio score 
based on the Speed of Comprehension and 
Vocabulary scores of the Cooperative Reading 
Comprehension Test: C2² (Educational Test- 
ing Service, 1951)—those reading more slowly 
than predicted from their vocabulary scores 
were presumed to be compulsive. The fresh- 
man grade-point average (GPA) of male 
engineering students was used as the criterion. 
In the initial study (Frederiksen & Melville, 
1954), in which Princeton University students 
were used, five of the 10 SVIB interest scales 
studied had significantly higher correlations 
with GPA for the students classified as less 
compulsive by the Accountant scale (i.e., 
below the median score), and three scales 
had significantly higher correlations for stu- 
dents classified as less compulsive by the 
reading measure (i.e., above the regression line 
of speed on vocabulary). In a replication 
(Frederiksen & Gilbert, 1960), again with 


2 Educational Testing Service, Cooperative Test 
Division. The Cooperative Reading Comprehension 
Tests—Information Concerning Their Construction, 
Interpretation, and Use. Princeton, N. J.: Author, 
undated. 
Princeton University students, one of the 
10 interest scales had significantly higher cor- 
relations for the students classified as less 
compulsive by the Accountant scale, and three 
scales had significantly higher correlations 
for the students classified as less compulsive 
by the reading measure. In both studies, there 
were indications that the joint use of the two 
compulsivity measures enhanced the moderator 
effect. Somewhat different procedures were 
used in a third study (Saunders, 1956), em- 
ploying University of Rochester students. 
The moderator properties of only the Account- 
ant scale were investigated in connection with 
four of the 10 previously used interest scales 
and three interest-group scales that resembled 
seven of the interest scales. A moderator effect 
was established if a score based on the product 
of the scores for the Accountant scale and an 
interest scale increased the multiple correla- 
tion with GPA of the Accountant scale and 
the interest scale alone. Significant effects 
were found on one of the original interest 
scales and one of the interest-group scales. 

The purpose of the present study was (a) 
to determine the generality of the findings 
from the previous studies, which were based 
on male engineering students, to liberal arts 
and science students of both sexes; and (b) 
to assess the effectiveness of using the two 
moderators jointly. 


METHOD 
Subjects and Variables 


The SVIB and the Cooperative Reading Compre- 
hension Test: C2 were administered at the beginning 
of the school year to the freshman class entering 
Stanford University in September 1960. The male 
form of the SVIB was scored for the Accountant 
scale and 50 other standard occupational interest 
scales, and the female version was scored for 27 of 
the occupational interest scales (there is no Ac- 
countant scale on the female version). Two-digit 
standard scores were obtained.³ The Cooperative 
English Test was scored for Speed of Comprehension 
and Vocabulary. Freshman-year GPA⁴ and major 
field were later secured from school records. 


3 Answer sheets were not available for 47 women, 
so their standard scores were estimated from the 
letter ratings corresponding to scores that had been 
previously obtained. 

4 Letter grades were assigned, and they are quanti- 
fied in the following way: A=4, B=3, C=2, 
D=1, E and F=0. 
The total class size was 849 men and 426 women, 
and analyses were based on the 743 men and 393 
women for whom complete data were available. 


Procedure 


The data for the 393 women, for 145 men with 
engineering majors, and for 598 men with liberal arts 
and science majors (including men who had not yet 
chosen a major) were analyzed separately. A 
reading residual score (Speed of Comprehension 
predicted from Vocabulary minus actual Speed of 
Comprehension) was computed separately for the 
two sexes.⁵ The three groups’ distributions on this 
score and the two male groups’ distributions on the 
Accountant scale were dichotomized at their medians 
to define the more compulsive and less compulsive 
groups. In addition, four subgroups of men with 
engineering majors and four of men with liberal 
arts majors were formed from the combination of 
the two dichotomized compulsivity measures. 
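
The residual score and median split described above can be sketched as follows. This is only an illustration, assuming an ordinary least-squares line for predicting Speed of Comprehension from Vocabulary; the function names, the tie rule at the median, and the four data points are hypothetical and do not come from the study.

```python
from statistics import mean, median

def reading_residuals(vocab, speed):
    """Predicted-minus-actual Speed of Comprehension.

    Speed is predicted from Vocabulary by a least-squares line, so a
    positive residual means reading more slowly than predicted.
    """
    mv, ms = mean(vocab), mean(speed)
    sxy = sum((v - mv) * (s - ms) for v, s in zip(vocab, speed))
    sxx = sum((v - mv) ** 2 for v in vocab)
    slope = sxy / sxx
    intercept = ms - slope * mv
    return [(intercept + slope * v) - s for v, s in zip(vocab, speed)]

def median_split(residuals):
    """Label scores above the median as more compulsive.

    Assumption: ties at the median go to the less compulsive group.
    """
    cut = median(residuals)
    return ["more compulsive" if r > cut else "less compulsive"
            for r in residuals]

# Hypothetical scores for four students (not data from the study).
vocab = [40, 50, 60, 70]
speed = [38, 52, 57, 73]
residuals = reading_residuals(vocab, speed)
labels = median_split(residuals)
```

In the study the regression was fitted separately for each sex and the split was made within each student group, so a fuller version would call these functions once per group.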

All the interest scales were analyzed. However, in 
each analysis (i.e., comparisons of the two ac- 
countant groups, of the two reading-measure groups, 
and of the four groups formed from the combination 
of the two compulsivity variables) the scales that 
were significantly (p< .05) correlated with GPA 
in one or more of the subgroups in that same 
analysis were kept separate from the scales not 
significantly correlated with GPA in any of these 
subgroups, for failure to find moderator effects on 
the latter scales may be due to their low correlations. 

The correlations between the two compulsivity 
measures (using the continuous scores) were com- 
puted in the two male groups, and the correlations 
between each interest scale and GPA were computed 
in each of the student groups and their various 
subgroups. Product-moment correlations were used 
throughout. The variances of these variables in each 
of the groups and subgroups were also computed. 


RESULTS 


The Accountant scale and the reading 
measure correlated —.04 (p> .05) in the 
engineering group and .03 (p> .05) in the 
male liberal arts group. 

For each interest scale in each of the three 
student groups, the significance of the differ- 
ence between its correlations with GPA in the 
two accountant groups and between its cor- 
relations in the two reading-measure groups 
was appraised by a one-tailed z test of the 
transformed correlations—ignoring the signs 
of the correlations, for the degree of relation- 

5 The regression coefficients for predicting Speed 
of Comprehension from Vocabulary were .91 for 
the men and 1.06 for the women. The coefficients of 
.84 for the engineering students and .92 for the 
liberal arts men were not significantly different 
(p < .05). 
ship in a subgroup, not its direction, was at 
issue. The scales with significantly (p < .05) 
higher correlations in the less compulsive 
groups identified by the Accountant scale are 
reported in Table 1, and those with signifi- 
cantly higher correlations in the less com- 
pulsive group defined by the reading measure 
appear in Table 2. 
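
The test described in this paragraph can be sketched as follows. This is a minimal illustration, assuming the usual Fisher r-to-z transformation with a standard error of sqrt(1/(n1 - 3) + 1/(n2 - 3)); the function name is hypothetical. The example values are the Farmer scale correlations for the two accountant groups of engineering students given in Table 1.

```python
import math

def z_difference(r_more, n_more, r_less, n_less):
    """z for the difference between two independent correlations.

    Signs are ignored (absolute values are transformed), because the
    strength of relationship, not its direction, is at issue. A positive
    z means the correlation is stronger in the less compulsive group.
    """
    z_more = math.atanh(abs(r_more))   # Fisher transformation
    z_less = math.atanh(abs(r_less))
    se = math.sqrt(1.0 / (n_more - 3) + 1.0 / (n_less - 3))
    return (z_less - z_more) / se

# Farmer scale: r = .04 (N = 84, more compulsive), r = -.58 (N = 61).
z_farmer = z_difference(0.04, 84, -0.58, 61)

# One-tailed test at the .05 level: z must exceed 1.645.
significant = z_farmer > 1.645
```

With these inputs the function gives a z of about 3.62, which agrees with the value reported for the Farmer scale.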

All these significant differences between the 
accountant groups occurred on scales that 
were significantly correlated with GPA in one 
or both of these groups, and these significant 
differences for the reading-measure groups 
were also on scales that were significantly 
correlated with GPA in one or both of these 
groups. 

In the sample of engineering students, 25 
of the 50 interest scales correlated significantly 
with GPA in one or both of the accountant 
groups, and 27 correlated significantly with 
GPA in one or both of the reading-measure 
groups. Twelve of these scales had signifi- 
cantly higher correlations with GPA in the 
less compulsive group identified by the Ac- 
countant scale, and seven had significantly 
higher correlations in the less compulsive 
group identified by the reading measure. Of 


TABLE 1 

INTEREST SCALES THAT HAD HIGHER CORRELATIONS 
WITH GPA IN THE LESS COMPULSIVE STUDENT 
GROUP DEFINED BY THE ACCOUNTANT SCALE 

                                                More          Less
Scale                             Total      compulsive    compulsive       z

Engineering Students            (N = 145)     (N = 84)      (N = 61)
Farmer                            -.25**         .04          -.58**      3.62**
CPA Owner                          .23**        -.02           .55**      3.48**
Veterinarian                      -.26**        -.02          -.53**      3.31**
Forest Service                    -.20*          .00          -.44        2.74**
Author-Journalist               [illegible]      .00           .42**      2.60**
Advertiser                         .14          -.04           .39        2.16*
Vocational-Agriculture Teacher    -.12           .07          -.41**      2.13*
Aviator                           -.16*          .02          -.36**      2.08*
Pharmacist                        -.16*         -.02          -.35**      2.01*
Carpenter                         -.13        [illegible]     -.41**      1.90*
Lawyer                             .09          -.16           .44        1.81*
Psychologist                       .24**      [illegible]      .42**      1.73*

Liberal Arts Men                (N = 598)     (N = 308)     (N = 290)
None

* p < .05. 
** p < .01. 
TABLE 2 

INTEREST SCALES THAT HAD HIGHER CORRELATIONS 
WITH GPA IN THE LESS COMPULSIVE STUDENT 
GROUP DEFINED BY THE READING MEASURE 

                                        More          Less
Scale                     Total      compulsive    compulsive        z

Engineering Students    (N = 145)     (N = 72)      (N = 73)
Musician (Performer)    [illegible]     -.04           .41**       2.33**
Psychologist               .24**         .08           .42**       2.16*
Architect               [illegible]     -.01        [illegible]    1.96*
Psychiatrist               .15           .01           .33**       1.96*
Physicist                  .19*          .02           .33**       1.90*
Purchasing Agent          -.12           .03          -.32**       1.78*
Artist                  [illegible]     -.02           .30**     [illegible]

Liberal Arts Men        (N = 598)     (N = 276)     (N = 322)
President                 -.09*          .01          -.17**       1.98*
Musician (Performer)       .14**         .05           .19**       1.73*

Women                   (N = 393)     (N = 187)     (N = 206)
None

* p < .05. 
** p < .01. 


the 12 scales with significant differences be- 
tween the accountant groups, five were skilled 
trades (Group IV) scales, and three were 
verbal (Group X) scales. Four of the seven 
scales with significant differences between the 
reading-measure groups were from the profes- 
sional area (Group I). One scale (Psycholo- 
gist) was significant in both of these analyses. 

Among the liberal arts men, 30 of the 50 
scales correlated significantly with GPA in 
one or both of the accountant groups, and 28 
correlated significantly with GPA in one or 
both of the reading-measure groups. None of 
these scales had significantly higher correla- 
tions in the less compulsive group identified 
by the Accountant scale, but two had sig- 
nificantly higher correlations in the less com- 
pulsive group identified by the reading 
measure. 

For the women, nine of the 27 scales cor- 
related significantly with GPA in one or both 
of the reading-measure groups. None of these 
scales had significantly higher correlations 
in the less compulsive group. 

In addition, there were three instances, all 
involving scales significantly correlated with 
GPA in one or both reading-measure groups, 
in which the difference between the correla- 
tions in these two groups exceeded the dif- 
ference required for significance by the one- 
tailed z test, but the difference was not in the 
expected direction—the correlation was higher 
in the more compulsive group. 

The heterogeneity of each scale’s correla- 
tions (ignoring signs of the correlations) in 
the four subgroups of engineering men and the 
four subgroups of liberal arts men formed 
from the joint use of the two compulsivity 
measures was appraised, separately for the 
two student groups, by a χ² test of the null 
hypothesis that a set of correlation coefficients 
are from a common population (Snedecor, 
1956). Thirty-six scales in the engineering 
sample and 33 in the liberal arts sample had 
significant (p< .05) correlations with GPA 
in at least one of the four subgroups. None of 
the χ² tests for any of the 50 scales in either 
group was significant (p < .05). 
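
A test of this kind can be sketched as follows. The sketch assumes the common textbook form of the homogeneity test, in which each correlation is Fisher-transformed and the squared deviations from the weighted mean are weighted by n - 3; the function name and the example correlations are hypothetical and are not data from the study.

```python
import math

def homogeneity_chi2(rs, ns):
    """Chi-square for the hypothesis that k correlations share one population value.

    Absolute values are used because signs are ignored in the analysis.
    The statistic has k - 1 degrees of freedom.
    """
    zs = [math.atanh(abs(r)) for r in rs]      # Fisher transformation
    ws = [n - 3 for n in ns]                   # weights
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))

# Four hypothetical subgroups formed by crossing two dichotomized measures.
chi2_same = homogeneity_chi2([0.30, -0.30, 0.30, 0.30], [40, 40, 40, 40])
chi2_mixed = homogeneity_chi2([0.00, 0.00, 0.60, 0.60], [40, 40, 40, 40])

# With four subgroups there are 3 degrees of freedom; the .05 critical value is 7.815.
CRITICAL_05_DF3 = 7.815
```

Identical absolute correlations give a statistic of zero, while the second hypothetical set exceeds the critical value and would be judged heterogeneous.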

In order to assess the possibility that the 
extent of the differences in correlations be- 
tween the various subgroups was due to cor- 
responding differences in the variances of the 
interest scales or GPA, the differences in 
the variances between the two groups defined 
by each compulsivity measure were assessed 
by two-tailed F tests, and the differences be- 
tween the four groups formed from the joint 
use of the two measures were appraised by 
Bartlett’s χ² test. In each of these various 
analyses, fewer than 5% of the statistical 
tests were significant at the .05 level. 


DISCUSSION 


The most important finding was that the 
two compulsivity variables did not act as 
moderators for liberal arts students of either 
sex, although they did for men in the engineer- 
ing program. The moderator effects in the two 
male groups were not due to differences in 
variances of the compulsivity subgroups; there 
were few significant differences in variances 
and one subgroup had a larger variance for 
an interest scale about as often as another 
subgroup. The lack of moderator effects in 
the female sample, however, may have been 
caused by differences in subgroup variances. 
There were no significant differences in the 
variances of the interest scales or GPA in this 
sample, but the scales’ variances were gen- 
erally larger in the more compulsive group de- 
fined by the reading measure. A difference 
in variances in this direction could mask the 
higher correlation for the less compulsive 
group that the moderator effect was expected 
to produce. 

The reason for the difference in results for 
the engineering and liberal arts men is not 
clear. The most likely explanation, and one 
that merits further investigation, is that there 
are important differences in the two groups’ 
modal personalities. It is already known that 
the two groups differ in a variety of person- 
ality characteristics (Blum, 1947; Goodman, 
1942; Harrison, Tomblen, & Jackson, 1955). 
The personality structure of liberal arts stu- 
dents may also be more complex, and the link 
between their interest or motivation and be- 
havior may be mediated by many variables 
in addition to compulsivity. 

Another important finding was that the 
interest scales affected by the moderator vari- 
ables in the sample of engineering students 
generally were not the same scales that dis- 
played such effects in previous studies—the 
sole exception was the Physicist scale, which 
yielded a significant difference between the 
reading-measure groups in the present study 
as well as in the Frederiksen and Gilbert 
(1960) study. Moreover, although most of the 
affected scales in the previous studies were 
from the technical-scientific (Group II) in- 
terest area, in the present study the finding 
involving the Physicist scale was the only 
instance in which a technical-scientific scale 
displayed a moderator effect. These dis- 
crepancies may reflect either sampling error 
or substantive differences, associated, perhaps, 
with variations in the courses taken by the 
students in the present study and those in 
the previous ones. The sampling error explana- 
tion is supported by the Frederiksen and 
Gilbert (1960) study, which found that the 
Accountant scale had a significant moderator 
effect on one of the five scales that displayed 
this effect in the original Frederiksen and 
Melville (1954) study, and the reading meas- 
ure had a significant moderator effect on two 
of the three scales displaying such effects in 
the original study. Rather indirect evidence 
that the present Stanford University sample 
and the Princeton University samples in the 
previous studies took similar courses stems 
from the finding that the correlations of the 
10 interest scales with GPA for the total 
groups in the Frederiksen and Melville (1954) 
and Frederiksen and Gilbert (1960) studies 
were not significantly (p< .05) different 
from the correlations for the total engineering 
group in the present study. 

The assumption that the two compulsivity 
variables are measuring the same thing was 
brought into question by previous findings 
(Frederiksen & Gilbert, 1960; Frederiksen & 
Melville, 1954) that the two measures were 
not significantly (p< .05) correlated, and 
these findings were replicated in the present 
study with both groups of male students. 
This assumption was cast into further doubt 
by the major finding that the joint use of the 
two variables did not enhance the moderator 
effect, together with the related observation 
that the two measures not only moderated 
different scales but, in addition, the moderated 
scales reflected different interest areas. 

The lack of congruence between these two 
measures cannot be accounted for by un- 
reliability. The split-half reliability of the 
Accountant scale in a sample of college students 
is reported to be .84 (Strong, 1959), and the 
internal consistency reliability of the reading 
measure is estimated to be .46 on the basis 
of the reported reliability of .87 for the 
Vocabulary score and .86 for the Speed of 
Comprehension score, and the correlation of 
.75 between the two scores, all three statistics 
obtained from samples of high school and 
college students (see Footnote 2). The mean- 
ing of these measures clearly needs clarifica- 
tion. 
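
The article does not state how the reliability of the reading measure was estimated. As a hedged check, the standard formula for the reliability of a difference between two scores with equal variances reproduces the reported value of .46 from the three statistics cited above; the function name is hypothetical, and the choice of formula is an assumption.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of a difference between two equally variable scores.

    r_xx, r_yy: reliabilities of the two component scores.
    r_xy: correlation between the two scores.
    """
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)

# Vocabulary reliability .87, Speed of Comprehension reliability .86,
# and a correlation of .75 between the two scores.
reading_reliability = difference_score_reliability(0.87, 0.86, 0.75)
```

The result rounds to .46. It also shows why such a derived score is much less reliable than either component: the higher the correlation between the components, the smaller the reliable portion of their difference.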

The findings of this study do not argue 
against the value of a moderator variable 
analysis, for ample evidence has accumulated 
that moderator variables, particularly when 
chosen for their theoretical relevance (Kogan 
& Wallach, 1964) rather than by purely 
empirical means (French, 1961), may clarify 
relationships among variables and improve 
prediction. The present study does raise a 
caution about the stability and generality of 
such variables. As Ghiselli (1963) has noted, 
it may be as difficult to find moderator vari- 
ables that have generality as it is to find sup- 
pressor variables with that characteristic. 
And, more specifically, this study indicates 
that the use of the two presumed compulsivity 
measures in moderating the prediction of 
composite indexes of grades, such as GPA, 
from interests has limited practical value. 
EDWARDS PERSONAL PREFERENCE SCHEDULE 
Following the procedures of Goodstein and Heilbrun (1962), scores for 102 
males on the Edwards Personal Preference Schedule (EPPS), the Minnesota 
Scholastic Aptitude Test (MSAT), and 2 grade-point indices were analyzed 
for the entire sample and for low, middle, and high ability groups using 
partial correlation with MSAT scores held constant. The results showed little 
agreement with those reported by Goodstein and Heilbrun. The sample was 
also randomly divided into cross-validation groups and a similar analysis per- 
formed. These results were not stable. Finally, the possible moderating effects 
of intellectual ability were noted. 


It has been pointed out repeatedly that 
scholastic aptitude tests typically account for 
less than half the variance in predicting 
academic achievement. Thus, in recent years, 
much research interest has been directed 
toward finding additional predictors of college 
achievement, usually in the form of scores on 
nonintellectual scales—most frequently, per- 
sonality scales. The notion behind this re- 
search is appealing: If we cannot account 
for enough of the variance in prediction on 
the basis of intellectual ability alone, then 
let us search for additional predictors—per- 
haps in the form of stable personality dispo- 
sitions—which will enable us to enhance the 
predictive efficiency of our selection batteries. 
The search for these additional predictors has 
been carried out in terms of several research 
designs—designs which are ably and critically 
reviewed by R. L. Thorndike (1963) in his 
monograph on The Concepts of Over- and 
Underachievement. 

One frequently used design in the search 
for nonintellective predictors of college 
achievement involves correlating scores on 
each of the scales of a personality inventory 
with an index of college achievement while 
holding a third variable constant through the 
use of partial correlation. For example, a 
study reported by Goodstein and Heilbrun 
(1962) uses this design, but with an added 


1 Paper read at the convention of the Midwestern 
Psychological Association, Chicago, May 1965. 

2The author is grateful to M. D. Dunnette for 
his guidance throughout this research. 


twist. For their entire sample, they exam- 
ined the relationships between each of the 
scales of the Edwards Personal Preference 
Schedule (EPPS) and semester grade-point 
average with intellectual ability statistically 
partialed out. Then the added twist: Good- 
stein and Heilbrun divided their sample into 
low, middle, and high ability groups on the 
basis of intellectual ability, and proceeded to 
reanalyze their data for each of these rela- 
tively homogeneous ability groups. For each 
ability group separately, they examined the 
relationships between the Edwards scales and 
semester grade-point average with intellectual 
ability partialed out, found different corre- 
lates of achievement in each ability group, 
and concluded that levels of intellectual abil- 
ity are an important control or moderator 
variable for studying relationships between 
personality scale scores and college achieve- 
ment. They argued that when heterogeneous 
ability groups are studied and levels of ability 
are ignored as a variable, the true relation- 
ships between personality factors and achieve- 
ment may be concealed. They argued, in 
short, that intellectual ability acts in a moder- 
ating way to influence the magnitude of the 
relationships between personality factors and 
college achievement. 

Although levels of intellectual ability are 
probably an important moderating variable in 
the search for personality correlates of college 
achievement, it is possible that stratifying a 
sample on levels of intellectual ability will 
yield personality correlates which are of lim- 
ited generality. The need for cross-validation 
and careful study of the generality of results 
is widely recognized, particularly in correla- 
tional designs which require grouping subjects 
(Ss) on the basis of test scores. Consequently, 
the present study is designed to check the 
stability and generality of Goodstein and 
Heilbrun’s results on a different sample of 
students, and to illustrate the problems 
arising when samples are rationally divided 
on the basis of intellectual ability in studies 
using this type of design. 


METHOD 


The Ss were 102 males in a large two-quarter 
introductory psychology class. Most of the Ss 
were sophomores, with some juniors and a few 
seniors. Predictors consisted of the EPPS and the 
Minnesota Scholastic Aptitude Test (MSAT). Cri- 
teria were two separate grade-point indices of college 
achievement. The EPPS (Edwards, 1959) contains 
225 forced-choice item pairs, all of which are 
equated for “social desirability” value. Responses are 
scored on 15 dimensions based on Murray’s system 
of manifest personality needs (e.g., n Ach, n Def, 
n Aff, etc.). The MSAT (Berdie, Layton, Swanson, 
Hagenah, & Merwin, 1962), developed as a short 
form of the Ohio State Psychological Examination, 
consists of reading passages followed by 78 same- 
opposite and analogies items which measure vocabu- 
lary and verbal comprehension. Transcripts of grades 
were obtained from the University Recorder and 
two indices of college achievement were computed 
for each S. The first index, quarter GPA, was com- 
puted for grades in courses completed during the 
immediately preceding quarter. The second index, 
core GPA, was computed for grades in liberal arts 
“core areas.” Core GPA was computed from grades in 
courses listed under foreign language, humanities, 
natural sciences, social sciences, and psychology, 
yielding about 25 credits spread over at least two 
quarters. Core GPA was chosen as an index of col- 
lege achievement in order to provide a sample of 
scholastic behavior approximately comparable for all 
Ss. Quarter GPA was chosen as an index of college 
achievement in order to give results more readily 
comparable with the criterion used by Goodstein and 
Heilbrun, a one-semester GPA. 

All variables were intercorrelated, and partial cor- 
relations were computed for each of the Edwards 
scales and the two grade-point indices with MSAT 
scores held constant. Partial correlations were com- 
puted for the entire sample and for three relatively 
homogeneous ability groups, to which Ss were as- 
signed on the basis of MSAT scores, thus providing 
an analysis similar to Goodstein and Heilbrun’s. 
Additional partial correlations were computed for 
three heterogeneous ability groups, to which Ss were 
assigned randomly, to shed light on the problems 
of grouping Ss. 
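
The partial correlations used throughout this analysis can be sketched as follows. This is a minimal illustration of the standard first-order partial correlation formula, not the author's computing procedure; the function name and the three input correlations are hypothetical.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant.

    Here x would be an EPPS scale, y a grade-point index, and z the MSAT score.
    """
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator

# Hypothetical values: scale with GPA .50, scale with MSAT .30, GPA with MSAT .40.
r_partial = partial_correlation(0.50, 0.30, 0.40)
```

When the personality scale is uncorrelated with ability (r_xz = 0), the numerator is unchanged and the partial correlation is at least as large in absolute value as the zero-order correlation, since only the criterion's ability-related variance is removed.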
RESULTS 


The results of this study for the entire 
sample and the three homogeneous ability 
groups, as well as those reported by Good- 
stein and Heilbrun for the comparable 
groups from their male sample, are shown 
in Table 1. Rather than compare the two 
sets of results at length, only a summary 
comparison will be made. Ten of 60 partial 
correlations reported by Goodstein and Heil- 
brun reach the 5% significance level. For 
the present study, 20 of 120 partial correla- 
tions reported in Table 1 reach the 5% sig- 
nificance level. But comparison of these two 
sets of results shows only three instances in 
which the same Edwards scale correlates sig- 
nificantly with scholastic success even though 
both studies found the same percentage of 
significant correlations. The Edwards Achieve- 
ment scale correlates significantly with se- 
mester GPA, Goodstein and Heilbrun’s cri- 
terion, and with quarter GPA for the 
total samples. The Endurance scale cor- 
relates significantly with semester GPA and 
quarter GPA for the middle ability groups. 
The Aggression scale correlates signifi- 
cantly with semester GPA and core GPA 
in the high ability groups. In no case is 
any Edwards scale significantly related to 
all three criteria of scholastic success. This 
comparison of results shows the necessity for 
careful cross-validation followed by study of 
the generality of results, especially when the 
possible effects of moderator variables are 
being investigated. 

Further examination of Table 1 shows that 
13 of 60 partial correlations between the 
Edwards scales and quarter GPA reach the 
5% significance level, and that 7 of 60 partial 
correlations between the Edwards scales and 
core GPA reach the 5% level. But compari- 
son of the significant correlates for quarter 
GPA and core GPA shows only four instances 
in which the same Edwards scale correlates 
significantly with both criteria even though 
quarter GPA and core GPA are highly related 
(r = .72). This comparison calls attention to 
the need for careful definition of the opera- 
tions for measuring college achievement. It 
appears as though two different yet highly 
correlated criteria of college achievement 
TABLE 1 

CORRELATIONS (WITH MSAT SCORES PARTIALED OUT) BETWEEN QUARTER GPA AND THE 15 EPPS SCALES, 
AND CORE GPA AND THE 15 EPPS SCALES, AND THE COMPARABLE CORRELATIONS REPORTED BY 
GOODSTEIN AND HEILBRUN (G & H); ARRAYED FOR THE TOTAL SAMPLE AND EACH OF THE 
HOMOGENEOUS ABILITY GROUPS 

                              Total group                          Low ability group
EPPS scale          G & H     Quarter GPA    Core GPA      G & H     Quarter GPA    Core GPA
                  (N = 206)    (N = 102)    (N = 102)    (N = 68)     (N = 33)      (N = 33)
Achievement          .24      [illegible]      .09          .18      [illegible]      .00
Deference            .04        -.05          -.02          .16        -.22          -.35*
Order                .06      [illegible]      .20*         .15         .02          -.10
Exhibition           .03        -.14          -.11          .01        -.08          -.40*
Autonomy            -.11        -.07          -.05         -.27*     [illegible]     -.05
Succorance          -.06        -.38**        -.16          .02        -.46**        -.15
Affiliation          .06         .04          -.01         -.11        -.05        [illegible]
Intraception        -.04        -.29**        -.08          .09        -.17          -.01
Dominance            .02        -.09          -.06          .05         .01          -.05
Abasement           -.02         .06           .07          .08        -.11          -.14
Nurturance          -.16        -.38**         .05         -.21*       -.30           .04
Change              -.15        -.09          -.05         -.14        -.29          -.19
Endurance            .09      [illegible]   [illegible]     .18         .44*       [illegible]
Heterosexuality     -.02        -.06          -.13         -.15        -.17          -.09
Aggression          -.13         .04          -.01         -.02      [illegible]      .20

                          Middle ability group                     High ability group
EPPS scale          G & H     Quarter GPA    Core GPA      G & H     Quarter GPA    Core GPA
                  (N = 69)     (N = 36)      (N = 36)    (N = 68)     (N = 33)      (N = 33)
Achievement          .29*        .09           .00          .20         .20          -.07
Deference           -.03         .33        [illegible]    -.05         .03        [illegible]
Order                .10      [illegible]      .03         -.02         .58**         .58**
Exhibition           .09        -.24          -.06          .01        -.09           .00
Autonomy             .17        -.14           .01         -.16         .05          -.13
Succorance          -.12        -.07          -.12       [illegible]   -.28          -.33
Affiliation         -.26*     [illegible]      .07          .10        -.03          -.13
Intraception        -.25*       -.20          -.11       [illegible]   -.48**        -.29
Dominance            .09        -.29          -.09         -.13        -.29          -.30
Abasement           -.16         .14           .05       [illegible]   -.15           .34
Nurturance          -.24*       -.17          -.18         -.13        -.33           .15
Change              -.21*     [illegible]   [illegible]    -.12        -.09          -.10
Endurance            .48**       .39*          .29         -.03         .09           .18
Heterosexuality      .05        -.28          -.18          .05        -.07          -.16
Aggression           .00        -.04          -.09         -.22*       -.34          -.37*

* p < .05 
** p < .01 

may yield different personality correlates of 
achievement. 

As a check on the stability of results 
yielded by the partial correlation analysis, 
the entire sample was randomly divided into 
heterogeneous ability groups, and for each 
of these groups partial correlations were com- 
puted between each of the Edwards scales 
and quarter GPA and core GPA with MSAT 
scores held constant. These partial correla- 
tions are shown in Table 2. Randomly di- 
viding the sample into thirds allows an 
internal kind of cross-validation: If these 
results have any stability, then the partial 
correlations between any Edwards scale and 
either of the grade-point criteria should be 
TABLE 2 

CORRELATIONS (WITH MSAT SCORES PARTIALED OUT) 
BETWEEN CORE GPA AND THE 15 EPPS SCALES, AND 
BETWEEN QUARTER GPA AND THE 15 EPPS 
SCALES; ARRAYED FOR EACH OF THE 
HETEROGENEOUS ABILITY 
GROUPS (N = 34 IN 
EACH GROUP) 

                                  Core GPA
EPPS scale           Group 1       Group 2       Group 3
Achievement            .00           .16       [illegible]
Deference              .28          -.31           .02
Order                  .18           .03           .43*
Exhibition            -.17          -.15           .07
Autonomy              -.12           .08          -.15
Succorance            -.31          -.13           .04
Affiliation           -.16          -.30          -.06
Intraception           .28      [illegible]**     -.23
Dominance             -.11          -.19           .10
Abasement              .29          -.12          -.02
Nurturance            -.11           .26          -.07
Change                 .02       [illegible]      -.36*
Endurance          [illegible]       .26       [illegible]
Heterosexuality       -.11          -.10          -.21
Aggression            -.05          -.43*      [illegible]

                                 Quarter GPA
EPPS scale           Group 1       Group 2       Group 3
Achievement        [illegible]       .29       [illegible]
Deference          [illegible]      -.11           .06
Order                  .01           .24       [illegible]
Exhibition            -.09          -.42*          .07
Autonomy              -.04           .05          -.24
Succorance            -.34*         -.21          -.04
Affiliation           -.08      [illegible]**     -.04
Intraception          -.23           .20           .02
Dominance             -.09          -.32           .01
Abasement          [illegible]      -.03           .01
Nurturance            -.14           .08          -.42*
Change                -.01           .14          -.27
Endurance          [illegible]       .42*      [illegible]
Heterosexuality       -.15          -.11          -.28
Aggression             .06          -.22           .19

* p < .05 
** p < .01 



of approximately the same magnitude in each 
of the three cross-validation groups. Cor- 
relates which are significant in one group 
should be significant, or at least nearly sig- 
nificant, in the other two groups: the results 
should cross-validate. For the heterogeneous 
ability groups, 12 of 90 partial correlations 
reach the 5% significance level, which is very 
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nearly the same percentage as for the pre- 
vious analyses, but in no case is any Edwards 
scale significantly related to either of the 
grade-point criteria in more than one group. 
Furthermore, inspection of Table 2 shows 
very few cases where the partial correlations 
between any Edwards scale and either cri- 
terion are of approximately the same magni- 
tude across all three cross-validation groups. 
These results suggest that the observed per- 
sonality correlates of college achievement are 
strongly influenced by chance fluctuations 
from one sample to the next. While an in- 
crease in sample sizes would probably in- 
crease the number of statistically significant 
correlates, there is very little assurance that 
larger samples would enhance the predictive 
efficiency of the Edwards scales. 
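
The partial-correlation and random-split procedure described in this section can be sketched briefly. The sketch below is illustrative only: the function names (pearson, partial_r, split_into_thirds) and the seed are hypothetical and do not come from the original analysis. It uses the standard first-order partial correlation formula, which removes a third variable (here, ability) from the correlation between a scale and a grade criterion.

```python
import math
import random


def pearson(x, y):
    """Zero-order product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def split_into_thirds(ids, seed=0):
    """Randomly assign subject identifiers to three equal-sized groups.

    The seed is an arbitrary choice made only so the split is repeatable.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    k = len(shuffled) // 3
    return shuffled[:k], shuffled[k:2 * k], shuffled[2 * k:]


if __name__ == "__main__":
    # Hypothetical correlations: scale-GPA .60, scale-ability .50, GPA-ability .50.
    print(round(partial_r(0.60, 0.50, 0.50), 4))
    groups = split_into_thirds(range(102))
    print([len(g) for g in groups])
```

A scale would be taken as cross-validated only if its partial correlation with a criterion held up in all three groups, which is the check the text applies to Table 2.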


DISCUSSION 


The results of this study, particularly 
when contrasted with those reported by 
Goodstein and Heilbrun, show that few, if 
any, firm comments can be made about the 
personality correlates of college achievement. 
In addition, the results lead to the following 
conclusions: 

1. The results reported by Goodstein and 
Heilbrun are of limited generality. Although 
this study was not designed solely as a cross- 
validation of their study, we must demand at 
least a modest degree of similarity between 
the two sets of results before placing much 
confidence in the generality of their findings. 

2. The cross-validation analysis of the 
heterogeneous ability groups yielded no in- 
stances in which any Edwards scale correlates 
significantly with either criterion of college 
achievement in more than one of the cross- 
validation groups. In every case significant 
results are specific to a particular group. Since 
Ss were randomly assigned to the cross- 
validation groups, the moderating effects of 
intellectual ability should have been the same 
in each group, and the results should have 
been stable across groups; the results ob- 
tained highlight the instability of the vari- 
ance under study, and show the necessity for 
cross-validation in studies designed to study 
the differing relationships in subgroups estab- 
lished on the basis of some presumed 
moderating variable. 

3. Examination of the results for the
homogeneous ability groups suggests that 
intellectual ability has potential as an impor- 
tant moderator variable in the analysis of 
the relationships between personality scales 
and college achievement. However, it is criti- 
cally important that subsequent studies in- 
corporate cross-validation designs and that 
they better account for the true complexities 
of the relationships between student person- 
alities, abilities, and college environments. 
Such studies may lead to the discovery of 
stable and general personality correlates 
which can be used to enhance the predictive 
efficiency of our college selection batteries. 


MILTON D. HAKEL


FACTOR ANALYSIS OF CATEGORY AND MAGNITUDE
SCALES OF A TECHNICAL ATTRIBUTE 1


ARTHUR I. SIEGEL AND MARK G. PFEIFFER
Applied Psychological Services, Wayne, Pennsylvania 


36 journeymen electronics maintenance personnel judged the complexity of 16 
avionics circuits using the paired-comparison, magnitude-estimation, rank- 
order, and constant-sum procedures on 2 occasions. The basic scale values were 
standardized across the 16 circuits separately within each method and occa- 
sion. The standardized scale values were then intercorrelated and factor ana- 
lyzed to test the hypothesis that 2 factors would account for the data. After 
considering 1-, 2-, 3-, and 4-factor solutions, a 2-factor solution was chosen 
as best fitting the data. These 2 factors suggested the taxonomy of “cognitive 
discrimination” and “contextual uncertainty” to account for Ss’ scaling 
behavior. Most frequently, the paired-comparison and the constant-sum 
methods were most heavily loaded on Factor 1, “cognitive discrimination.” 
The rank-order and the magnitude-estimation methods were most consistently 
loaded heaviest on “contextual uncertainty,” Factor 2. The findings are 
interpreted in terms of their relationship to certain customary scaling
classificatory schemes.


Extensive scaling literature on sensory 
phenomena, in the experimental tradition 
(Eisler, 1962, 1963; Ekman & Kuennapas, 
1963; Ekman & Sjoberg, 1965; Engen & Mc- 
Burney, 1964; Galanter & Messick, 1961; 
Helm, Messick, & Tucker, 1961; Perloe, 
1963; Stevens, 1957, 1958, 1961, 1962; Stev- 
ens & Galanter, 1957; Whitlock, 1963) sug- 
gests that scaling variables or procedures 
should likely group themselves into one or 
two classes which could be expressed in terms 
of the method employed (category versus 
magnitude) or on the basis of the continuum 
generated (prothetic versus metathetic). 
However, these studies have been involved 
with traditional scaling of sensory phenomena. 
A previous Applied Psychological Services’ 
study (Pfeiffer & Siegel, 1965) indicated that 
the continuum generated in scaling, of the 
type here investigated, was metathetic. The 
purpose of the current study was to investi- 
gate whether or not the category versus mag- 
nitude classificatory scheme would hold, given 
an underlying metathetic continuum, when 
the attribute to be scaled is a job-oriented
technical attribute, that is, perceived complexity
of electronic circuits.

1 The present report was completed under Contract
Nonr 2279(00) between Applied Psychological
Services, Wayne, Pa., and the Personnel and Training
Branch, Psychological Sciences Division, Office
of Naval Research. The authors are indebted to
Glenn L. Bryan and James J. Regan for their advice
and assistance.

Specifically, the present work investigated, 
through factor-analytic methods, the factorial 
structure of psychological scales (category 
and magnitude) of perceived circuit complex- 
ity. Thus, the present study focused on in- 
vestigating the basis of these complexity 
judgments. 

The variables studied grew out of the com- 
plexity judgments of avionics circuits made 
by journeymen electronics personnel using 
four scaling methods on each of two occa- 
sions. The scaling methods employed were: 
(a) rank order, (b) paired comparison, (c)
magnitude estimation, and (d) constant sum. 
The rank-order and paired-comparison tech- 
niques may be considered category methods 
while the magnitude-estimation and the con- 
stant-sum techniques may be considered to be 
magnitude methods. 


METHOD 


The steps taken to achieve the purposes of the 
present study included: (1) development of a list of 
16 independent circuits (stimuli) whose complexity 
was to be judged, (2) having a group of journeymen 
electronics personnel judge the 16 circuits on two 
separate occasions using the four different scaling 
procedures, (3) obtaining for each method a scale 
value for each circuit for each subject (S), (4) stand- 
ardization of scale values across the 16 circuits sepa- 
rately within each method and occasion, (5) forming
a method-by-method intercorrelational matrix
for each circuit (each correlation coefficient involved
was based on N = 36; since four methods and
two occasions were involved, this matrix was 8 × 8),
and (6) separately factor analyzing each of the 16
method-by-method matrices, each of which represented
a separate circuit.

The scale standardization introduced merely set 
the four sets of judgments on a common scale and 
should do nothing to disturb the underlying fac- 
torial structure. Thurstone (1947, p. 67, pp. 368-369) 
supports this contention and goes on to say “the 
structure is retained in clearest form by the pro- 
cedure of normalizing the raw scores before factor- 
ing, and that is the recommended procedure [p. 
369].” Moreover, the reader should understand that 
correlation matrices based on intraindividual scale 
values or judgments were factor analyzed. 

Step 1 was accomplished in a previous study by 
Schultz and Siegel (1963). Their report described 
the results of a multidimensional scaling analysis of 
circuit types repaired by the Naval aviation elec- 
tronics technician. Sixteen independent (orthogonal) 
circuit types were identified. 

The second step was accomplished through a set 
of self-administering questionnaires. Included in each 
questionnaire set were four adaptations of the classi- 
cal methods. The instructions, employed by Ss who 
judged the complexity of the 16 avionics circuits, 
are shown in an Applied Psychological Services’ 
report by Siegel and Pfeiffer (1965). These four 
sets of instructions represent the rank-order,
paired-comparison, magnitude-estimation, and
constant-sum methods.

Step 3 was accomplished through customary pro- 
cedures and Step 4 involved the methods suggested 
by Guilford (1954, 1956). 

The fifth step was accomplished by correlating, 
within each circuit, the resulting scale values of 
intraindividual judgments which had been derived 
for 36 Ss, across each of the four methods and 
across the two occasions of testing. Thus, there 
resulted an 8 × 8 intercorrelational matrix for each
of the 16 circuit stimuli. 

Finally, the last step was accomplished by factor
analysis. Each of these sixteen 8 × 8 matrices was
subjected to factor analysis by the method of principal
components. In each case the axes were rotated to
orthogonal simple structure according to the normal
equamax criterion (Saunders, 1962).
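
Steps 5 and 6 can be sketched in a few lines. What follows is a rough illustration under stated assumptions, not the authors' computing procedure: the function names are hypothetical, only the first principal component is extracted (by simple power iteration), and the equamax rotation is omitted entirely. Each input column would hold the 36 Ss' standardized scale values for one method-occasion combination for a single circuit.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Intercorrelate k variables; eight columns give an 8 x 8 matrix."""
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]


def first_component_loadings(R, iterations=500):
    """Unrotated loadings on the first principal component of R.

    Power iteration from a vector of ones; this assumes the start vector
    is not orthogonal to the dominant eigenvector.
    """
    k = len(R)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(k))
                     for i in range(k))
    # Loadings are the eigenvector scaled by the root of its eigenvalue.
    return [a * math.sqrt(eigenvalue) for a in v], eigenvalue
```

The sum of the squared loadings on a component equals its eigenvalue, which is the quantity a sum-of-squares row in a loading table reports for each factor.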


Stimuli 


The 16 circuits which formed the basis for the 
present analysis were: 


1. Error compensating circuits.
2. Regulating circuits.
3. Information and coding circuits.
4. Voltage reducing circuits.
5. Frequency reducing circuits.
6. DC level establishing circuits.
7. Diode circuits.
8. Frequency selection circuits.
9. Balancing circuits.
10. Matching circuits.
11. Electron beam displacement circuits.
12. Power supply circuits.
13. Triggered circuits.
14. High frequency wave component favoring circuits.
15. Inverter circuits.
16. Wave shaping circuits.



Subjects 


The 36 Ss used in this study were all enlisted 
Naval aviation electronics maintenance personnel. 
On the average, Ss had been in the Navy for a 
total of 2.94 years and in their present electronics 
assignments for 1.66 years. All Ss held the rate of either
striker or petty officer, third class.


Questionnaire Design and Administration 


Since each S was serially exposed to each of the 
four scaling methods in one questionnaire packet, a 
possible order effect was controlled through a sys- 
tematic counterbalancing of the order of exposure to 
the sections. There was also the possibility of se- 
quential effects associated with the order of the 
stimuli within any one section of the questionnaire. 
This latter control problem was dealt with through 
the use of two different forms of the same question- 
naire, Form A and Form B. In general, the ar- 
rangement of the stimuli in the one form was the 
reverse of the arrangement in the other form. Each 
S took each form of the questionnaire. The two 
administrations were separated by a time interval of 
2 weeks. 

The four sections of the questionnaire, stapled 
together as a single packet in Form A, were admin- 
istered at one sitting to Ss in three separate groups 
over a period of 2 consecutive days. A group of 16 
Ss made their scaling judgments during the morning 
of the first day and a second group of nine Ss did 
so that afternoon. A third group of 11 Ss was given 
the questionnaire on the second day. Completion 
time for the entire questionnaire packet averaged 2 
hours. 

Since the questionnaire packets were essentially 
self-administering, only a brief general orientation 
was given to Ss at the beginning of the session. No
major difficulties with implications for confounding
were encountered during the administration.

Form B of the questionnaire was administered 
to the same three groups of Ss, in the same manner 
as during phase one, 2 weeks later. 


Scale Transformation 


The raw data were first transformed into compa- 
rable distributions with the same scale. By a linear 
transformation procedure (Guilford, 1956, p. 493), 
the means and standard deviations of the magnitude 
data, derived from the magnitude-estimation and 
constant-sum methods, were set at approximately 50 
and 10, respectively. A different procedure was 
employed for the category data derived from the
rank-order and paired-comparison methods. The 
frequency values resulting from the paired-compari- 
son method were converted to ranks. Scale values 
for both category methods were then obtained from 
the ranks in accordance with the normalized rank 
method (Guilford, 1954, p. 181). Finally, the means 
and standard deviations of these distributions were 
also set at approximately 50 and 10, respectively. 
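
The two transformations can be illustrated with a short sketch. Several details here are assumptions made only for illustration and are not taken from Guilford or from the original computations: the function names, the use of the population standard deviation, the percentile convention (rank - 0.5) / n, and the choice that rank 1 (most complex) receives the highest score.

```python
from statistics import NormalDist, mean, pstdev


def linear_rescale(values, target_mean=50.0, target_sd=10.0):
    """Linearly transform scores to a chosen mean and standard deviation."""
    m, s = mean(values), pstdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]


def normalized_rank_deviates(ranks):
    """Convert ranks to unit-normal deviates via their percentile positions.

    Assumes ranks run from 1 (highest) to n, with percentile position
    (n - rank + 0.5) / n.
    """
    n = len(ranks)
    unit_normal = NormalDist()
    return [unit_normal.inv_cdf((n - r + 0.5) / n) for r in ranks]


if __name__ == "__main__":
    # Hypothetical magnitude estimates for five circuits.
    print([round(v, 1) for v in linear_rescale([3.0, 7.0, 8.0, 12.0, 20.0])])
    # Ranks of 16 circuits, normalized and then set to mean 50, SD 10.
    deviates = normalized_rank_deviates(list(range(1, 17)))
    print([round(v, 1) for v in linear_rescale(deviates)])
```

After both transformations, the category and magnitude data share a common metric, so differences between methods in the later correlations do not merely reflect differences in scale units.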


RESULTS 


Cronbach (1957) has referred to experi- 
mental versus correlational psychology and 
Kerlinger (1964) has suggested that the cor- 
relational factor-analytic approach may be 
useful for hypothesis testing if the investi- 
gator permits the possibility of different solu- 
tions in terms of the number of factors used to 
account for the data. In the present case, 1-,
2-, 3-, and 4-factor solutions were imposed on
the data, and the solution selected as most
appropriate was the one which best met the criteria
of: (a) relative consistency of rotated loading
for methods across the 16 circuits within each
factor solution, (b) meaningfulness, and (c)
parsimony.

Since the hypothesis underlying the study 
was that a 1- or a 2-factor solution (category 
and magnitude) would best fit the data, this 
range of four different factor solutions was 
considered adequate. Accordingly 64 separate 
factor analyses and rotations were performed 
(4 solutions X 16 circuits = 64 analyses). 

Each solution started with the same zero- 
order correlational matrix. Intermediate steps 
in each analysis are not reproduced in the 
present report, due to the voluminous nature 
of the data. The final rotated factor loadings 
for the various solutions have been reproduced
in another context by Siegel and
Pfeiffer (1965), and the final rotated loadings 
for the 2-factor solution are presented in 
Table 1. Of the four factor-analytic solutions 
imposed on the data, the 2-factor solution 
seemed to meet best the criteria of meaning- 
fulness, parsimony, and consistency. 

After study, the 4-factor solution was re- 
jected as lacking meaningfulness and also as 
lacking parsimony. The 1-factor solution, 
while possessing consistency across methods, 
was lacking in apparent meaningfulness, at 
least to the present authors. 

The 3-factor solution seemed to lack consistency
across circuits. Specifically, when the final
rotated factor loadings for the stimuli (methods)
in this solution were studied, it was noted that
relative consistency was lacking as one moved from
one circuit to the next. Other things being equal,
a degree of congruency for methods, from circuit to
circuit, would be preferred for any final solution
which is said to possess generality.


TABLE 1

Occasion   Method   Factor 1       Factor 2

Electronic Circuit 1
1          RO       .1074          .3742
           PC       .5846          .6438
           ME       .6743          .1733
           CS       .8140          .3155
2          RO       -.0068         .6044
           PC       .4119          .7961
           ME       .4770          -.3126
           CS       .7534          .4176
           SS       2.4354         1.9554

Electronic Circuit 2
1          RO       .1189          .6506
           PC       .6241          .2019
           ME       .4498          .2240
           CS       .8583          .1795
2          RO       .1136          .8197
           PC       .9094          .0179
           ME       .4988          .1757
           CS       .8060          -.0161
           SS       3.0809         1.2497

Electronic Circuit 3
1          RO       .1906          .6255
           PC       .7964          .3882
           ME       .1756          .0269
           CS       .8401          .3410
2          RO       .1668          .8036
           PC       .6900          .7019
           ME       .6742          .3168
           CS       .6985          .5182
           SS       3.4242         2.1663

Electronic Circuit 4
1          RO       .0499          .6592
           PC       .6323          .2285
           ME       .6240          .3053
           CS       .8143          -.1559
2          RO       .0815          .6714
           PC       .1877          .3238
           ME       [illegible]    .4645
           CS       .8041          -.0090
           SS       3.0062         1.3757

Electronic Circuit 5
1          RO       .1335          .7319
           PC       .8123          .2097
           ME       .2765          .1228
           CS       [illegible]    .1490
2          RO       .3530          [illegible]
           PC       .7748          [illegible]
           ME       .3823          .3882
           CS       .8917          .3734
           SS       2.9941         1.6858

Electronic Circuit 6
1          RO       .1953          .5359
           PC       .7307          .2709
           ME       .3735          .5502
           CS       .8921          .1394
2          RO       -.0003         .5698
           PC       .8515          .2308
           ME       .1238          .6205
           CS       .8174          .0809
           SS       2.9159         1.4522

Electronic Circuit 7
1          RO       [illegible]    .2245
           PC       .8091          .2652
           ME       .5913          .7780
           CS       [illegible]    .3270
2          RO       .6660          .1189
           PC       .8809          .3430
           ME       .1285          .6849
           CS       .8042          .3491
           SS       4.0490         1.5556

Electronic Circuit 8
1          RO       .2569          .7358
           PC       .4472          .8114
           ME       .7669          .2484
           CS       .5699          .6610
2          RO       .6236          .4626
           PC       .6518          .5791
           ME       .6008          .3235
           CS       .6310          .5681
           SS       2.7518         2.6751

Electronic Circuit 9
1          RO       .1861          .6804
           PC       .6079          .4735
           ME       [illegible]    .0683
           CS       .1237          .2968
2          RO       .2162          .8258
           PC       .6951          .3461
           ME       .5001          .2006
           CS       .7922          .1986
           SS       2.8473         1.6614

Electronic Circuit 10
1          RO       .8427          -.1298
           PC       .7339          .4895
           ME       .2300          .6508
           CS       .7828          .2510
2          RO       .5488          .3267
           PC       .7213          .4928
           ME       .0849          .8178
           CS       .6886          .2802
           SS       [illegible]    1.8397

Electronic Circuit 11
1          RO       .3621          .3356
           PC       .8207          .3958
           ME       .5193          .6201
           CS       .8232          .3805
2          RO       .7168          .1988
           PC       .7638          .5688
           ME       .2200          .8124
           CS       .6709          .6174
           SS       3.3476         2.2028

Electronic Circuit 12
1          RO       .1770          .7014
           PC       .7459          .5295
           ME       .7558          .2179
           CS       .9687          .1356
2          RO       .2277          .6848
           PC       .7364          [illegible]
           ME       .5841          .3207
           CS       .8827          .3051
           SS       3.8118         1.7754

Electronic Circuit 13
1          RO       -.0516         .7499
           PC       .8086          .1070
           ME       .8471          -.2284
           CS       .8026          .0679
2          RO       .4325          [illegible]
           PC       .7604          .2469
           ME       .6279          .1483
           CS       .8218          .4345
           SS       3.8532         1.1657

Electronic Circuit 14
1          RO       .1851          .7665
           PC       [illegible]    .3888
           ME       .7448          .0249
           CS       .8579          .2064
2          RO       .1504          .6567
           PC       .7492          .3068
           ME       .6357          .2228
           CS       .9228          .2255
           SS       3.6711         1.4078

Electronic Circuit 15
1          RO       -.2097         .0152
           PC       .7447          .3663
           ME       .2334          [illegible]
           CS       .8167          .3902
2          RO       .4325          .0523
           PC       .7968          .4588
           ME       .0881          .8471
           CS       .7679          .3878
           SS       2.7394         1.8795

Electronic Circuit 16
1          RO       .3147          .4326
           PC       .8969          .3355
           ME       .7835          .2410
           CS       .8395          .3624
2          RO       .1937          .6978
           PC       .7118          .4914
           ME       [illegible]    .3679
           CS       [illegible]    .5327
           SS       3.4826         1.6366
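
The SS row printed for each circuit in Table 1 appears to be the sum of the squared loadings on each factor. A brief check on the Circuit 1 entries illustrates this. The loadings below are taken from Table 1 with decimal points restored where the scan dropped them, which is an assumption about the printed values; the variable and function names are hypothetical.

```python
# Circuit 1 loadings from Table 1, in the order RO, PC, ME, CS for
# Occasion 1 and then Occasion 2. Decimal points are restored by assumption.
circuit_1_factor_1 = [0.1074, 0.5846, 0.6743, 0.8140,
                      -0.0068, 0.4119, 0.4770, 0.7534]
circuit_1_factor_2 = [0.3742, 0.6438, 0.1733, 0.3155,
                      0.6044, 0.7961, -0.3126, 0.4176]


def sum_of_squares(loadings):
    """Sum of squared loadings on one factor (variance accounted for)."""
    return sum(value * value for value in loadings)


ss_factor_1 = sum_of_squares(circuit_1_factor_1)
ss_factor_2 = sum_of_squares(circuit_1_factor_2)

if __name__ == "__main__":
    # Compare with the printed SS entries for Circuit 1: 2.4354 and 1.9554.
    print(round(ss_factor_1, 4), round(ss_factor_2, 4))
```

Because eight standardized variables carry a total variance of 8, dividing each sum of squares by 8 gives the proportion of variance attributable to that factor for the circuit.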


Factor 1 


Within the 2-factor solution, a considerable
degree of consistency across analyses was evident.
The paired-comparison and the constant-sum
methods typically possessed the highest loadings
on Factor 1, and the rank-order method showed an
overwhelming tendency to load lowest on Factor 1.
Table 1 indicates that for all 16 stimuli, either
the constant-sum or the paired-comparison method
possessed the highest loading on Factor 1 on one
of the occasions of judgment. Conversely, the
rank-order method possessed the lowest rotated
loading on at least one occasion for 14 of the 16
analyses.


Factor 2 


The rank-order method was most heavily 
loaded on Factor 2 for 9 out of the 16 circuit 
stimuli. The magnitude-estimation method 
possessed the heaviest loading on Factor 2 in 
five of the remaining seven analyses; the 
paired-comparison method possessed the high- 
est rotated factor loading in only two of 
these analyses.


DISCUSSION 


Extensive scaling literature cited earlier in 
this report as well as a recent literature review 
by Ekman and Sjoberg (1965) indicated that 
the scaling variables should group themselves 
into one or two classes which might be ex- 
pressed in terms of the scaling method em- 
ployed (category versus magnitude). 

Four different factorial possibilities were 
considered from the standpoints of consist- 
ency, meaningfulness, and parsimony. On the 
basis of the eigenvalues, it might have been
possible to argue in favor of a 2- or a 3-factor 
solution. However, the 2-factor solution 
seemed to be interpretively most clear, certainly
most rational, and most consistent. A
complete solution to the problem of hypothe- 
sis testing through factor analysis may lie in 
the prespecification of the number of factors 
expected from the analysis and also in the 
prespecification of their positions in Euclidean 
space. The former was done in the present 
study but not the latter. 

If the 2-factor solution is tenable for the 
psychological judgment involved in scaling 
the complexity of the electronic circuits, it 
does not seem as if the solution easily fits 
the established classificatory scheme based on 
the type of method involved. If the category 
methods (rank-order and paired-comparison) 
had been found highly loaded in one factor 
and the magnitude methods highly loaded in 
the second factor, then the anticipated classi- 
ficatory scheme might have been accepted in 
the area here investigated. Or, if one or both 
of the emergent factors were bipolar, with 
the bipolarity in the direction suggested by 
category-magnitude dichotomy, the hypoth- 
esized classificatory scheme might have been 
considered to be acceptable. 

On the basis of the present analysis, it 
seems that, at least for electronic circuits, and 
possibly in other psychological scaling areas, 
a classificatory scheme based on the psycho- 
logical requirements imposed by the scaling 
task on the judge may possess a degree of 
appropriateness. 

The constant-sum (magnitude) and paired- 
comparison (category) methods were typi- 
cally loaded on Factor 1. On the other hand, 
the rotated factor loading of the traditional 
rank-order method was typically high on 
Factor 2. It will be noted that in both the 
paired-comparison and the constant-sum 
methods, the stimuli are considered two at a 
time by the Ss. On the other hand, the rank- 
ing method may involve a degree of percep- 
tual ambiguity because of the almost simul- 
taneous presence of all stimuli. The magni- 
tude-estimation method, as employed, may 
also possess this characteristic. Thurstone has 
previously suggested that when comparative 
judgments are made, “discriminal” processes 
are involved. On the other hand, in categorial 
judgments the category boundaries of both 
the stimuli and the subjective values are in- 
volved. If one accepts that the constant-sum 
and the paired-comparison methods both
largely involve discriminating between the 
two stimuli involved in a pair, we then find 
some support for Thurstone’s thinking in 
Factor 1 of the present data. We interpret 
Factor 1 as reflecting the less difficult cog- 
nitive-perceptual task involved in discriminat- 
ing or differentiating the complexity of the 
avionics circuits.

This interpretation suggests “cognitive- 
discrimination” as an appropriate name for 
Factor 1. The word “discrimination” seems 
preferred, over words such as “differentiation” 
or “assimilation,” to accentuate the relative 
ease of perceptual judgments, as well as the 
cognitive ease, for the S in these methods. 

Factor 2, on which the ranking and
magnitude-estimation methods were typically
highly loaded, seems, from the stimulus point- 
of-view, to be different from the other meth- 
ods on the basis of the number of stimuli 
within the ganzfeld at a given time. All stim- 
uli were simultaneously presented for the 
rank-order and magnitude-estimation meth- 
ods but not for the other methods. From the 
cognitive point-of-view, these methods prob- 
ably involve boundary establishment and 
maintenance for subjective means, discrim- 
inal dispersions, etc. Accordingly, it appears 
as if “contextual uncertainty” is an appro- 
priate name for this factor. This name em- 
phasizes the structure of the stimulus field, 
as well as the thinking behavior of the judge 
as inherent in the simultaneous presence and 
possible confusion of the 16 circuit stimuli. 

Moreover, it is believed that this name 
emphasizes the active role of the S who makes 
the judgment. It considers the S to be more 
than a passive transducer, but rather, as sug- 
gested by Warren and Warren (1963), a dy- 
namic organism whose central nervous system 
and past experience influence his behavior 
and characteristic response. 

From the factorial purity point-of-view, 
the present findings tend to support the use 
of the constant-sum and rank-order methods, 
over the magnitude-estimation and paired- 
comparison methods, as employed, for psy- 
chological scaling of the type here considered. 
It seems as if the judges may have vacillated 
over and differentially emphasized discrimi- 
nation, boundary maintenance, and perceptual
organizational aspects when employing
these latter methods. 

The present study has suggested an alterna- 
tive taxonomic system for the classification 
of scaling procedures within the present area 
of interest. This taxonomy is not intended by 
the authors as a replacement for any existing 
system, but possibly as a supplement thereto. 
It seems that in this field, psychological scal- 
ing may be classified by not only using such 
traditional criteria as the method employed 
(category versus magnitude), or the sensory 
continuum generated (prothetic versus meta- 
thetic), but also on the basis of the cogni- 
tive-perceptual behavior of the S when ex- 
posed to the different scaling methodologies. 
Since the underlying sensory continuum stud- 
ied here was metathetic (Pfeiffer & Siegel, 
1965), the proposed taxonomy of “cognitive 
discrimination” and “contextual uncertainty”
may not be generalizable to prothetic con- 
tinua. 


TEST OF MECHANICAL PRINCIPLES AS A SUPPRESSOR
VARIABLE FOR THE PREDICTION OF EFFECTIVENESS
ON A MECHANICAL REPAIR JOB


WAYNE W. SORENSON 1
Eastman Kodak Company, Rochester, New York 


Psychological measures for prediction of job effectiveness in a skilled mechan- 
ical repair job were assessed using the usual validation cross-validation para- 
digm. 63 industrial mechanical repairmen doing similar work were studied. 
The most predictive and operational selection system was determined through 
use of multiple-regression analysis. Test administration was simplified using a 
suppressor variable to obtain acceptable predictive efficiency with the smallest 
number of predictors. The prediction system employed a school achievement 
type of mechanical comprehension test to improve criterion prediction by 
suppressing noncriterion variance in another mechanical aptitude test. The 
suppressor relationship between the 2 tests supported a job description that 
specified mechanical knowledge and experience rather than knowledge of 


formally tanght mechanical principles. 


A suppressor variable is defined as 


a variable in a prediction battery that correlates
zero with the criterion but highly with another 
predictor in the battery. It has the effect of sub- 
tracting from the predictor variable that part of its 
variance that does not correlate with the criterion, 
and hence increases the predictive value of the 
battery [English & English, 1958, p. 537]. 


A hypothetical example is cited by English 
and English to illustrate application of two 
variables with a suppressor relationship in 
selection: 


E.g., although “shop mathematics” has a high r 
with the criterion of work success, in selection it 
lets through some poor workers who are merely 
good in mathematics. A general mathematics test, 
which has a low 7 with the criterion and a high r 
with shop mathematics, can be so negatively 
weighted in combination with the shop math test 
that those who are merely good in mathematics will 
not have a good enough combined score to be 
selected. The general mathematics test is a suppressor 
variable [p. 537]. 
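
The effect described in these definitions can be shown with a small worked sketch using the standard two-predictor regression formulas for standardized variables. The correlations used (.40, .00, and .50) are hypothetical values chosen only for illustration, and the function name is likewise an assumption.

```python
import math


def two_predictor_regression(r_y1, r_y2, r_12):
    """Standardized weights and squared multiple R for two predictors.

    r_y1, r_y2: validities of predictors 1 and 2 against the criterion.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_12 * r_y2) / denom
    beta_2 = (r_y2 - r_12 * r_y1) / denom
    r_squared = beta_1 * r_y1 + beta_2 * r_y2
    return beta_1, beta_2, r_squared


if __name__ == "__main__":
    # Hypothetical case: predictor 1 has validity .40, the suppressor has
    # validity .00, and the two predictors correlate .50.
    b1, b2, r2 = two_predictor_regression(0.40, 0.00, 0.50)
    print(round(b1, 4), round(b2, 4), round(math.sqrt(r2), 4))
```

In this hypothetical case the suppressor receives a negative weight and the multiple correlation exceeds the .40 validity of the first predictor alone, even though the suppressor by itself predicts nothing.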


The concept of suppressor variables for 
amplifying multivariate predictive efficiency 
in psychological measurement was introduced 
by Horst (1941). Meehl (1945) offered an 
algebraic explanation and development of 
the concept and McNemar (1945) attempted 


1 The assistance of Lane H. Riland with the
with the programed multiple-regression analyses used
in this study. 


a rational explanation in terms of common 
and independent elements of predictors and 
criterion. Lubin (1957) further explored the 
suppressor variable concept and developed 
additional formulae to assess the contribu- 
tions and limits of suppressor variables to 
prediction systems. The rationale behind the 
suppressor variable is straightforward and 
would appear to have considerable value in 
applied selection situations. Despite the ap- 
peal of improved predictive accuracy offered 
by this concept, applications of suppressor 
variables have not been common. As Johnson 
(1960) comments, “it appears quite difficult 
to find workable suppressor variables for 
which psychological rationales existed prior to 
empirical investigation [p. 126].” Meehl 
(1945, p. 553) also called attention to the 
importance of nonstatistical considerations in 
the search for suppressor variables. The 
reason psychological rationales rarely exist 
prior to empirical investigation is probably 
best explained by the fact that the suppressor 
model is conceptually more complex than
the usual multivariate prediction model.
The rarity of suppressors in general, even 
without prior psychological rationale, could 
be a result of several influences. Explorations 
among the relationships and interrelationships 
of large numbers of predictor variables were 
very difficult prior to the advent of large- 
scale regression programs. Even though 
regression analyses are now executed rapidly 
and routinely among many variables by computers,
there may still be tendencies to select
the most conceptually simple prediction equa- 
tions. Or, there may be reluctance to develop 
test batteries that include one predictor that 
merely “suppresses” another variable without 
directly accounting for any criterion variance 
by itself. Whatever the reasons, suppressor 
variables are not common occurrences in 
reported prediction research. 

A goal of applied selection research is often 
to develop economical predictor batteries 
(i.e., predictors which employ a minimum 
number of operationally simple and easy-to- 
administer tests). In situations where useful 
levels of prediction and economics of testing 
are important, a suppressor variable, if dis- 
covered, may provide the optimal solution. 

The research described here illustrates the 
utility of such a suppressor variable relation- 
ship for selection of men for the job of skilled 
industrial mechanic. 


METHOD 


Subjects 


The subjects (Ss) for this study consisted of 63 
men employed as industrial mechanics. Their ages 
ranged from early 20s to late 50s. The mechanics 
were responsible for repair and maintenance of 
several types of very complex semiautomated ma- 
chines. Continuous production operations demanded 
rapid repair of breakdowns with minimal inter- 
ruption of productive machine time. Some machine 
features were electronically controlled, but the me- 
chanics investigated in this study were responsible 
for nonelectronic repair and maintenance only. The 
need for electronic knowledge was satisfied by the 
ability to identify breakdowns as mechanical or 
nonmechanical. 

Newly hired mechanics spent approximately 1 year 
of guided on-the-job training before becoming fully 
qualified mechanics. Because of the complexity of 
the machines and the long training period, ap- 
proximately one out of every two men who were 
hired for the job actually qualified as a mechanic. 

The 63 Ss represented most of the men employed 
as mechanics in the division in which this study 
was undertaken. All Ss cooperated in the study on a 
volunteer basis with the guarantee that all indi- 
vidual test results would be confidential. Individual 
test scores were available only to the researchers 
and to individual Ss and no member of supervision 
saw specific test results. All of the men who were 
asked participated in the project. 

The sample was randomly divided into two groups 
for validation purposes. Forty-three mechanics 
comprised the development sample and the remaining 
20 mechanics were held out for cross-validation of 
the results. 


Criterion 


The job of mechanic was a complex trouble- 
shooter type of job for which no direct objective 
measure of effectiveness could be obtained. It was 
felt that members of direct supervision had suf- 
ficient familiarity with mechanics to evaluate their 
job performance. Evaluations of each mechanic were 
made independently by two members of the indi- 
vidual mechanic’s first level of supervision. Rank- 
order correlations indicated near perfect agreement 
among supervisors (median rho coefficient = .98). 

Evaluations by first-level supervisors were ob- 
tained in two ways. Individual mechanics were 
ranked on a global job-performance basis using a 
modified alternation ranking procedure. They were 
also rated on 25 job-related Likert-type items which 
were obtained from a critical incident interview with 
members of supervision and selected mechanics. 
These were scaled to provide maximum discrimina- 
tion among mechanics. 

The intercorrelation between the overall global 
ranking and the rating scores for mechanics was 
high (r=.81) which suggested that the overall 
ranking and the ratings on specific aspects of job 
performance were reflecting essentially similar per- 
ceptions of job behavior. Therefore, the rankings and 
the ratings were combined into one composite cri- 
terion measure. The distributions of both the rank- 
ings and the ratings were transformed into standard- 
ized scores with equal means and equal variances. 
The resulting criterion measure (rankings plus 
ratings) had a mean equal to 1,000 and a standard 
deviation equal to 100. Criterion scores of all 63 
mechanics were reviewed with members of higher 
level supervision to insure that the composite cri- 
terion measure was not contaminated by tenure, 
salary grades, or other nonperformance factors. The 
composite criterion measure was compared to age 
of the mechanics. A low and nonsignificant cor- 
relation (r=—.21) suggested that age was not 
influencing the criterion. 
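
The construction of the composite criterion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are hypothetical, ranks are assumed to be reversed so that larger numbers mean better performance, and population standard deviations are used; the article does not report these details.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z scores (mean 0, SD 1, population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_criterion(rank_scores, rating_scores,
                        target_mean=1000.0, target_sd=100.0):
    """Add the two standardized measures, then rescale the sum."""
    z_sum = [a + b for a, b in zip(standardize(rank_scores),
                                   standardize(rating_scores))]
    return [target_mean + target_sd * z for z in standardize(z_sum)]


# Hypothetical scores for five mechanics (not data from the study).
rank_scores = [5, 4, 3, 2, 1]          # reversed ranks: 5 = best
rating_scores = [110, 95, 100, 80, 70]  # summed Likert-type ratings
criterion = composite_criterion(rank_scores, rating_scores)
```

Standardizing each measure before adding gives rankings and ratings equal weight in the composite, which is the stated purpose of equating their means and variances.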

In summary, the criterion was a single continuous 
measure of job performance based on perceptions 
of job performance by immediate supervisors. 


Predictors 


The following measures were assessed as potential 
predictors of effectiveness: Survey of Space Rela- 
tions Ability (Case & Ruch, 1944); Survey of 
Mechanical Insight (Miller, 1955); Test of Me- 
chanical Comprehension (Bennett & Fry, 1941); 
Thurstone Test of Mental Alertness (Thurstone & 
Thurstone, 1952); Kuder Preference Record—Voca- 
tional, Form CH (Kuder, 1948); Edwards Personal 
Preference Schedule (Edwards, 1953); Background 
Survey Questionnaire (a short series of questions 
relating to the backgrounds and attitudes of the 
mechanics developed specifically for this study). 
Thirty-four predictor scores were obtained by em- 
ploying scales from these measures. 


Regression Analysis 


Test-criterion relationships were determined in the 
development sample (N = 43). Multiple regression 
was used to assess the relationships of predictor vari- 
ables to the criterion variable. Five regression 
analyses were made in order to develop an optimal 
practical prediction system. 

Two considerations influenced selection of pre- 
dictors for inclusion in the final prediction system: 
an R² value was sought that (a) would be high 
enough to allow practical prediction, and (b) would 
also employ the smallest number of easy-to- 
administer tests. The first regression analysis was 
made using all 34 potential predictor variables. 
Inspection of the regression coefficients suggested 
that 14 of the variables accounted for sufficient 
criterion variance to eliminate the remaining 20 
variables. As shown in Table 1, this resulted in a 
reduced development R² coefficient with an in- 
creased cross-validation R² coefficient and the reten- 
tion of five of the seven original measures. Thus, 
two measures were eliminated on statistical bases 
alone. 

TABLE 1 

CORRELATIONS BETWEEN PREDICTORS AND CRITERION 
FOR FIVE COMBINATIONS OF PREDICTOR VARIABLES 

                                                     Sample 
                                          Development   Cross-validation 
Variable combination                       (N = 43)        (N = 20) 

All measures (34 variables) a                .92             .01 
Best from 5 measures (14 variables) b        .74             .34 
Survey of Mechanical Insight,             [illegible]        .26 
  Background Survey Questionnaire, 
  & Kuder Preference Record—Vocational 
  (7 variables) b 
Survey of Mechanical Insight,                .57             .64 
  Background Survey Questionnaire, 
  & Edwards Personal Preference 
  Schedule (9 variables) 
Survey of Mechanical Insight,                .44             .57 
  Background Survey Questionnaire, 
  & Test of Mechanical Comprehension BB 
  (3 variables) 

a Measures: Survey of Space Relations Ability (1 variable), 
Survey of Mechanical Insight (1 variable), Test of Mechanical 
Comprehension (1 variable), Test of Mental Alertness (3 
variables), Kuder Preference Record—Vocational (11 vari- 
ables), Edwards Personal Preference Schedule (16 variables), 
Background Survey Questionnaire (1 variable). 

b Only the most predictive variables were used from any 
measure or set of measures. 


Pragmatic features of the prediction system were 
considered to be as important as statistical efficiencies 
in this study because the final prediction battery 
was to be administered as a standard selection aid 
in a centralized personnel department. Such con- 
siderations as the amount of testing time, the avail- 
ability and sophistication of personnel, and the costs 
involved were given equal emphasis with statistical 
factors. Therefore, three additional regression analy- 
ses were made using various groupings of measures 
from the best 14 predictor variables. These analyses 
were determined by grouping measures on the basis 
of their probable use in selection. All five regression 
analyses are illustrated in Table 1. 


RESULTS 


The results of the five regression analyses 
are shown in Table 1 which illustrates that 
stability of prediction tended to increase as 
the numbers of predictors decreased. In two 
of the analyses the cross-validation sample 
correlations were higher than the development 
sample correlations. Inspection of the scatter- 
plots suggested that the lower development 
sample correlations were caused by one 
extreme “off quadrant” individual. 
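
The loss of stability with many predictors in a small sample can be illustrated with a standard shrinkage estimate. The sketch below uses the familiar adjusted R² formula, which is not a computation reported in the article, and it treats the Table 1 entries as multiple correlations, as the table title suggests.

```python
def adjusted_r_squared(r, n, k):
    """Shrinkage-adjusted R^2 for a multiple correlation r obtained
    with k predictors in a sample of n cases."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r ** 2) * (n - 1) / (n - k - 1)


# Development-sample values from Table 1 (N = 43).
full_battery = adjusted_r_squared(0.92, n=43, k=34)      # 34 predictors
three_predictors = adjusted_r_squared(0.44, n=43, k=3)   # 3 predictors

# Amount by which each squared correlation shrinks after adjustment.
shrink_full = 0.92 ** 2 - full_battery
shrink_three = 0.44 ** 2 - three_predictors
```

Under these assumptions the 34-variable equation loses far more of its apparent variance accounted for than the 3-variable equation, in line with the cross-validation pattern in Table 1.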

Two of the groupings of predictor measures 
indicated sufficient cross-validated predictive 
relationship to be useful for selection. The 
best grouping, in terms of cross-validated 
prediction, included the Edwards Personal 
Preference Schedule. This grouping would 
have been chosen if predictive efficiency had 
been the sole criterion for development of 
a selection battery. However, the grouping 
that employed the Test of Mechanical Compre- 
hension in place of the Edwards was chosen 
because it had greater face validity and would 
be easier to administer and interpret on a 
routine basis by a personnel department. 

The regression equation for the grouping 
that was chosen and put into application 
was: 


$$Y_c = 866 + 10X_1 - 6X_2 + 17X_3$$


where $Y_c$ = predicted criterion score, $X_1$ = 
Survey of Mechanical Insight score, $X_2$ = Test 
of Mechanical Comprehension score, and $X_3$ = 
Background Survey Questionnaire score. 
Table 2 shows the intercorrelations among 
the three predictors and their correlation with 
the criterion. The Test of Mechanical Com- 
prehension acted as a suppressor variable as 
described by English and English (1958, p. 
537). It had a near zero correlation with the 
criterion and a relatively high correlation 
with another predictor in the battery, resulting 
in a negative regression coefficient in the 
prediction equation. 


TABLE 2 

INTERCORRELATION OF THE PREDICTORS AND THEIR 
CORRELATIONS WITH THE CRITERION 

                                          C      SMI        TMC      BSQ 

Criterion (C)                             —      .22       −.04      .30 
Survey of Mechanical Insight (SMI)                —     [illegible]  .09 
Test of Mechanical Comprehension (TMC)                       —      −.02 
Background Survey Questionnaire (BSQ)                                 — 

Note.—Only the correlations of the three predictors chosen 
for operational use are presented. 
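
The suppressor pattern just described can be reproduced with the standard two-predictor formulas for standardized regression weights. The sketch below is illustrative only: the criterion correlations are those given in Table 2, but the correlation between the two mechanical tests is set to an assumed .70 because the printed value is not fully legible, and the Background Survey Questionnaire is left out to keep the example to two predictors.

```python
def two_predictor_weights(r1c, r2c, r12):
    """Standardized regression weights and R^2 for two predictors,
    computed from their correlations with the criterion (r1c, r2c)
    and with each other (r12)."""
    denom = 1.0 - r12 ** 2
    beta1 = (r1c - r2c * r12) / denom
    beta2 = (r2c - r1c * r12) / denom
    r_squared = beta1 * r1c + beta2 * r2c
    return beta1, beta2, r_squared


r_smi_c = 0.22     # Survey of Mechanical Insight with criterion (Table 2)
r_tmc_c = -0.04    # Test of Mechanical Comprehension with criterion (Table 2)
r_smi_tmc = 0.70   # ASSUMED intercorrelation; printed value is garbled

beta_smi, beta_tmc, r2_both = two_predictor_weights(r_smi_c, r_tmc_c, r_smi_tmc)
r2_smi_alone = r_smi_c ** 2
```

With these inputs the Test of Mechanical Comprehension receives a negative weight even though its own validity is near zero, and the squared multiple correlation exceeds what the Survey of Mechanical Insight achieves alone. That is the defining behavior of a suppressor.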


DISCUSSION 


Analysis of the mechanic’s job at the 
inception of the validation study suggested 
that mechanics were characterized as pos- 
sessing considerable mechanical aptitude, 
skill, experience, and interest as well as vary- 
ing degrees of interpersonal competence in 
dealing with regular operators of the ma- 
chines. General intelligence and spatial rela- 
tions were also hypothesized as factors impor- 
tant to the job; however, the subsequent 
validational analyses did not warrant inclusion 
of these measures in the final prediction 
system. 

The better mechanics appeared to rely on 
mechanical intuition and experience rather 
than formally taught principles (e.g., physics, 
mechanics, electricity, etc.). Although many 
good mechanics had experience in machine- 
shop work, they seemed to fit a picture of 
“born” mechanics rather than “made” me- 
chanics. Discussions with members of super- 
vision supported this thesis: among the 
men who had failed to qualify for the 
job were some who had looked the best 
“on paper.” 

It was expected at the start of the study 
that this “born” rather than “made” factor 
would be manifested in the background ques- 
tionnaire in terms of educational background, 
actual mechanical experience, etc. To a lim- 
ited extent this was substantiated as some of 
the items on the Background Survey Ques- 
tionnaire were of this type. However, the 
principal control for the formally taught 
(textbook) type of mechanical knowledge was 
borne by the Test of Mechanical Compre- 
hension. Inspection of the two mechanical 
aptitude tests reveals that their item contents 
are quite dissimilar. The Survey of Mechani- 
cal Insight is almost exclusively a “nuts and 
bolts” type of test; that is, all of its 35 
items are composed of drawings of noncom- 
plex mechanical apparatus. Knowledge of 
mechanical principles could assist in achieving 
a high score, although lack of such knowl- 
edge would not preclude a high score. 

By contrast, the Test of Mechanical Com- 
prehension, Form BB, is not limited to simple 
mechanical apparatus types of items. At first 
glance, its 60 items (simple drawings) ap- 
pear similar to those of the Survey of Me- 
chanical Insight. However, further inspection 
reveals that these items are directed toward 
understanding of rudimentary physics and 
not just mechanical knowledge and experi- 
ence. Items included are concerned with force 
vectors, geometry, the properties of sound, 
air currents, fluids, gravity, electricity, etc., 
as well as some which deal with simple 
mechanical apparatus. It would thus seem 
to be more of an achievement test of ele- 
mentary physics than a test of mechanical 
knowledge and experience. This is not to say 
that the two tests are independent of each 
other, as indicated by their intercorrelation 
in the .70s. 

In this study, the ability to achieve a high 
score on the more “nuts and bolts” oriented 
Survey of Mechanical Insight without benefit 
of a high score on the more school- 
achievement oriented Test of Mechanical 
Comprehension was associated with success 
on the job of industrial mechanic. 

Within a large set of predictor variables 
the occurrence of a suppressor relationship 
should not be especially unusual. Any con- 
figuration of correlations similar to those il- 
lustrated in Table 2 would result in one 
variable suppressing another in the prediction 
of the criterion. The unique qualities of this 
prediction study were that the suppressor 
variable relationship was incorporated in a 
practical selection system and that it allowed 
selection of a small number of predictors 
which conformed to prior job analysis per- 
ceptions about the job. That is to say, it was 
a workable suppressor relationship for which 
a psychological rationale existed prior to 
empirical investigation. 


SHORT-TERM MEMORY FACTOR IN THE DESIGN OF 
DATA-ENTRY KEYBOARDS: 


AN INTERFACE BETWEEN SHORT-TERM MEMORY 
AND S-R COMPATIBILITY 


R. CONRAD 
Medical Research Council, Applied Psychology Research Unit, Cambridge, England 


An experiment on immediate recall of 8-digit sequences was carried out. Mode 
of recall was via a data-entry keyboard. 2 keyboard layouts were used, 1 of 
high, 1 of low compatibility. The low-compatibility keyboard required more 
time for entry and gave more errors. These extra errors were identified as 
being primarily memory rather than aiming errors. The results are discussed 
in terms of an interface between short-term memory and S-R compatibility; 
they are held to support a memory model involving a limited-capacity channel, 
and a practical design conclusion is suggested. 


The increasing use of numeral data-entry 
keyboards suggests the question of whether 
or not the effect of keyboard layout is limited 
to the rate at which data can be entered; and 
if keying accuracy is also affected, whether 
errors can be assumed to be mainly aiming 
errors. 

Data-entry keyboards are not only be- 
coming increasingly used, but commonly used 
designs are appearing of considerable vari- 
ability. Push-button telephones will replace 
the familiar dial phone and no layout, accept- 
able internationally, has so far emerged 
(Deininger, 1960; Oden, 1961). When a 
standard layout for telephones is agreed 
upon, it is most unlikely to be that already 
adopted for simple add-listing machines. This 
discrepancy in itself will pose a difficult 
problem for the designer of numeral data- 
entry keyboards for specialized purposes, 
since their users will be quite likely to use 
at the same time both telephone and add- 
listing keyboards. But this alarm begs the 
question of whether in fact key layout is a 
relevant performance variable. 


1 The author gratefully acknowledges facilities and 
cooperation provided by the General Post Office and 
the Union of Post Office Workers. The testing was 
carried out by D. J. A. Longman. 


It has been recognized for some time (Fitts 
& Seeger, 1953) that compatibility between 
items in a stimulus set and specific responses 
to those items can affect performance in an 
information-processing task, of which class 
numeral data-entry by keyboard is one ex- 
ample. There are a priori grounds therefore 
for assuming that one keyboard may be more 
efficient than another. Although this question 
has been discussed by Oden (1961) and by 
Broberg (1963), only Deininger (1960) has 
supported with experimental evidence his 
conclusion that over a fairly wide range, 
variation of key layout can be tolerated with 
little effect on speed or accuracy of entry. 
That this conclusion is apparently incon- 
sistent with the generality of S-R compati- 
bility studies, is probably because Deininger 
worked within a range of S-R relations all 
of which could be considered to be of high 
compatibility; a constraint dictated by the 
practical nature of his investigation. 

The first prediction of this article then, is 
the simple one that if a low enough level of 
S-R compatibility can be achieved in a 
numeral data-entry keyboard, entry (keying) 
rate will be lower than with a keyboard of 
high S-R compatibility, and accuracy will be 
impaired. 


Fig. 1. Diagrammatic representation of the layouts 
of (1a) high-compatibility and (1b) low-compati- 
bility keyboards. 


If one thinks of a task in which numeral 
sequences are memorized prior to entering 
them into the keyboard (and in practical 
situations there is no way of obviating a 
short-term memory stage), then, if the stimulus 
conditions are held constant, it is reasonable 
to suppose that if an error difference were 
found between two levels of S-R compatibil- 
ity, it would largely be due to differences in 
the ease with which appropriate keys could 
be located, that is, the memory component 
would be invariant. However, Conrad (1958) 
presented some results which suggested that 
the quicker a memorized numeral sequence 
was reported the fewer recall errors were 
made. Mackworth (1962) confirmed this for 
the case where report was either unpaced 
verbal or paced button pressing. A second 
prediction can thus be made that if S-R com- 
patibility differences lead to differences in 
entry rate, error differences will also be found 
attributable to memory failure. The experi- 
ment to be reported is therefore concerned 
with an interface between S-R compatibility 
and short-term memory. 


METHOD 


Apparatus. Two keyboards were available which 
were identical in all respects other than the labeling 
of the keys. The two arrangements, which can be 
designated High Compatibility (High C) and Low Com- 
patibility (Low C), are shown diagrammatically in Figure 1. 
A display unit consisting of an aperture just large 
enough for an 8-digit number to be seen, presented 
typed numeral sequences, no more than one being 
visible at any time. A sequence was automatically 
presented when the subject (S) lifted a telephone 
handset. This operation also started a timer and 
3 seconds later the sequence was replaced by a blank. 
The S then keyed in the sequence from memory 
and replaced the telephone. A millisecond printing 
recorder/timer printed out a record of keys pressed 
in order of operation, and the cumulative time 
between successive keystrokes. When the (clearly 
audible) printing operation was over, S, at her 
leisure, lifted the telephone handset and so brought 
on the next sequence. 

Material. A test consisted of the recall and keying 
of 32 8-digit sequences. All digits 0-9 were used 
“equi-frequently” and randomly, with the restraints 
that no digit occurred more than twice in succes- 
sion (e.g., 222), and not more than three digits 
were repeated in any one sequence. Two such 32- 
sequence lists were prepared. 
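
The construction of such a list can be sketched as follows. This is a hypothetical generator, not the procedure actually used: it reads the second restraint as "at most three different digits occur more than once in a sequence," and it does not enforce equal digit frequencies across the whole list.

```python
import random
from collections import Counter


def is_valid(seq):
    """Check the two restraints on a single digit sequence."""
    # No digit may occur three times in succession (e.g., 222).
    for i in range(len(seq) - 2):
        if seq[i] == seq[i + 1] == seq[i + 2]:
            return False
    # At most three different digits may occur more than once.
    repeated = sum(1 for count in Counter(seq).values() if count > 1)
    return repeated <= 3


def make_list(n_sequences=32, length=8, rng=None):
    """Draw random digit sequences, keeping only those that pass."""
    rng = rng or random.Random()
    sequences = []
    while len(sequences) < n_sequences:
        seq = [rng.randrange(10) for _ in range(length)]
        if is_valid(seq):
            sequences.append(seq)
    return sequences


test_list = make_list(rng=random.Random(1))  # seeded for repeatability
```

Rejection sampling keeps the draw random within the restraints; balancing digit frequencies over the list would need an additional step.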

Design. Twenty female telephonists, who profes- 
sionally used a dial telephone, were given one test 
on each of 2 successive days. The High C and 
Low C keyboards were alternated and counter- 
balanced across lists. Thus four groups each 
of five Ss were tested in one of the following 
pairs of conditions: H1 L2; H2 L1; L1 H2; L2 H1. 

Instructions. The Ss were told simply to try to 
recall and key in the sequence in the order pre- 
sented, and not to start keying until the display 
changed to the blank aperture. A minor instruction 
reminded them to replace the telephone at the end 
of keying. 


RESULTS 


Two performance criteria were used: (1) number 
of wrong digits, meaning any digit wrong for 
the particular serial position; and (2) time 
interval between first and last keystroke. 
The mean number of wrong digits per 
sequence was: High C 1.18; Low C 1.45 
(t = 2.50; p < .025, two-tail). The mean 
keying time per sequence was High C 6.28 
seconds; Low C 7.14 seconds. A simple sign 
test showed that 19 out of the 20 Ss were 
faster on the High C keyboard. 
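
The sign test on these counts is easy to verify. The sketch below computes an exact two-sided probability under the null hypothesis that either keyboard is equally likely to be the faster one; it assumes there were no tied subjects, which the text does not state.

```python
from math import comb


def sign_test_two_sided(successes, n):
    """Exact two-sided sign-test probability under a 50:50 null."""
    k = max(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# 19 of the 20 Ss were faster on the High C keyboard.
p_value = sign_test_two_sided(19, 20)
```

A split as extreme as 19 to 1 has a probability well below .001 under the null hypothesis.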

For each S there is a High C-Low C dif- 
ference score for errors and keying time. A 
product-moment correlation between these 
two difference measures gives a value of .75 
(p < .001). 

These results show a substantial association 
between keying time and errors but give no 
indication as to whether errors depend on 
time or vice versa. The effect of compatibil- 
ity on keying time is in accord with expecta- 
tions from all the work on S-R compatibility. 
But why are there also more errors? The 
simple explanation is that the increase in 
errors is due to poor aiming, that is, they 
are effector not memory errors. 

There are two sources of data which can 
be used to give some indication of the origin 
of errors. One is the specific nature of errors 
themselves which can be derived from the 
error matrices of the two keyboards. For 
High C and Low C, respectively, these are 
given in Tables 1 and 2. It is clear by inspec- 
tion that in neither case are errors random. If 
errors were primarily aiming errors one would 
expect very little relationship between the two 
matrices, since rarely, for instance, would the 
most probable near miss for a particular digit 
be the same on both keyboards. When a 
rank-order (Spearman) correlation between 
the cell values in the matrices was run, a 
rho of .52 (t = 5.68; p < .001) was ob- 
tained. Thus it would appear that error rela- 
tions depend significantly on the particular 
digit, and this is what would be expected 
from a matrix of short-term memory errors 
(Conrad, 1964). 
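
One common way to test a rank correlation of this size is the t approximation sketched below. Whether this is the computation behind the reported statistic is not stated, and the number of cells is an assumption: 90 is used on the guess that each 10 x 10 error matrix contributes its 90 off-diagonal cells.

```python
from math import sqrt


def t_from_correlation(r, n):
    """t statistic for a correlation r based on n paired values
    (n - 2 degrees of freedom)."""
    return r * sqrt((n - 2) / (1.0 - r ** 2))


n_cells = 10 * 9   # ASSUMED: off-diagonal cells of a 10 x 10 matrix
t_value = t_from_correlation(0.52, n_cells)
```

Under that assumption the statistic comes out near 5.7, close to the value reported in the text.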

More direct evidence against an aiming 
hypothesis was found by rank correlating 
errors from the two keyboards by location 
rather than by digit. For example, referring 
to Figure 1, the rank order of the frequency 
of the Error 4 instead of 1 (High C) was 
compared with the Error 9 instead of 5 
(Low C); 8 given 5 (High C) with 6 given 0 
(Low C); 9 given 2 (High C) with 1 given 2 
(Low C), and so on for all errors. In this 
case the rank correlation was .03. One can 
be fairly sure therefore that aiming errors 
contribute little to the total error score. 

The final analysis was concerned with the 
general possibility that only the extra errors 
found for the Low C keyboard over those 
for the High C were aiming errors. If this 
were the case one would expect them to be 
equally distributed among the eight different 
positions in the sequence, that is, S’s aim 
should be neither better nor worse at the 
beginning of the sequence than at the end. 
Table 3 gives the errors including omissions 
according to serial position, and also a hypo- 
thetical set of values representing the ex- 
pected errors for the Low C keyboard if the 
extra errors were equally distributed. 
Comparing the observed High C and Low 
C distributions, the chi-squared value is 6.56, 
p > .30. If the extra errors had been equally 
distributed, the chi-squared value would have 
been 17.10, p < .02. Thus it appears highly 
unlikely that the effect of reducing the S-R 
compatibility is merely to increase the number 
of aiming errors. The distribution of errors by 
serial position in both cases points strongly 
to a memory rather than to an effector origin. 


TABLE 3 

SERIAL DISTRIBUTION OF ERRORS 

Serial position          1     2     3     4     5     6     7     8 

High C                   6    21    41    53   104   155   194    96 
Low C (Observed)         [values illegible] 
Low C (Hypothetical)    27    42    62    74   126   [remaining values illegible] 


DISCUSSION 


When a numeral data-entry keyboard has 
a key layout which has poor compatibility 
with normal expectations of the location of 
the digits, the results presented suggest first 
that S requires more time to locate each key, 
and second that this is associated with an 
increase in recall errors for material held in 
a memory store during keying. Thus the 
results agree with predictions derived from 
S-R compatibility theory. The results also 
confirm those of Conrad (1958) and Mack- 
worth (1962) that mode of report in a test 
of immediate memory may significantly deter- 
mine the level of recall error. 

The results support the short-term memory 
model outlined by Broadbent (1963) who 
suggested that “what matters in short-term 
memory might well be the time for which 
interfering activity occupies the limited ca- 
pacity system rather than the similarity 
between the interfering and original activities 
[p. 36].” On this model, recall of an item 
(and this would in this context include 
its response selection, report, sensory feed- 
back, and monitoring of the response) would 
prevent “rehearsal” of items still in store. 

Posner and Rossman (1965) have in- 
deed shown that even when a retention 
interval between end of presentation and 
beginning of recall is kept constant, recall 
performance depends markedly on how S is 
occupied during this interval. Putting it 
crudely, the “harder” he has to work at an 
interpolated task, the worse will be recall. 
It seems reasonable to stretch the concept 
of a retention interval, to the end of the re- 
called response. In this case, without a formal 
interpolated retention interval, S’s mode of re- 
sponse is likely to affect recall. It is therefore 
quite possible that even if incompatible re- 
sponse did not lead to an increase in time to 
report recall, recall would just the same suf- 
fer. Mackworth (1962) has in fact reported 
that paced verbal report gives better recall 
than keyboard report paced at the same rate. 
But in the present experiment the demon- 
strated relationship between report time and 
recall errors, suggests that report time is itself 
an important factor. This is supported by 
Conrad and Hille (1958) who showed that 
for a constant presentation rate and written 
report, slowing report rate worsened recall. 

It is evident, then, that in the design of 
sensorimotor tasks such as data entry by 
keyboard, high S-R compatibility must be 
achieved not only to increase entry rate, but 
also to minimize short-term recall errors. The 
importance of standardizing layout of com- 
monly used numeral keyboards cannot be 
overemphasized. 


BACKGROUND FACTORS IN AIRLINE MECHANICS’ 
WORK MOTIVATIONS: 


A RESEARCH NOTE¹ 


TRIPIT NARAYAN SINGH 
University of Bhagalpur 


AND 


HOWARD BAUMGARTEL 


University of Kansas 


A correlational analysis of a number of questionnaire items assessing the 
importance of various aspects of the work situation showed 2 themes: 
1 referred primarily to needs for advancement and the other to needs for 
security and stability in job and interpersonal relations. Level of educational 
achievement bears a positive relationship with advancement motivation. Age 
is, independently, negatively related to advancement needs. Trends exist to 
indicate converse relationships between education and age and the need for 
security and stability. 


The study of worker motivations has come 
to be a focal concern of psychologists in- 
volved in industrial research. Disillusion with 
satisfaction measures as significant correlates 
of worker productivity has been partly re- 
sponsible for this shift of emphasis. The 
recognition of limitation in the predictive 
power of selection devices, particularly those 
measuring skills and abilities, has stimulated 
research workers to seek for those situational 
factors in work structure and organizational 
climate that may bear significantly upon 
worker motivation and, consequently, upon 
performance measures. The drift away from 
studies of nonsupervisory production workers 
toward more study of factors influencing the 
performance of engineers, accountants, super- 
visors, and managers has further reinforced 
the research concern with motivation. Fried- 
lander (1966) has discussed the development 
of worker-motivation research in the context 
of reporting a study designed to validate 
measures of work motivation against per- 
formance data. 

The study reported in this paper was aimed 
toward investigating the contributions of 
three background factors—age, length of 
service, and amount of education—to the 
kinds of work-motivation measures often 
used in industrial research. Managerial 
strategies for motive arousal and reward 
structure should be adapted to the predomi- 
nant motivational pattern of particular groups 
of workers. Research workers must exercise 
caution in the interpretation of motivational 
measures so reflective of factors external to 
the work setting. And care must be exercised 
in stating broad generalizations about what 
“motivates” American industrial workers. 


1 This paper reports one of a number of analyses 
of data collected by the second author during the 
period 1956-58. Funds for the original research in- 
vestigation were supplied by the General Research 
Fund of the University of Kansas. 


METHOD 
Sample 


The data upon which this study is based were 
gathered as a part of a 3-year longitudinal study 
of the effects of a plant relocation and techno- 
logical reorganization upon the attitudes, motiva- 
tions, and organizational relationships of a group 
of aviation mechanics in a large commercial air- 
craft overhaul base. Questionnaire surveys were con- 
ducted among selected shops and work groups in 
1956, 1957, and 1958. Data from the 1958 question- 
naire covering 340 nonsupervisory employees were 
used for the analysis reported here. By 1958 all 
employees had been relocated in the new overhaul 
base. 

The overhaul base was situated in a rolling coun- 
tryside about 15 miles from a major midwestern 
city. Both aircraft engines, then of the reciprocating 
type, and airframes were overhauled periodically at 
the base. Employees participating in the study were 
mechanics and workers in the six shops: plating, 
propeller, test cells, sheet metal, hydraulics, and 
fabrics. The employees included in this study con- 
stitute less than 20% of the total work force at 
the base.² 


2 For further description of the setting and study 
see H. Baumgartel and G. Goldstein (1961). 


Measures 


Information on the mechanics’ age, length of 
service, and level of educational achievement was 
obtained from the questionnaire. The questionnaires 
were administered to the employees on company 
time under carefully supervised conditions. Extensive 
discussions had been held with both management 
and union officials to clarify the purposes of the 
research and to obtain permission for worker 
cooperation. 

The two measures of worker motivation were 
developed empirically by constructing indexes based 
on an inspection of the intercorrelation among 11 
motivational items scattered throughout the body of 
the questionnaire. Of the 11 original items 9 were 
used. The 9 items were of the “importance” type. 
Workers were asked to rate the importance of the 
various job factors on 5-point scales from “utmost 
importance” to “not at all important.” 

Inspection of this intercorrelation matrix revealed 
that the items seemed to fall into two natural 
groups; the items within each group were intercorrelated
among themselves and uncorrelated with items in
the other item subgroup. The two indexes used in 
the analysis reported below were constructed from 
the two sets of intercorrelated items. 

One set of items dealt with: learning new skills, 
getting a better nonsupervisory job, getting a super- 
visory job, and making more money. The average 
intercorrelation among these items was .23. The 
average correlation with items in the second index 
was -.01. The index based on these four items
was called the “Advancement Index,” referring as it
does to the importance of getting ahead in the 
company setting. 

The second set of items dealt with: continuing 
with one’s work group, having a steady job, con- 
tinuing in one’s present job, meeting company stand- 
ards, and doing an outstanding job for the com- 
pany. The average intercorrelation among these items 
was .29. The average correlation between these five 
items and items in the first index was -.01. This
second index was labeled the “Stability Index,”
referring as it does to security and the maintenance 
of existing relationships with jobs and people. 


Design of the Data Analysis 


The relationship of age, length of service, and 
amount of education to each of the motivational 
indexes was obtained for the 340 mechanics. Scores 
on the 4-item index were corrected to become com- 
parable with scores on the 5-item index by multi- 
plying each score by 5 and dividing by 4 so that 
both sets of scores ranged from 5 to 25. 
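
The rescaling just described is a single multiplication. A minimal sketch in Python, assuming each item was rated 1 to 5 and each index is the simple sum of its items (the article implies this through the stated 5 to 25 range but does not spell it out); the function name `rescale_advancement` is hypothetical:

```python
def rescale_advancement(raw_score: float) -> float:
    """Put a 4-item index score (range 4-20) on the 5-item scale (range 5-25).

    Assumes the raw score is a sum of four ratings, each between 1 and 5.
    """
    if not 4 <= raw_score <= 20:
        raise ValueError("a sum of four 1-5 ratings must lie between 4 and 20")
    return raw_score * 5 / 4


# Lowest, middle, and highest possible 4-item scores.
for raw in (4, 12, 20):
    print(raw, "->", rescale_advancement(raw))
```

Under these assumptions the endpoints 4 and 20 map onto 5 and 25, which is the range the text reports for both indexes.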

The basic pattern of the analysis was to compare 
the distribution of motivation scores for each of 
the three background variables while controlling 
empirically on the others. Mean scores were com- 
puted; however, chi-square tests were applied to 
assess the statistical significance of the various 
distributions in the controlled analyses. 
SUMMARY OF RESULTS³


The results of the investigation of the rela- 
tionships between age, length of service, and 
amount of education and each of the two 
motivational indexes are presented below. Statistical
significance figures, where indicated,
are based on chi-square computations as 
mentioned above. 


Advancement Motivation 


The age of the mechanics in this sample 
bears a consistent and significant negative 
relationship with scores on the advancement 
index. The relationship maintains its pattern 
when level of education is controlled except 
that the pattern for the “did not complete 
high school” group does not reach the .05 
level of confidence. The consistent negative 
relationship between age and advancement 
scores is maintained when length of service 
is held constant. 

The level of formal education achieved by 
the mechanic is positively and significantly 
related to his score on the advancement index 
(p < .01). The relationship is sustained when 
controlling separately on age and length of 
service. Hence, regardless of the age or 
length of service of the mechanic, the more 
education he has had the higher are his 
advancement aspirations within the work 
context. 

A mechanic’s length of service bears no 
statistically significant relationship to his 
advancement motivation as measured by this 
index. 


Stability Motivation 


The stability index is a composite of desire 
for security, stability in job assignment and 
social relations, and conformity to company 
demands. There are no statistically significant 
relationships between age, length of service, 
or amount of education and workers’ scores 
on this index. There are, however, consistent 
but nonsignificant trends indicating that, con- 
trolling on education, older workers attach 
more significance to security and stability
and that concomitantly, regardless of age,
workers with more education attach less significance
to this factor than workers with
little formal education. Length of service per
se shows no consistent pattern of relationship
with stability motivation.


3For a detailed presentation of the findings see
ee singhent 96s).

Among these 340 mechanics, concern with 
stability increases with age (ns) and concern
with advancement decreases with age
(p < .05). The two need levels intersect in 
the 30-40 age bracket. Similarly, the stability 
needs decreased with the level of educational 
achievement (ns) and advancement needs
increased with educational level (p < .01). 


DISCUSSION 


The evidence from this study supports the 
notion that life span and formal education 
are significant determinants of the salience 
of various job-related motivations. As a man 
gets older the importance he attaches to 
getting ahead in the company job-structure 
declines. Age itself rather than length of 
service seems to be the critical influence. 
Similarly, the level of formal education 
achieved during a man’s youth induces a 
persevering effect upon his desire to get 
ahead. Some years ago, Mann (1953) demon- 
strated that, controlling on job and wage 
levels, the more education an employee had 
the less satisfied he was with his pay and
promotional opportunities. Inasmuch as satis- 
faction is a function of both need level and 
environmental return, this study confirms 
Mann’s findings by the use of direct measures 
of worker motivation. Since the results of this 
study of both advancement and stability
motivation confirm common sense expecta- 
tions, one can view these findings as a crude 
validation of the measurement techniques 
themselves. Friedlander’s (1966) findings 
would indicate, however, that relationships 
between such measures and performance are 
neither simple nor obvious. 
A STUDY OF COMPUTER-ASSISTED INSTRUCTION IN INDUSTRIAL TRAINING


H. A. SCHWARTZ AND R. J. HASKELL, JR.¹
IBM Corporation, Poughkeepsie 


The study was undertaken to test the feasibility of remote computer-assisted 
instruction as an industrial training technique. 79 newly hired electronic 
technicians received their required training in basic data-processing principles 
through programmed texts, the standard method used for this presentation. 
25 equivalent students received the same training through a keyboard- 
operated terminal device linked remotely to an IBM 1440 computer system. 
No significant differences in examination scores were obtained; however, 
there was a significant saving (approximately 10%) in the time required 
to complete the course. On an attitude questionnaire administered subsequent 
to the courses, both groups rated their respective method of instruction as 
approximately equal to regular classroom techniques in terms of effectiveness
and desirability.


Considerable interest has recently been 
shown in the use of computers as teaching 
devices. A number of empirical and philo- 
sophical articles have appeared and numerous 
courses have been prepared. These have been 
reviewed and discussed by Gentile (1965) 
and Zinn (1965). 

The results of these explorations have 
amply demonstrated that it is possible to 
teach by computer. The question remaining 
is: “Is it desirable to do so?” Stated dif- 
ferently: “What advantages are to be gained 
through the use of computer-assisted instruc- 
tion rather than other presently available 
media?” 

Unfortunately, the answers to this ques- 
tion cannot be given without regard to the 
specific application contemplated. Obviously, 
much of the overall justification for com- 
puter-assisted instruction, as well as for the 
particular type of system employed, will 
depend on whether one is interested in teach- 
ing kindergarten students, industrial em- 
ployees, inmates of penal institutions, etc. 

With few exceptions, however, research in 
the area of computer-assisted instruction has 
been performed with college, secondary, or 
elementary students, employing suitable sub- 
ject matter. There appears to have been a 
dearth of research activity in the area of
computer-assisted instruction in industry, despite
the potential value of this technique in
industrial training. (See Long & Schwartz,
1966.)


1The authors are greatly indebted to L. R.
O’Neal, who supervised the operation of the CAI
system and also authored both versions of the
course material employed in the investigation.

Within IBM, the investigation of industrial 
training via computer-assisted instruction 
was begun during the latter portion of 1964. 
The Advanced Maintenance Development 
Group of the Field Engineering Division, in 
cooperation with the IBM Poughkeepsie Main 
Plant of the Systems Manufacturing Division, 
initiated a study in which a number of manu- 
facturing plant personnel received required 
training in basic data-processing principles through
either computer-assisted instruction or pro- 
grammed texts. This paper deals with the 
conduct and results of that study. 


METHOD
Computer-Assisted Instruction System 


The computer-assisted instruction (CAI) con- 
figuration employed in the study consisted of an 
IBM 1440 computer system, coupled, through tele- 
processing facilities, to IBM 1050 student/author 
terminals as shown in Figures 1 and 2. 

The use of teleprocessing permitted the remote 
location of the student terminals with respect to the 
computer system. In the study, two terminals were 
located in the manufacturing plant, approximately 
10 miles away from the IBM 1440 system. 

Course material prepared by a course author in 
a programming language known as Coursewriter²
was stored on the 1311 disk storage devices shown
in Figure 1. The material was presented to the
student through the terminal typeouts and through
a book containing figures, diagrams, etc., to which
the student could be referred by the typeout. The
student responded to the system through the
terminal keyboard.


2 Coursewriter is an announced IBM program,
designed to permit instructors unfamiliar with computers
or with computer programming to write
courses for computer-assisted instruction.


FIG. 1. IBM 1440 computer-assisted instruction system.

For a more complete description of the system 
and its operation, including examples of student 
and author sequences, see Schwartz and Long 
(1966b). 


Course Material 


The subject matter for the study consisted of 
“Fundamentals of Data Processing,” a basic intro- 
duction to data-processing systems dealing with the 
following topics: 


1. Data-processing concepts
2. Primary storage and data coding
3. Magnetic tape
4. Stored programs and data flow
5. Logic operations
6. Console and systems checking


Two versions of the course material, a pro- 
grammed text (PI) and a CAI version, were em- 
ployed in the study. Both versions were authored by 
L. R. O’Neal, IBM, Poughkeepsie. The programmed 
text was primarily linear with constructed responses, 
but also included some multiple-choice responses and 
branch points. The text was authored in 1963 and 
has been used widely within IBM. The CAI version 
of the course was designed to accomplish the
same objectives as the programmed text, but 
included numerous system-controlled branches and 
skip options. 


Subjects 


The study utilized 104 newly hired electronic 
technicians. Their mean age was 24.8 years, with 
a range of 18-50 years. All were graduates of a 
civilian or an Armed Forces technical school. These 
subjects were divided into two equivalent groups in 
terms of age, amount of technical education, and 
technical work experience. 


Procedure 


The study was conducted between October 1964 
and May 1965. Upon reporting to work at the IBM 
Main Plant, Poughkeepsie, New York, the students, 
as one of their initial assignments, were required 
to complete the “Fundamentals of Data Processing” 
course. The students were divided into two groups. 
There were 79 students who took the course via 
programmed texts, the currently operational mode 
of presentation for this course. The same course 
was taken by 25 students via the computer- 
assisted instruction method. The limited number of 
CAI terminals available for the study precluded a 
more even division of students to the two groups. 

The programmed-text students studied in a class- 
room under the supervision of a monitor. The CAI 
students studied in a separate room housing the 
terminals and were also monitored. 

The students studied during the first 4 hours of 
the second shift (4:00 p.m. to 8:00 p.m.). During
the remaining 4 hours of the day they attended 
conventional lecture classes on unrelated subjects. 
Eight days (32 hours) were allowed for the 
completion of the material. 

Before beginning the course, each student was 
given a pretest to determine his initial level of
knowledge of data processing. Upon completion of
the course, each student was administered a final
examination to evaluate the level of proficiency attained.
Prior to the administration of the final
examination, each student was asked to complete
an attitude questionnaire to determine his attitude
toward his respective technique of study as compared
to a regular classroom procedure.


FIG. 2. IBM 1050 data communications terminal.


TABLE 1

EXAMINATION SCORES AND COMPLETION TIMES

                               CAI                        PI
Item                      M            SD            M            SD
Pretest               [illegible]     13.3      [illegible]   [illegible]
Final examination        86.5      [illegible]      86.1          9.1
Hours to completion      22.4          4.6          25.1          6.1


RESULTS AND DISCUSSION 


The achievement scores and course com- 
pletion times for the two groups are pre- 
sented in Table 1. 

The results of the pretest showed no sig- 
nificant difference between the two groups 
(p> .05). Therefore, it may be assumed 
that in terms of knowledge of course-related 
data-processing information the groups were 
equivalent at the outset of the course. 

Final examination scores of the two groups 
did not differ significantly (p > .05). Thus, 
the two instructional methods appear to be 
equally effective in accomplishing their
objectives. This outcome is readily under- 
standable, inasmuch as both the CAI and the 
PI presentations were designed, pretested, and 
revised to meet certain specified objectives. 
Since these objectives were the same for both 
the courses and since the final examination 
justifiably tested for the accomplishment of 
these objectives only, no real difference in 
examination scores would be expected. In any 
event, the uncertainty involved in the inter- 
pretation of most achievement test scores 
would tend to vitiate the meaningfulness of 
any differences, unless quite large.

Time, however, is another matter. In the 
industrial world, time is rather directly 
translatable into economic factors, and there- 
fore any saving in time is noteworthy. Table 
1 shows that there was approximately a 10% 
difference in the amount of time required to 
complete the course, 22.4 hours for the CAI
group as compared with 25.1 hours for the
PI technique. This difference was statistically
significant (t = 2.35, df = 103, p < .05).
Since the course content was essentially the 
same, it is likely that this time saving was 
due to the pretesting and system-controlled 
branching features built into the CAI presen- 
tation, which permitted the CAI students to 
propel themselves through the material more 
rapidly than the PI text students who, due 
to the nature of that medium, were forced to 
proceed in a more uniform sequence. 
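
The size of the time saving can be checked from the summary figures in Table 1. The sketch below is an illustration only: the article does not say which form of the t test was used, so the unpooled (Welch-type) standard error here is an assumption, and the function name `unpooled_t` is hypothetical. Under that assumption the statistic comes out close to the reported value.

```python
import math


def unpooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t statistic for two independent means using separate variance estimates."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Hours to completion, taken from Table 1 and the Procedure section.
pi_mean, pi_sd, pi_n = 25.1, 6.1, 79
cai_mean, cai_sd, cai_n = 22.4, 4.6, 25

# Proportion of programmed-text time saved by the CAI group.
saving = (pi_mean - cai_mean) / pi_mean
t_value = unpooled_t(pi_mean, pi_sd, pi_n, cai_mean, cai_sd, cai_n)

print(f"time saving: {saving:.1%}")
print(f"t = {t_value:.2f}")
```

The saving works out to a little under 11% of the programmed-text time, in line with the "approximately 10%" quoted in the text.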

Although not statistically significant, there 
was a tendency for the variability of course 
completion time (as indicated by the stand- 
ard deviations shown in Table 1) to be 
greater for the PI group (F = 1.76; df = 78, 
24; .05 < p < .10, two-tailed). Further
study will be required to determine whether 
this effect is genuine or merely reflects 
random variation within the two groups. 
However, based on the previously mentioned 
fact that the PI students were forced to follow
a more uniform sequence, a difference in
variability would not be unexpected. Forcing 
the student, as in the case of the PI text, to 
cover unneeded material should not only 
lengthen the course completion time (as was 
found) but in so doing should also increase 
the variability by providing increased op- 
portunity for the operation of such factors 
as reading speed and attention span. 
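
The variance ratio quoted above follows directly from the standard deviations in Table 1. A short sketch, with the helper name `variance_ratio` chosen for illustration:

```python
def variance_ratio(sd_larger: float, sd_smaller: float) -> float:
    """F ratio of two sample variances, larger variance in the numerator."""
    return (sd_larger / sd_smaller) ** 2


# Completion-time standard deviations (hours) and group sizes from the article.
pi_sd, pi_n = 6.1, 79
cai_sd, cai_n = 4.6, 25

f_value = variance_ratio(pi_sd, cai_sd)
degrees_of_freedom = (pi_n - 1, cai_n - 1)

print(f"F = {f_value:.2f}, df = {degrees_of_freedom}")
```

The ratio and its degrees of freedom agree with the F = 1.76 and df = 78, 24 reported in the text.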

It is interesting to observe that in studies 
reporting differences between PI and conven- 
tional classroom performance, the differences 
tend to follow a pattern. In the conventional 
classroom, because of the normal scheduling 
involved, the time required to complete the 
course is fixed; however, the students’ per- 
formance scores tend to vary widely. By con- 
trast, PI tends to produce variable times to 
completion, but more consistency in examina- 
tion scores. The present results suggest that 
CAI might lie between these two techniques 
in that it maintains as much consistency in 
examination scores as PI, but may produce 
less variability in completion times. 

The pattern of performance observed in 
this study (i.e., no difference in achievement 
scores, but a time saving in favor of CAI)
is essentially the same as that observed by 
Bitzer, Lyman, and Easley (1966) in CAI
versus classroom comparisons involving more 
academic material. 

Table 2 presents the results of the objec- 
tive portion of the attitude questionnaire. In 
this portion the student was presented with 
four questions concerning his feeling toward 
his particular method of instruction (CAI or 
PI). The student indicated his choice by 
checking the most appropriate of five state- 
ments following each of the questions. Ac- 
cording to the statement checked, a scale 
value was assigned. The values ranged from 
1 (indicating negative feelings toward his 
own method) through 5 (indicating positive 
feelings toward his own method). 

It may be seen in Table 2 that both groups 
of students considered their respective meth- 
ods of instruction as approximately equal to 
the classroom in both effectiveness and de- 
sirability. The Median test (Siegel, 1956) 
revealed no significant difference between the 
two groups for any of the four questionnaire 
items. 

It should be noted that the PI students in 
this investigation studied in a group, in a 
classroom, and under fairly close supervision. 
In the IBM Poughkeepsie Main Plant, these 
happen to be the standard conditions for the 
presentation of this specific course to this 
specific type of employee (i.e., newly hired 
technicians). It must be emphasized, how- 
ever, that these are not typically the condi- 
tions under which PI courses are adminis- 
tered. More representative circumstances 
would find the PI student studying individu- 
ally, in some private or near-private environ- 
ment, but usually not a classroom, and with 
little if any supervision between the initiation 
and the completion of the course. It therefore 
seems likely that the performance of the PI 
students in this study was probably enhanced 
to some unknown extent by the unusual con- 
trol and competition factors present. Conse- 
quently, the comparisons were probably 
somewhat biased against the CAI group. 

Thus, from the standpoints of educational 
effectiveness and student acceptance, the 
results of the study indicate the feasibility
of computer-assisted instruction as a means
of industrial training. Extensions of the present
study, involving the use of CAI in IBM’s
widely decentralized Field Engineering Division,
have recently been completed. (See
Schwartz & Long, 1966a.)


TABLE 2

MEAN SCALE VALUES ON ATTITUDE QUESTIONNAIRE ITEMS

Questionnaire item                                          CAI     PI
In your opinion, how well were you
  taught the material covered?                              3.0     3.0
In your opinion, how difficult is it to
  learn through CAI/PI study?                               3.1     3.0
Which method of teaching do you like
  best?                                                     2.8     3.0
If you had your choice, which method
  would you use in future courses?                          3.1     3.0

Note. Scale values range from 1 to 5. In comparison with
regular classroom, a value of 1 indicates negative feelings
toward their own method of instruction. A value of 5 indicates
positive feelings toward their own method of instruction. A
value of 3 indicates indifference between their own method and
the regular classroom.
UTILITIES AS BASE-RATE MULTIPLIERS IN THE DETERMINATION
OF OPTIMUM CUTTING SCORES FOR THE DISCRIMINATION
OF GROUPS OF UNEQUAL SIZE AND VARIANCE¹


LEONARD G. RORER, PAUL J. HOFFMAN, AND KUO-CHENG HSIEH


Oregon Research Institute, Eugene 


The accuracy with which a test classifies people, objects, or events as belonging to 1 
of 2 groups depends upon the distance between the means, the relative variability, 
the relative size, and the shape of the distributions of the 2 groups. If the scores 
for each of the groups are normally distributed, tables for determining optimum 
cutting scores for a wide range of values of the other variables are now available 
(Rorer, Hoffman, & Hsieh, 1964). However, overall accuracy is an appropriate 
guide for decision making only when all correct classifications are equally beneficial 
and all incorrect classifications equally costly. A simple technique makes possible 
the utilization of the Rorer, Hoffman, and Hsieh tables when a different value is
assigned to each of the outcomes.


In a now classic paper Meehl and Rosen 
(1955) showed that cutting scores on tests 
must be set in relation to population base rates. 
They pointed out that failure to make such 
adjustments can result in a higher rate of 
erroneous classifications than would have resulted
had the test not been used at all, and
they presented inequalities that must hold if 
a test is to improve on the classificatory 
accuracy that could be achieved by using the 
base rates alone. Dawes (1962) has provided 
a succinct restatement of these inequalities in 
conditional probability notation. 

Few researchers have heeded Meehl and 
Rosen’s warning. An exception is Brown 
(1964), who evaluated a proposed college- 
admission selection procedure in terms of its 
incremental contribution to the success rate 
that could be expected on the basis of student 
self-selection. Rorer, Hoffman, LaForge, and 
Hsieh (1966) have summarized the literature 
relating to base-rate predictions and presented 
a general solution for optimum classification 
when groups differ in both size and variance. 
In addition, Rorer, Hoffman, and Hsieh (1964) 
provided tables so that individual investigators 
could set cutting scores appropriate to the base 
rates in their situation. Given the means, 
standard deviations, and relative size of the 
two groups, the tables give the optimum 
cutting scores to discriminate one from the 
other, providing that the scores of each group
are normally distributed and all errors are
considered equally costly.


1 This investigation was supported in whole by Public
Health Service Research Grant MH 04439 from the
National Institute of Mental Health.

Inspection of the Rorer, Hoffman, and Hsieh 
tables shows that, even for good tests, the 
impact of extreme base rates can be disastrous. 
The effect is illustrated by the situation de- 
picted in Figure 1, where the large curve might 
represent the general population of normal 
(“well”) individuals, and the small curve might
represent those individuals who are suicidal. 
It should be clear from the figure that even 
though the mean score of the suicidal (S) 
group is $2\sigma_W$ greater than that of the normal
(W) group, there is no way in which this test 
can be used to identify suicidal individuals 
without making more erroneous classifications 
than would result if everyone were classified 
as nonsuicidal. However, the individual in- 
vestigator may view suicide as sufficiently im- 
portant, and the preventative (e.g., 24 hours 
detention) as sufficiently unimportant, that 
he is willing to tolerate more false positives 
than valid positives as a reasonable cost for 
identifying (and preventing) some of the po- 
tential suicides. He can probably even set some 
rough boundaries on the ratios that he would 
be willing to tolerate. If so, he can use the 
Rorer, Hoffman, and Hsieh tables in such a 
way as to take these differential utilities into 
account. It is the purpose of this paper to 
present and illustrate a simple technique 
whereby this may be done. 


SPECIFICATION OF THE PROBLEM 


In the taxonomy proposed by Cronbach and 
Gleser (1965) this problem would be classified 
as type aaaaaa: single-stage, institutional
placement into one of two groups without a
quota. In the presentation that follows (and in
the Rorer, Hoffman, and Hsieh tables) the
groups have been designated W and S. The S
group arbitrarily constitutes what would ordinarily
be called the criterion group; that is,
persons in the S group will be called positives,
persons in the W group will be called negatives,
and the base rate (B) will refer to the proportion
of Ss (suicidals) in the population:

$$B = \frac{N_S}{N_S + N_W}.$$


FIG. 1. Score distributions for “valid” test ($\mu_S - \mu_W = 2\sigma_W$) when $N_W = 100\,N_S$.


While the designations S and W are quite 
general, a convenient mnemonic for the ensuing
discussion is “sick” and “well,” where
sick may be thought of as referring to a group 
that should be hospitalized, or to a group that 
might profit from Treatment X or Drug Y. 
Alternatively, the S group might be skilled 
workers who should be given special training 
or transferred to a specialized job. The problem 
is to assign each individual to one of the two 
groups (S or W) on the basis of his score on a 
test. The “test score” may actually be some
rather complicated function of scores, ratings, 
personal history items, etc. In fact, the scores 
on the basis of which the individuals are 
classified may evolve in any way whatsoever 
as long as the combinatorial and algebraic 
manipulations of the data lead ultimately to a 
reasonably continuous and normally distributed
variable (or “test”) in each of the groups.

Table 1 indicates the four categories into
which individuals from the population may
fall. With each cell in the table there is associ- 
ated some value (V), which in the literature 
has been variously referred to as a loss (L), 
a gain (G), a cost (C), or a (positive or nega- 
tive) utility (U). This value may be either 
objectively or subjectively determined. In an 
industrial setting, where a test is being used 
to select skilled workers, it may be possible to 
assign values to the cells objectively on the 
basis of cost-accounting procedures. In a psy- 
chiatric setting, where, for example, a test is 
being used to identify suicidal patients, the 
values in the table will have to be subjectively 
determined. A general discussion of the prob- 
lem of assigning values to outcomes can be 
found in Cronbach and Gleser (1965), and a
procedure relevant to the present case is con- 
tained in Pratt, Raiffa, and Schlaifer (1964). 
The problem of assigning values to the out- 
comes in this table will be discussed at greater 
length below. For the moment it is assumed 
that the values have been established in some 
way, and that they represent the value to be
assigned to each cell in relation to the other
cells. That is, the values need not be assigned
on an absolute scale, but only in relation to
each other, as will soon become obvious.


TABLE 1

POSSIBLE DECISION OUTCOMES AND ASSOCIATED VALUES

                  Predicted
Actual          W           S
W            V_WW        V_WS
S            V_SW        V_SS


TABLE 2

LOSS TABLE

                  Predicted
Actual          W           S
W            L_WW        L_WS
S            L_SW        L_SS


THE SOLUTION 


Given: a test on which two groups, W and S, have
score distributions $f_W(x)$ and $f_S(x)$, with means $\mu_W$ and
$\mu_S$ and standard deviations $\sigma_W$ and $\sigma_S$; a base rate
$B = N_S/(N_S + N_W)$; and the loss (L) table (Table 2),
with the constraints that $L_{WW} < L_{WS}$ and $L_{SS} < L_{SW}$;
then, for any score ($x$), the expected loss associated with
a classification of W is

$$L_x(W) = (1 - B)\,L_{WW}\,f_W(x) + B\,L_{SW}\,f_S(x),$$

and the expected loss associated with a classification of
S is

$$L_x(S) = (1 - B)\,L_{WS}\,f_W(x) + B\,L_{SS}\,f_S(x).$$

Obviously, the expected loss will be minimized by
classifying each $x$ as W if $L_x(W) < L_x(S)$, and vice
versa. Thus, classify as W if

$$(1 - B)\,L_{WW}\,f_W(x) + B\,L_{SW}\,f_S(x) < (1 - B)\,L_{WS}\,f_W(x) + B\,L_{SS}\,f_S(x),$$

which reduces to

$$B\,(L_{SW} - L_{SS})\,f_S(x) < (1 - B)\,(L_{WS} - L_{WW})\,f_W(x).$$

Conversely, classify as S if

$$B\,(L_{SW} - L_{SS})\,f_S(x) > (1 - B)\,(L_{WS} - L_{WW})\,f_W(x).$$

The classification can change, that is, cutting scores
can occur, only at points where the above expressions
are equal. It is possible to solve for those $x$s at which
the expressions are equal if some particular distributions
are specified for $f_S(x)$ and $f_W(x)$. In line with
earlier developments, it is assumed that these distributions
are normal. Then cutting scores occur where

$$B\,(L_{SW} - L_{SS})\,\frac{1}{\sigma_S\sqrt{2\pi}}\,e^{-\frac{(x - \mu_S)^2}{2\sigma_S^2}} = (1 - B)\,(L_{WS} - L_{WW})\,\frac{1}{\sigma_W\sqrt{2\pi}}\,e^{-\frac{(x - \mu_W)^2}{2\sigma_W^2}},$$

from which it can be shown that the cutting scores are at

$$x = \frac{(\mu_S\sigma_W^2 - \mu_W\sigma_S^2) \pm \sigma_W\sigma_S\sqrt{(\mu_W - \mu_S)^2 + 2(\sigma_W^2 - \sigma_S^2)\,\ln\!\left(\dfrac{B\,(L_{SW} - L_{SS})\,\sigma_W}{(1 - B)\,(L_{WS} - L_{WW})\,\sigma_S}\right)}}{\sigma_W^2 - \sigma_S^2}. \qquad [1]$$

For convenience, let $\dfrac{L_{SW} - L_{SS}}{L_{WS} - L_{WW}} = L$. When Equation
1 is compared with the solution for equal utilities (Rorer
et al., 1966), it can be seen that the previous solution
is the special case of Equation 1, for which $L = 1$. By
inspection of Equation 1, it can be seen that the same
cutting score will be appropriate for all cases in which

$$\frac{LB}{1 - B} = K,$$

where $K$ is any constant. If utilities (losses) are not
considered, that is, if $L = 1$,

$$\frac{B}{1 - B} = K.$$

However, if utilities are considered, and $L \neq 1$, then
it is possible to specify some new value, $B'$, such that

$$\frac{LB}{1 - B} = \frac{B'}{1 - B'}.$$

A little algebra shows that

$$B' = \frac{LB}{1 - B + LB}.$$








By looking in the Rorer, Hoffman, and Hsieh tables 
under this new value, B’, it is possible to find those 
cutting scores which are appropriate for values (losses) 
which the investigator has specified. 
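
For readers without access to the tables, the cutting scores of Equation 1 can also be computed directly. The following is a minimal sketch in Python, not the authors' procedure: the function name `cutting_scores` and its argument order are hypothetical, and scores are assumed to be expressed so that $\mu_W = 0$ and $\sigma_W = 1$, which appears to be how the tables are scaled. The function returns `None` when the quantity under the square root is negative, which corresponds to the case where no cutting score exists and the test is of no use.

```python
import math


def cutting_scores(mu_w, sd_w, mu_s, sd_s, base_rate, loss_ratio=1.0):
    """Cutting scores from Equation 1 for two normal groups of unequal variance.

    loss_ratio is L = (L_SW - L_SS) / (L_WS - L_WW); L = 1 means equal losses.
    Returns (lower, upper), or None when no real cutting score exists.
    """
    variance_gap = sd_w ** 2 - sd_s ** 2
    if variance_gap == 0:
        raise ValueError("Equation 1 requires unequal variances")
    log_term = math.log(
        base_rate * loss_ratio * sd_w / ((1.0 - base_rate) * sd_s)
    )
    discriminant = (mu_w - mu_s) ** 2 + 2.0 * variance_gap * log_term
    if discriminant < 0:
        return None  # the two weighted densities never cross
    centre = mu_s * sd_w ** 2 - mu_w * sd_s ** 2
    half_width = sd_w * sd_s * math.sqrt(discriminant)
    lower, upper = sorted(
        ((centre - half_width) / variance_gap, (centre + half_width) / variance_gap)
    )
    return lower, upper


# A test with mu_S - mu_W = 1.00 and sigma_S / sigma_W = .6 (illustrative values).
print(cutting_scores(0.0, 1.0, 1.0, 0.6, base_rate=0.05))                  # equal losses
print(cutting_scores(0.0, 1.0, 1.0, 0.6, base_rate=0.05, loss_ratio=18.0))  # unequal losses
print(cutting_scores(0.0, 1.0, 1.0, 0.6, base_rate=0.50))                  # tabled base rate
```

When $\sigma_S < \sigma_W$, as in this illustration, scores between the two cutting points are the ones classified as S. Passing `loss_ratio` with the true base rate is algebraically the same as passing the adjusted base rate $B'$ with `loss_ratio=1.0`, which is the shortcut the text describes for use with the printed tables.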


An Example 


The following example was presented in Rorer et al. 
(1966, p. 162) in order to demonstrate the use of the 
tables presented therein: 


First, the group with the larger mean is arbitrarily 
designated the S group. Second, $\mu_S - \mu_W$ is found to
equal 1.00, indicating that Table 2 should be consulted.
$\sigma_S/\sigma_W$ is found to be .6, which indicates that
the third column from the left is appropriate. The
population is thought to be 95% W and 5% S, which
indicates that the fourth block down from the top
is appropriate. $C_1$ and $C_2$ refer to tabular footnote b,
which indicates that the test is of no use in this 
situation. 


Now suppose that there is some cost involved in testing
each individual, that the treatment to be administered
is relatively inexpensive, and that failure to
diagnose as sick is quite serious. Then the loss table
might look like Table 3.

TABLE 3
LOSS TABLE

            Predicted
Actual      W       S
W           1       2
S          20       2

TABLE 4
LOSS TABLE

            Predicted
Actual      W       S
W           1       5
S         700       5

One need merely calculate

$$L = \frac{20 - 2}{2 - 1} = 18 \quad \text{and} \quad B' = \frac{18(.05)}{1 - .05 + 18(.05)} = .49$$


to find the adjusted base rate with which to enter the 
table. The seventh block in the third column of Table 2 
(Rorer et al., 1966) shows that individuals scoring 
between .36 and 2.77 should be classified as S. 

If S indicates a group of individuals who might commit
suicide, a clinician who felt strongly about preventing
suicide might fill in the table as in Table 4.
If testing involves one unit of cost, then in this clinician’s
value system, the treatment entailed by a
diagnosis of “suicidal” represents five units of loss,
while a successful suicide represents 700 units. Here

$$L = \frac{700 - 5}{5 - 1} = 173.75 \quad \text{and} \quad B' = \frac{173.75(.05)}{1 - .05 + 173.75(.05)} = .90.$$


This clinician would enter the table with B’ = .90 and
find $C_1 = -.42$ and $C_2 = 3.54$, a result identical to the
second example in the previous article. Thus, these
examples show that a valid test, rendered useless by 
adverse base rates under the assumption of equal losses, 
may have considerable classificatory utility when 
differential losses are assigned to the possible outcomes. 
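
The cutting scores quoted in these examples can be checked directly against Equation 1. The following Python sketch is illustrative only: the function name `cutting_scores` and its arguments are hypothetical, and it assumes that the tabled scores are expressed with the W group standardized ($\mu_w = 0$, $\sigma_w = 1$), an assumption under which the values quoted above are reproduced.

```python
import math


def cutting_scores(mu_s, sigma_s, mu_w, sigma_w, b, loss_ratio=1.0):
    """Solve Equation 1 for the two cutting scores.

    b is the base rate of the S group and loss_ratio is L.
    Returns (lower, upper), or None when the expression under the
    square root is negative, i.e., no score should be classified
    differently from the base-rate decision.
    """
    if sigma_s == sigma_w:
        raise ValueError("Equation 1 requires unequal variances")
    var_diff = sigma_w ** 2 - sigma_s ** 2
    log_term = math.log(loss_ratio * b * sigma_w / ((1.0 - b) * sigma_s))
    under_root = (mu_w - mu_s) ** 2 + 2.0 * var_diff * log_term
    if under_root < 0.0:
        return None
    centre = mu_s * sigma_w ** 2 - mu_w * sigma_s ** 2
    half_width = sigma_w * sigma_s * math.sqrt(under_root)
    roots = ((centre - half_width) / var_diff, (centre + half_width) / var_diff)
    return min(roots), max(roots)


# Assumed standardization: mu_w = 0, sigma_w = 1, so mu_s = 1.00 and sigma_s = .6.
print(cutting_scores(1.0, 0.6, 0.0, 1.0, b=0.05))   # equal losses, B = .05
print(cutting_scores(1.0, 0.6, 0.0, 1.0, b=0.50))   # tabled block nearest B' = .49
print(cutting_scores(1.0, 0.6, 0.0, 1.0, b=0.90))   # B' = .90
```

Under these assumptions the first call returns no cutting scores, which corresponds to the tabular footnote indicating that the test is of no use at the unadjusted base rate, while the other two calls give the intervals within which individuals would be classified as S.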




DISCUSSION 


The approach taken here is not entirely new;
others have presented generally similar solu- 
tions (e.g., Cotton, 1960; Darlington & 
Stauffer, 1966a, 1966b; Edwards, 1954, 1956). 
Cronbach and Gleser (1965) is highly relevant 
and provides an excellent guide to other 
articles. More general mathematical treat- 
ments may be found in a number of places 
(e.g., Blackwell & Girshick, 1954). 


The Loss Table 


It is not always easy to assign numbers to 
one’s values. It is, therefore, helpful to consider 
the exactness with which this must be done. 
First, it should be noted that the scale that an 
individual utilizes is irrelevant. All of the 
entries in the table may be multiplied or 
divided by a constant without changing the 
value of L. Thus, it makes no difference that
one individual may see differences on a small
scale whereas another sees them on a large
scale. This is illustrated in Parts A and B of
Table 5, which contains three loss tables, each
with L = 4.

The value of L is also unchanged when a
constant is added to, or subtracted from, all 
the values in a row of the table. Thus, it is 
possible for the relative size of the values in a 
column to change markedly without affecting 
the value of L. This can be seen by examining 
Parts A and C of Table 5. In Part A, false 
negatives are assigned a loss six times as great 
as true negatives, whereas in Part C false nega- 
tives receive a value only one third greater than 
that of true negatives. This means that, if the 
ratings in the table are all made on the same 
scale, it makes no difference how one views 
false positives in relation to true positives. Nor 
does it matter how one views false negatives 
in relation to true negatives. It is only the 
difference between the values assigned to 
false positives and true negatives in relation to 
the difference between the values assigned to 
true positives and false negatives that will
affect the decision-making strategy. The result
is not intuitively obvious, and most individuals
will have to experiment a bit with the tables
in order to get a feeling for the differences in
values that make a difference in decisions.

TABLE 5
3 LOSS TABLES ALL WITH L = 4

                Predicted
Part   Actual   W       S
A      W        1       2
       S        6       2
B      W       15      30
       S       90      30
C      W       75      80
       S      100      80

TABLE 6
OUTCOME TABLE WITH GAINS INSTEAD OF LOSSES

            Predicted
Actual      W       S
W           Gww     Gws
S           Gsw     Gss
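
One way to experiment with such tables is to compute L directly. The short Python sketch below is an illustration only; the helper `loss_ratio` and the tuple layout (rows are actual W and S, columns are predicted W and S) are assumptions made for the example. It uses the three parts of Table 5 and checks that multiplying every entry by a constant, or adding a constant to every entry in a row, leaves L unchanged.

```python
def loss_ratio(table):
    """L = (L_sw - L_ss) / (L_ws - L_ww) for a 2 x 2 loss table.

    table is ((L_ww, L_ws), (L_sw, L_ss)): rows are the actual
    groups W and S, columns are the predicted groups W and S.
    """
    (l_ww, l_ws), (l_sw, l_ss) = table
    return (l_sw - l_ss) / (l_ws - l_ww)


part_a = ((1, 2), (6, 2))
part_b = ((15, 30), (90, 30))
part_c = ((75, 80), (100, 80))

for name, table in (("A", part_a), ("B", part_b), ("C", part_c)):
    print(name, loss_ratio(table))

# Part B is Part A multiplied by 15; Part C is Part A multiplied by 5
# with 70 then added to every entry of each row.
scaled = tuple(tuple(15 * v for v in row) for row in part_a)
shifted = tuple(tuple(5 * v + 70 for v in row) for row in part_a)
print(scaled == part_b, shifted == part_c)
```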

The discussion so far has consistently re- 
ferred to a loss table; however, it should be 
noted that the table could just as well be 
described as a utility table or a gain table. If 
the table entries are specified as in Table 6, with
the constraints that $G_{ww} > G_{ws}$ and $G_{ss} > G_{sw}$,
and the same derivation carried through
as for losses, the result will show that the
cutting score equation contains

$$G = \frac{G_{ss} - G_{sw}}{G_{ww} - G_{ws}},$$

which is numerically equal to the expression
for L.


Quantifying Values 


Throughout this paper the values in the gain 
or loss tables have been treated as givens. This 
is meant to imply neither that they are 
necessarily easily established nor that they are 
impossibly difficult to estimate. The problem, 
which falls outside the scope of this paper, has 
been considered by others (e.g., Edwards, 
1954; Luce & Raiffa, 1958), and the reader is 
referred to these sources. 

While the various techniques for quantifying 
values cannot be considered, a word, neverthe- 
less, needs to be said concerning the necessity 
for doing so. However difficult the process of 
quantifying one’s values may be, there is no 
avoiding it if one is to defend his procedures 
as “rational” or “scientific.” It sometimes 
seems that decision-making procedures are 
rejected if they do, or if they do not, incor- 
porate differential values in association with 
the possible outcomes. If they do not incor- 
porate such values, they are rejected on the 
grounds that they cannot accurately represent 
real situations. If they do incorporate such
values they are rejected on the grounds that
such values cannot be quantified, or that the 
quantification has made the problem artificial!
It must be recognized that if an individual is 
unable to construct a table representing his 
values with regard to specific decision out- 
comes, then he has no rational basis on which 
to make that decision, with or without the 
test. While irrational decision making may be 
widespread and often unavoidable, it can 
hardly be defended as a scientific enterprise.


OCCUPATIONS TEN YEARS LATER OF HIGH SCHOOL
SENIORS WITH HIGH SCORES ON THE SVIB LIFE
INSURANCE SALESMAN SCALE 1


DAVID P. CAMPBELL 2


Center for Interest Measurement Research, University of Minnesota 


From a pool of 2,500 Minnesota high school seniors of the classes of 
1953 and 1954, 93 students were identified who had “A” ratings on the SVIB 
Life Insurance Salesman scale. Information on their current occupations was 
collected from 72 of them. Of these, 10% were in the life insurance business, 
32% were in other sales jobs, 12% were in business-contact jobs such as 
public relations, 22% were in social service persuasive jobs such as lawyer 
or minister, and 24% were in essentially unrelated jobs. In a further analysis, 
each profile was analyzed as to its appropriateness for the individual’s current 
occupation. 64% were classified as “hits,” 22% as “misses,” and 14% as
“indeterminate.”


There are at least three ways to determine 
the predictive validity of an interest inven- 
tory. The first is to work with a sample of 
men, all of whom are in the same occupa- 
tion, and determine if inventories completed 
earlier by them would have predicted that 
occupation. With this technique, the results 
might vary from occupation to occupation as 
some occupations might be more predictable 
than others. The second method is to work 
with a sample, all of whom score high on a 
single occupational scale and follow them up 
to determine if they actually entered that 
occupation. These results might vary as some 
scales might be more predictive than others. 
The third method, a more general extension 
of the two described above, is to use a sample 
of individuals from a variety of occupations 
and with diverse interest patterns to deter- 
mine if, in general, scores predict occupations. 
In this last approach, differences between 
occupations and scales should be averaged 
out. 

The first method, working with men in a 
specified occupation, has been used by Berdie 
(1960, 1965). He identified graduates of the 
University of Minnesota in curricula which 
are very predictive of future occupations, 
specifically Law, Medicine, Dentistry, Mechanical
Engineering, Accounting, and Journalism,
then studied their Strong Vocational
Interest Blank (SVIB) scores from inventories
completed while they were high school
seniors. His results indicate that the SVIB
scores, on the average, were quite predictive
of the general direction of the student’s
career and fairly predictive of the actual
occupation, though there were differences in
the level of predictive validity among the
occupations with the mechanical engineers
being the most predictable and the journalists
least predictable. In a later follow-up, however,
it was clear that the concurrent validity
of scores from inventories completed by these
individuals as adults was higher than the
earlier predictive validity (Schletzer, 1963).

1 The research reported here was supported by
National Institute of Health Grant HD-01428-01.

2 The author would like to acknowledge the assistance
of William Anderson in the data-gathering
and analyses portions of this study.

The second method mentioned above, that 
of studying the eventual occupations of a 
sample who all score high on a single scale 
as students, is the one used in this paper 
and is discussed in more detail below. Briefly, 
what was done was to identify a group of 
high school seniors with A ratings on the 
SVIB Life Insurance Salesman (LIS) scale, 
then locate them later to see what occupation 
they had entered. 

The third method, using a sample with 
both varied interests and from a wide range 
of occupations, was the technique used by 
Strong in his well-known follow-up of Stan- 
ford University graduates 18 years later 
(Strong, 1955). Because of the diversity of 
occupations and interests among the sample,
this method presents more difficult analytic
problems than the earlier two. Strong analyzed
the results in two ways: first, by using
experts to rate whether the inventories completed
during college days were predictive of
eventual occupations, and second, by identifying
the “appropriate” scale for each subject
(S) (e.g., the appropriate scale for a
doctor would be the Physician scale), then
averaging the scores on the appropriate scales
over all Ss to determine if they were higher
than the nonappropriate scores.

Fig. 1. Average SVIB profile of high school seniors with “A” on SVIB
Life Insurance Salesman scale.

Strong summarized his results by con- 
cluding that college students have a 3:1 
chance of entering and remaining in an 
occupation where they have an A rating on 
the appropriate scale (i.e., standard score 
above 45), and 5:1 odds against being in
an occupation where they have a C rating
(i.e., standard score of 25 or below). 

The results from both the Berdie (1960, 
1965) and Strong (1955) studies indicate 
substantial, though not perfect, predictive 
validity for the SVIB. The results from the 
current study support these findings and add 
one further strand to the web of knowledge 
concerning the predictive validity of this 
venerable inventory. 


METHOD 


Through the Minnesota Statewide Testing Pro- 
gram, a comprehensive testing program is available 
to the high schools of Minnesota. The SVIB is 
included in this program, and the majority of high 
schools in the state administer it to their students. 
Some high schools give it to the entire senior class 
but most use it only for some selected subsample, 
usually those students most oriented toward at- 
tending college after leaving high school. In all 
instances, the selection of which students shall take 
the SVIB is made at the high school level, usually 
by the counselor or principal. 

When the inventory has been completed, it is 
sent to the University of Minnesota for scoring. 
The scores are returned to the high school and also 
filed at the University with the other records of 
the Statewide Testing Program. For this project, 
the records from the 1953 and 1954 classes from 
several Minneapolis-St. Paul high schools were 
skimmed to find a sample of students with high 
scores on the LIS scale of the SVIB. Twin City 
high schools were used to make it easier to find 
the students today, and classes graduating about 10 
years ago were used, both because this length of 
time allows most people to settle down into their 
occupations and, from a practical standpoint, high 
school classes usually have 10-year reunions, a tradi- 
tion that can be quite helpful in locating Ss. 

After skimming approximately 2,500 profiles, 93 
were found with T scores over 45, that is, A ratings, 
on the LIS scale. While it was originally intended 
to use profiles that had high LIS scores and low 
scores on all the other scales, it was impossible to 
find many that would meet that criterion. Thus, 
most of the profiles had other high scores as well. 

Of these 93 individuals, 21 could not be located 
currently or never returned the short questionnaire 
asking about their current occupation. 

The average profile for the final sample of 72 
is presented in Figure 1. Note that selecting profiles 
with high LIS scale scores actually resulted in higher 
scores on the Real Estate Salesman scale. 


RESULTS 


Two types of analysis were made. In the
first, using the information from the questionnaire,
Ss were separated into the following
categories: (a) life insurance sales; (b)
other sales jobs, for example, real estate,
machinery, office equipment; (c) other business-contact
jobs, for example, public relations,
loan office manager, radio announcer;
(d) persuasive social service jobs, for example,
lawyer, minister, teacher; (e) nonrelated
jobs, for example, political scientist,
doctor, reporter, pilot.

TABLE 1
NUMBER OF STUDENTS IN VARIOUS OCCUPATIONAL CATEGORIES

Category                     N    Percentage   LIS M score
Life insurance sales         7        10          51.4
Other sales                 23        32          53.6
Business contact             9        12          49.0
Social service persuasive   16        22          [illegible]
Nonrelated jobs             17        24          50.8
Total                       72       100

The categories were developed intuitively 
in order of their similarity with the life 
insurance salesman occupation. Table 1 pre- 
sents the number and percentage of the 
sample falling in each category, and also 
the mean LIS scale score for each category. 
As can be seen, 10% of the group were 
actually employed in the life insurance busi- 
ness, while another 32% were employed in 
other sales jobs. Thus, almost half of the 
group were salesmen. Many of the other 
half were in related occupations though about 
one fourth were in essentially unrelated 
occupations. 

It might be noted that, with the possible 
exception of the doctor and pilot, not a single 
individual was in any occupation even re- 
motely connected with the hard sciences. 
There were no scientists, no laboratory tech- 
nicians, no mechanics, no engineers. Even 
the occupations which have been classified 
here as unrelated to life insurance salesmen 
were almost exclusively occupations which 
deal with people rather than with things. 

The analysis in Table 1 is a harsh one for 
it implies that if the individual is not cur- 
rently a salesman, the inventory has failed 
to predict his adult occupation. Because the
LIS scale was frequently not the highest
score on the individual’s profile, that is not 
an accurate conclusion. For example, the 
Navy Pilot, classified in the unrelated occu- 
pations category in Table 1, had a high 
score on the Aviator scale as well as on the 
LIS scale, and that profile certainly could 
be considered a hit. 

The profiles and questionnaire data were 
analyzed in a second manner by classifying 
each profile as either: “hit,” “miss,” or 
“indeterminate.” A profile was classified as a 
hit if the individual had an A (score above 
45) rating on the scale appropriate for his 
occupation. Profiles where no appropriate 
scale was available or where only incomplete 
information was available on the individual’s 
occupation were classified as indeterminate. 
The remaining profiles were classified as 
misses. 

Using these methods, the following results 
were obtained: 


                 N      %
Hits            46     64
Indeterminate   10     14
Misses          16     22
Total           72    100


DISCUSSION


The results presented above indicate
clearly that there is a substantial relation- 
ship between high scores on the SVIB and 
eventual adult occupations, though generali- 
zations from this study should be restricted 
to occupations in the sales area. 

Whether these results should be considered 
as demonstrating high, moderate, or low pre- 
dictive validity is not clear. Of a group 
scoring high on the LIS scale, only 10% 
were actually in that occupation; that does 
not seem to be very good prediction. How- 
ever, that conclusion ignores the fact that a 
1-out-of-10 hit rate is substantially above 
chance (it is difficult to determine just how 
many life insurance salesmen would be found 
in a random sample of high school seniors 
10 years later but it is certainly less than 
10%) and that some of these individuals 
entered other occupations that were perfectly 
congruent with their measured interests.

If the percentages of hits and misses are
considered, ignoring the indeterminates, one
finds a ratio of about 3:1 (64%:22%), the
same as that found by Strong in his 18-year
follow-up of college students, where he concluded
that the chances were 3:1 that an individual
would be in an occupation where he
had an A rating.
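
The arithmetic behind this ratio can be reproduced from the counts reported in the Results section. The Python lines below are merely a check on the reported figures; the variable names are arbitrary.

```python
# Counts reported in the Results section (N = 72).
counts = {"hits": 46, "indeterminate": 10, "misses": 16}

total = sum(counts.values())
percentages = {label: round(100 * n / total) for label, n in counts.items()}

# Ratio of hits to misses, ignoring the indeterminate profiles.
hit_miss_ratio = counts["hits"] / counts["misses"]

print(total, percentages, round(hit_miss_ratio, 2))
```

The ratio of hits to misses computed from the counts is somewhat below 3, which rounds to the 3:1 figure given in the text.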

Thus, it seems safe to conclude that, con- 
servatively, these results demonstrate at least 
moderate predictive validity for high school 
students on the Sales keys of the SVIB. 

One final note: In most follow-up studies 
using the Strong, there are individual cases 
that reinforce the investigator’s faith in the 
validity of the SVIB. These individual cases 
can hardly be called scientific evidence, yet 
they occur so often that most people who 
work closely with the Strong Blank probably 
base their opinions of the Strong as much on 
these cases as on the published data. One 
such case which occurred in this study was 
the boy who had two unusually high scores 
on his high school senior SVIB profile, 64 
on the Mortician scale and 64 on the Sales 
Manager scale. Today he is managing the 
Cemetery Monuments Department of a large 
mail-order chain store. It is that type of 
outcome that tends to weaken one’s objectiv- 
ity in evaluating the predictive validity evi- 
dence for the SVIB, and leads one to believe 
that if there were more adequate ways of 
classifying occupations, even higher predictive 
validities for interest inventories could be 
demonstrated.


A NOTE ON THE EFFECT OF PRIVACY IN TAKING
TYPING TESTS 


WAYNE K. KIRCHNER 


Minnesota Mining and Manufacturing Company, St. Paul 


80 female job applicants completed a standard typing test as part of a regular 
job-selection procedure. Of these, 40 were tested individually, 40 in groups of 
2 or more. When compared on test results, females tested alone typed almost 
4 words per min. faster (p < .01) on the average. The same group had slightly 
fewer errors but the difference was not significant. Results suggested that 
privacy could have a direct effect on test performance. 


Most standard test-taking directions indi- 
cate that a person to be tested should be put 
into a quiet room alone, relatively free from 
distraction so that test performance can be 
as good as possible. This is so even though 
good texts on testing (Anastasi, 1961; Cron- 
bach, 1960) do not cite any direct evidence 
for this. Actually, Hovey (1928), in an 
experiment with college sophomores on the 
Army Alpha test, found that marked distrac- 
tors such as bells, buzzers, music, spotlights, 
and other interruptions did not appreciably 
reduce test scores of the experimental group. 
Similarly, Smith (1951) found no great ef- 
fects from distractors such as a 100 decibel 
noise on performance on number and name 
checking and on a paper-form board test. 
Fendrick (1937) did find, however, a de- 
crease in reading efficiency when students 
were distracted by phonograph records while 
reading. None of these experiments occurred 
in an industrial setting, however. 

In any case, the test-taking direction sug-
gesting a quiet room apart from others is
hard to follow in an industrial setting because 
of the difficulty in testing individuals alone. 
This is so because space and time limitations 
in industry usually make group testing more 
feasible. This, of course, may be detrimental 
to the average job applicant. 

As a result, it seemed worthwhile to con- 
duct an investigation of the effect of privacy 
or lack of privacy upon job applicants and 
so the following brief study was conducted 
utilizing 80 female office-job applicants. 


METHOD 


A typing test was selected for the basis of com-
parison because typewriters obviously make noise
which can distract individuals if they are in the
same room with other persons. On the other hand, it 
is possible from hearing the rate of typing to ascer- 
tain whether or not another person in the same 
room is typing faster or slower and this could, 
because of competitive zeal, actually enhance per- 
formance. To determine what actually does happen, 
a simple study was conducted. 

Forty female job applicants were tested individu- 
ally on a standard typing test. Forty other job 
applicants were tested in groups of two or more on 
the same test. Assignment to the individual or group 
testing was made at random and no differences in 
age or length of experience were noted between the 
two separate groups. Scoring was based on the num- 
ber of words typed per minute and the number of 
errors made. Results of the study are indicated 
below. 


RESULTS AND DISCUSSION 


In Table 1 are shown the mean scores
obtained by the 40 applicants tested alone
and the 40 applicants tested in a group situation.
It is apparent that in terms of speed
(number of words typed per minute), there
is an advantage for applicants who were
tested alone. They averaged almost four words
per minute faster, a practical difference, and
this difference was also significant statistically.
The individually tested group also made
slightly fewer errors, on the average, but the
mean difference was not statistically or practically
significant. Of some interest, too, was
the fact that there was a greater variation or
spread in performance of the applicants who
were tested in a group situation.

TABLE 1
COMPARISON OF 80 FEMALE OFFICE-JOB APPLICANTS
TESTED ALONE OR IN GROUPS ON A TYPING TEST

                                       Typing speed
                                    (words per minute)    No. of typing errors
Group                             N       M        SD         M        SD
Applicants (tested alone)        40     53.10     8.07       3.70     2.96
Applicants (tested with others)  40     49.55    10.27       4.13     3.56
M difference                             3.59                 .43
t                                    [illegible]*             .98

*p < .01.

This, of course, is a limited study but the 
results do suggest that there was a possible 
detriment of performance for job applicants 
who were tested in a group situation. The
results tend to verify to some extent the
standard cautions found in test manuals that
it is better to test subjects individually
whether the setting is industrial or otherwise.


EFFECT OF SWITCH CONFIGURATION ON THE
OPERATION OF A SWITCH MATRIX 1


R. S. LINCOLN AND S. A. KONZ 2
Lockheed Missiles and Space Company, Sunnyvale, California 


In a series of 3 experiments the speed and accuracy of switch-matrix opera- 
tions were determined for 5 different matrix configurations. Factors influencing 
performance included switch orientation (whether row or column), reach 
distance, and the type of symbol with which the switches were labeled. Re-
sponse time was the only important performance measure. Error rates were
negligible for all configurations.


In this paper the term switch matrix refers 
to a group of pushbutton switches and related 
indicator lights arranged on a control panel 
in a systematic pattern of rows and columns. 
Such a matrix arrangement is frequently used 
for connecting one of several input lines to 
one of several output lines on communication 
panels and other similar control devices. The 
main purpose of the experiments included in 
this study was to determine how the speed 
and accuracy of switch-selection operations 
might be affected by the spatial configuration 
of the switches. A secondary purpose of the 
experiments was to determine the frequency
with which occasional false indications of
matrix status would be detected.

A related series of experiments was con- 
ducted at the Naval Research Laboratory 
(Garvey & Knowles, 1954; Knowles, Garvey, 
& Newlin, 1953). In those studies, however, 
only two of the five desired switch con- 
figurations were examined, and switch selec- 
tions were accomplished simultaneously with 
two hands rather than in the one-handed 
sequential manner required in the present 
experiments. 


EXPERIMENT I


Equipment


The test panel, shown in Figure 1, consisted
of an 8 × 8 matrix of square indicator
lights with single rows and columns of circular
pushbutton switches adjacent to each edge
of the matrix. The panel was mounted in a
desk-type console with the panel surface
slanting at an angle of 30 degrees from
the vertical. The center of the panel was
approximately 39 inches above the floor.

1 This study was conducted for the Air Force
Space Systems Division under contract AF 04(695)-207.
The requirements for the study were developed
by P. B. Zydner who also participated in the design
of the experiments.

2 Now at the Department of Industrial Engineering,
Kansas State University, Manhattan, Kansas.

In the first experiment four alternative 
switch configurations were evaluated. These 
configurations are described in Table 1. 

The letter and number indicating the
switch selections to be made during an experimental
trial were displayed on an oscilloscope
located approximately 13 inches to
the right of the switch panel. Letters and
numbers were generated from a 5 × 7 dot
matrix in a format similar to that used on an
IBM 026 Printing Card Punch. Generation
of the oscilloscope display, recording and
timing of subject’s (S’s) responses, and operation
of matrix indicator lights were all under
control of the PEPSS 3 computer. Permanent
records of the Ss’ performance were accumulated
on magnetic tape for later analysis.


Procedure 


Sixteen Air Force NCOs each completed 64 experimental
trials on each of the four switch configurations.
Every set of 64 trials included a single trial on
each possible combination of the eight rows and
eight columns of the matrix. Four different random
sets of trials were employed. These sets were combined
with the four switch configurations in two
different 4 × 4 Greco-Latin squares that determined
the succession of experimental conditions which were
presented to the Ss.


3 PEPSS is a general purpose simulation system 
consisting of a small digital computer, special en- 
coding and decoding equipment, a real time clock, 
magnetic tape recorder, and on-line typewriter. The 
versatility of the system permits its application in a 
wide variety of human factors studies. Computer 
programs for the experiments were written by A. K. 
Smith.

Fig. 1. Switch-matrix panel.


Eight Ss made their switch selections for each 
configuration in a left-to-right sequence, while the 
remaining eight Ss made their selections in a right- 
to-left sequence. With Configuration I (left column, 
upper row), for example, the left-right Ss made 
their first selection from the left column, and their 
second selection from the upper row. The right-left 
Ss followed the reverse procedure, making their first 
selection from the upper row, and their second 
selection from the left column. 

Prior to the start of a trial, Ss were required to
position their right hand on a key switch located
below the center of the matrix. Consequently all 
trials began with the S’s hand in the same position. 
The start of a trial was signaled by a buzzer that 
was followed 1 second later by the appearance of a 
number and a letter on the oscilloscope. The symbol 
designating the switch to be selected first always 
appeared on the left while the symbol designating 
the second switch to be selected always appeared 
on the right. A black paper mask outlined the switch 
configuration in use during each series of trials in
order to eliminate confusion that might result when
transferring from one configuration to another. 
Figure 1 pictures the mask as it was positioned for 
Configuration III. The possibility of confusion was 
further reduced by a short practice period which 
introduced Ss to each configuration. 

As each switch selection was made the correspond- 
ing pushbutton cap was lighted. When the second 
selection was completed a matrix indicator was also 
lighted. Most of the time the lighted indicator ap- 
peared, as expected, at the intersection of the selected 
row and column. However, on 6 out of every set 
of 64 trials, the expected indicator did not light. 
Instead, on those trials, a randomly chosen indicator, 
displaced no farther from the expected position than 
one row and one column, was lighted. If the S
detected the “false signal” he pushed an error button
below and to the right of the matrix. If an S in- 
advertently selected an incorrect switch he also had 
to push the error button, since Ss were not allowed 
to correct their own errors. If no error of any sort 
was detected, Ss pushed a no error button below 
and to the left of the matrix. Four seconds after 
the matrix indicator was lighted, all lights were 
extinguished in preparation for the next trial which 
was initiated 3 seconds later. The task was self-paced 
in the sense that the Ss were permitted unlimited 
time to perform their selection operations. From 
that point on, however, sequence timing was fixed, 
and could not be altered by the Ss. The Ss were told 
to “work quickly but accurately.” 
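
The construction of a single set of trials can be sketched as follows. This Python fragment is an illustration, not the PEPSS program: the function name `make_trial_set` is hypothetical, and it assumes that a false indicator may fall on any of the (up to eight) positions within one row and one column of the expected intersection.

```python
import random

ROWS = COLS = 8
FALSE_SIGNALS_PER_SET = 6


def make_trial_set(rng):
    """Return 64 trials, one per row-column combination, in random order.

    On six randomly chosen trials the lighted indicator is displaced
    from the expected intersection by at most one row and one column.
    """
    cells = [(r, c) for r in range(ROWS) for c in range(COLS)]
    rng.shuffle(cells)
    false_trials = set(rng.sample(range(len(cells)), FALSE_SIGNALS_PER_SET))
    trials = []
    for i, (r, c) in enumerate(cells):
        lit = (r, c)
        if i in false_trials:
            # Candidate positions adjacent to the expected indicator,
            # excluding the expected position and anything off the panel.
            neighbours = [
                (r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < ROWS
                and 0 <= c + dc < COLS
            ]
            lit = rng.choice(neighbours)
        trials.append(
            {"row": r, "col": c, "lit": lit, "false_signal": i in false_trials}
        )
    return trials


trial_set = make_trial_set(random.Random(0))
print(len(trial_set), sum(t["false_signal"] for t in trial_set))
```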

The principal measures of performance were the 
times elapsing between the appearance of the display 
on the oscilloscope and the completion of the first 
and second selections. Additional measures included 
the number of errors made in the selection process, 
and the number of false indicator signals that went 
undetected. 


EXPERIMENT II


The specific purpose of Experiment II was 
to compare a two-response configuration from 
Experiment I with a configuration in which 
only one response was necessary to make the 
desired connection. 


Equipment 


The equipment used in the second experi- 
ment was modified so that rows were num- 
bered and columns were lettered. In addition, 
the square indicators in the matrix were con- 
verted for use as switch lights so that a 
selection could be made by pushing a single 
button representing both a row and column 
of the matrix. With this arrangement (Con- 
figuration V), the switch that was pushed 
always lighted, except when “false signals” 
were programmed as in Experiment I. Selected
buttons and indicators remained lighted
for only 0.5 seconds rather than the 4.0
seconds of the first experiment.

TABLE 1
SWITCH CONFIGURATIONS FOR EXPERIMENT I

Configuration   Switches used
I               Left column, upper row
II              Left column, lower row
III             Lower row, right column
IV              Upper row, right column

(Spatial-arrangement diagrams not reproduced.)




Procedure 


Twelve Lockheed technicians each completed 128 
experimental trials on Configuration III (lower row, 
right column—hereafter referred to as the two-switch 
configuration), and the same number of trials on 
Configuration V (one-switch configuration). Two 
trial sets, selected from the four sets used in Experi- 
ment I, were combined with the two Configurations 
to form a 4 X 4 Latin square. The same square was 
repeated three times. 

A black mask was again used to outline the 
switches required for the two-switch configura- 
tion. A second mask covered these same switches 
when selections were made with the one-switch 
configuration. 


EXPERIMENT III 4 


The main purpose of Experiment III was 
to determine whether switch selection time 
would differ for rows and columns of switches 
when mean reach distance was approximately 
equal for the two orientations. The geometry 
of the switch matrices used in Experiments 
I and II, of course, prohibited any such 
equality. 

A second purpose was to compare the rela- 
tive speed with which selections would be 
made from switches labeled with numbers and 
switches labeled with letters. 


Equipment 


Black paper masks outlined either Row E 
or Column 5 in Figure 1, as required. Mean 
reach distance from the key below the matrix 
to the individual switches was 10.2 inches for 
the row and 10.4 inches for the column. Caps 
labeled 1 through 8 or A through H (running 
in sequence from left to right or from top 
to bottom) were placed on the switches as 
appropriate. These same symbols appeared 
on the oscilloscope at the beginning of each 
trial. Response times were measured from the 
appearance of the displayed symbol: (a) to 
the release of the key located below the 
matrix and (b) to the operation of the switch 
selected from the row or column of switches. 

4 Data for Experiments II and III were collected 
by C. J. Baker, who also assisted in the analysis 
of results. 


Procedure 


Four Ss performed 64 trials with each of the four 
combinations of switch orientation and switch labels, 
in an order determined by a 4X4 Greco-Latin 
square. The same square was repeated on three 
successive days, providing 192 trials for each S 
under each experimental condition. In this square 
the Greek letters corresponded to four sets of sym- 
bols in which each symbol appeared eight times. 
Different sequences of letters and numbers were used 
within the four sets of symbols, a procedure which 
confounded sequences and types of symbols. How- 
ever, no significant effect was expected as a result 
of the difference in symbol sequences, and the con- 
founding is assumed to be of no importance in the 
analysis of the resulting data. 
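
The counterbalancing described above can be sketched in code. The two squares below are an illustrative orthogonal pair, not the squares actually used in the experiment, and the condition and symbol-set names are hypothetical labels. The sketch checks that each square is Latin and that the pair is orthogonal, which is what makes the combined layout a Greco-Latin square.

```python
from itertools import product

# Illustrative orthogonal pair of 4 x 4 Latin squares (assumed, not from the study).
# Rows = subjects, columns = position in the testing order.
LATIN = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
GREEK = [[0, 1, 2, 3], [2, 3, 0, 1], [3, 2, 1, 0], [1, 0, 3, 2]]

# Hypothetical labels for the four orientation/label combinations and symbol sets.
CONDITIONS = ["row/numbers", "row/letters", "column/numbers", "column/letters"]
SYMBOL_SETS = ["alpha", "beta", "gamma", "delta"]


def is_latin(square):
    """True if every row and every column contains each symbol exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def is_orthogonal(a, b):
    """True if superimposing the squares yields every ordered pair exactly once."""
    n = len(a)
    pairs = {(a[r][c], b[r][c]) for r, c in product(range(n), repeat=2)}
    return len(pairs) == n * n


def schedule():
    """Return, per subject, the ordered list of (condition, symbol set) pairs."""
    return [
        [(CONDITIONS[LATIN[s][p]], SYMBOL_SETS[GREEK[s][p]]) for p in range(4)]
        for s in range(4)
    ]


# 64 trials per condition per day, repeated on three days.
TRIALS_PER_CONDITION = 64 * 3
```

With this layout each S meets every condition and every symbol set once per day, and the trial count per S per condition matches the 192 reported in the text.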


RESULTS 
Experiment I 


Experiment I compared the speed and accuracy 
with which switch selections were 
made from four different switch-matrix configurations, 
each of which required the successive 
operation of two switches. Performance 
measures included response times and 
error frequencies. 


TABLE 2 

ANALYSIS OF TIME SCORES FOR EXPERIMENT I 

Source                                       df      MS       F 
Sequence (right-left versus left-right)       1    928.75    1.40 
Squares within sequences                      2    855.74    1.29 
Subjects within sequences (error)            12    663.03 
Configuration                                 3    105.67    1.39 
Sequence X Configuration                      3    522.44    6.87* 
Residual (error)                             42     76.10 
Total                                        63 

* p < .01. 


FIG. 2. Time required for both selection responses 
in Experiment I. 

Response time. An analysis of variance of 
the total time required for both selection 
responses is shown in Table 2. Although col- 
lected within a Latin-square format, the data 
were not analyzed as Latin squares in order 
to permit evaluation of the Sequence (right- 
left versus left-right) x Configuration inter- 
action. This procedure presumably somewhat 
inflated the second error term since the ef- 
fects of practice and trial orders were not 
isolated. The first error term (Ss within 
sequences) was used to evaluate the sequence 
effect since the sequence scores were produced 
by two unrelated S groups. The residual 
error term was used to evaluate all other 
effects because the remaining scores resulted 
from repeated measures on the same Ss. 
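
The choice of error term can be made concrete with a short calculation. The sketch below recomputes three F ratios from mean squares reported in Table 2, dividing the between-groups source by the Ss-within-sequences term and the repeated-measures sources by the residual term. The function and constant names are illustrative only.

```python
# Error terms as reported in Table 2.
MS_SUBJECTS_WITHIN_SEQUENCES = 663.03  # for sources based on unrelated S groups
MS_RESIDUAL = 76.10                    # for sources based on repeated measures


def f_ratio(ms_effect, ms_error):
    """F is the effect mean square divided by the appropriate error mean square."""
    return ms_effect / ms_error


# Between-groups source, tested against Ss within sequences.
f_squares = f_ratio(855.74, MS_SUBJECTS_WITHIN_SEQUENCES)

# Within-subject sources, tested against the residual.
f_configuration = f_ratio(105.67, MS_RESIDUAL)
f_interaction = f_ratio(522.44, MS_RESIDUAL)
```

Rounded to two decimals, these ratios agree with the F column of Table 2 for the same sources.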

As Table 2 shows, neither of the main 
experimental effects (Sequence and Configu- 
ration) was significant. Consequently no gen- 
eral statement can be made about the superi- 
ority of either of the two Sequences or of 
any of the four Configurations. However, the 
significant interaction between Sequence and 
Configuration does permit some restricted 
statements about experimental effects. This 
interaction is pictured in Figure 2. Figure 2 
has been drawn as if movement sequence were 
a continuum in order to more clearly depict 
the nature of the interaction. As indicated by 
Figure 2, the difference between the right-left 
and left-right Sequences depends on the Con- 
figuration involved. For three of the Configu- 
rations (I, III, IV) response time was shorter 
for the left-right Sequence. In contrast, response 
time was shorter for Configuration II 
when the right-left Sequence was used. A 
partial explanation of this interaction can be 
obtained by examining the individual re- 
sponse times for each of the two selection 
responses. As Figure 3 shows for the first 
selection response, and Figure 4 shows for 
the second selection response, switch orienta- 
tion (whether row or column) was an impor- 
tant determinant of response speed. Selections 
made from rows of switches were, in six out 
of eight cases, completed more rapidly than 
selections made from columns of switches. 
Furthermore, the pattern of performance 
scores exhibited in Figure 2 appears to be 
largely determined by the time required for 
the first selection response. It is likely, then, 
that the interaction in Figure 2 reflects a 
response consistency related to the spatial 
orientation of the group of switches from 
which the first selections were made. 


FIG. 3. Time required for first selection response 
in Experiment I. (The row designations indicate 
the configurations in which the first switch selection 
was made from a row of switches.) 

FIG. 4. Time required for second selection response 
in Experiment I. (The row designations indicate the 
configurations in which the second switch selection 
was made from a row of switches.) 


Experimental changes in switch configura- 
tion automatically introduce variation in sev- 
eral response dimensions. The effects of one 
of these dimensions, rows versus columns, 
have been mentioned. At least one other 
dimension, reach distance, appears to have 
influenced obtained results. Thus, in Figure 3, 
initial selections made from a lower row were 
completed more rapidly than selections made 
from an upper row (p < .005), according to 
a Wilcoxon matched-pairs signed-ranks test 
(Siegel, 1956). The mean reach distance to 
buttons in the lower row was approximately 
10 inches less than the mean reach distance 
to buttons in the upper row. 

Still another potential contributing factor 
may be identified. In this experiment rows 
of switches were always labeled with numbers 
while columns of switches were always labeled 
with letters, an arrangement which may have 
influenced the relative time required to make 
switch selections. 

Tests involving individual degree of freedom 
comparisons (Snedecor, 1946) were 
made to determine the significance of some 
of the differences between the mean times for 
Configurations and Sequences shown in Figure 2. 
These tests indicated a significant 
Sequence effect when Configuration II was 
eliminated from the analysis. In addition, for 
the right-left Sequence, Configuration II differed 
significantly from the combination of 
the other three Configurations. However, for 
the left-right Sequence, the three fastest 
Configurations did not differ significantly 
among themselves. 


TABLE 3 

ANALYSIS OF TIME SCORES FOR EXPERIMENT II 

Source                             df     MS       F 
Order of presentation               3    2.905    4.53* 
Subjects within orders (error)      8     .641 
Practice                            3     .258    7.37** 
Configuration                       1    1.445   41.29** 
Trial set                           1     .068    1.94 
Configuration X Set                 1     .035    1.00 
Residual (error)                   30     .035 
Total                              47 

* p < .05. 
** p < .01. 


Response errors. Only 15 errors were made 
in performing the selection responses with the 
four Configurations. Based on the total of 
8,192 responses this represents an error rate 
of approximately 0.2%. Because of the small 
number, it is not possible to relate response 
errors to the different Configurations. How- 
ever, it is interesting to note that 11 of the 
15 selection errors were made on the second 
selection response. Furthermore, 11 of the 
15 selection errors were made in selecting 
switches from a column. 

On 384 trials a matrix indicator which did 
not correspond to the Ss’ selection was 
lighted. The Ss were required to indicate the 
detection of these false signals by pushing 
an error button. In only three trials did the 
false signals go undetected, producing an 
error rate of approximately 0.8%. 
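
The two error rates quoted above follow directly from the reported counts. A minimal sketch of the arithmetic, using only figures given in the text (the function name is illustrative):

```python
def error_rate_percent(errors, opportunities):
    """Errors expressed as a percentage of the opportunities to err."""
    return 100.0 * errors / opportunities


# 15 selection errors in 8,192 selection responses.
selection_error_rate = error_rate_percent(15, 8192)

# 3 undetected false signals in 384 false-signal trials.
undetected_signal_rate = error_rate_percent(3, 384)
```

Rounded to one decimal place these give the approximate 0.2% and 0.8% stated in the text.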


Experiment II 


Although it did not differ significantly from 
Configurations I and IV (left-right sequence), 
Configuration III was selected for comparison 
with Configuration V, the arrangement which 
required only one switch operation to com- 
plete a desired connection. Response times 
and error frequencies were again used as a 
basis for comparison. 

Response time. The most interesting com- 
parison of Configurations III and V concerns 
the time required to complete the selection 
operation. Although only one selection re- 
sponse was necessary with Configuration V, 
the perceptual aspect of the response was still 
two-dimensional; both the proper row and 
the proper column had to be located before 
the response could be completed. Conse- 
quently the time differential between the two 
methods might not be as large as their ap- 
parent differences would suggest. Further- 
more, the results of Experiment I indicated 
that the first response, which included time 
to read the oscilloscope, took about two thirds 
of the total selection time. This fact would 
also tend to reduce the expected difference 
between the one-switch and the two-switch 
configurations. The mean response time actu- 
ally obtained for the two-switch configuration 
was 2.31 seconds, while the mean time for 
the one-switch configuration was 1.96 seconds. 
This difference was significant as shown in 
Table 3. 

Since these data were analyzed as Latin 
squares it was possible to isolate the effect of 
the orders in which the different experimental 
conditions were presented to Ss. As Table 3 
shows, the order effect was significant when 
tested against an error term derived from un- 
correlated scores. Since all Ss worked with 
both Configurations, the Configuration effect 
was tested with an error term based on cor- 
related scores. The mean squares shown in 
Table 2 are much larger than those appearing 
in Table 3 because the former were computed 
on totals for trial sets while the latter were 
computed on means of trial sets. 
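
The scale difference between the two tables can be illustrated numerically. A total over n trials is n times the corresponding mean, so a variance (and hence a mean square) computed on totals is n squared times the one computed on means. The sketch below uses hypothetical per-condition means and an assumed n of 64 trials per set; neither figure is taken from the tables.

```python
from statistics import variance


def mean_square_ratio(means, n_trials):
    """Return var(totals) / var(means) when each total equals n_trials * mean."""
    totals = [n_trials * m for m in means]
    return variance(totals) / variance(means)


# Hypothetical mean response times in seconds (illustration only).
hypothetical_means = [2.10, 2.40, 1.90, 2.30]
N_TRIALS = 64  # assumed number of trials per set

ratio = mean_square_ratio(hypothetical_means, N_TRIALS)
```

Under these assumptions the ratio equals 64 squared, which is why mean squares based on totals are several orders of magnitude larger than those based on means.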

The final response made with both Con- 
figurations (activation of either the error or 
the no error buttons) signified whether or 
not the appropriate indicator lighted after a 
selection had been made. With the one-switch 
configuration the S’s finger was located di- 
rectly over the switch indicator when the 
indicator was lighted. Consequently it might 
be expected that the final response would be 
made more quickly with that configuration. 
This, in fact, was the case. Final response 
times averaged 0.85 seconds with the one-switch 
configuration and 0.97 seconds with 
the two-switch configuration. The difference 
was significant (p < .05) according to the 
Wilcoxon matched-pairs signed-ranks test. 

Response errors. A total of 16 errors was 
made in selecting response switches from the 
two configurations. Of these 16 errors, how- 
ever, 11 were repetitive errors made with the 
two-switch configuration by the same two 
Ss who confused the letters A and H 
and the numbers 5 and 6. These particular 
errors were never made by any one of the 
other 26 Ss who participated in the two 
experiments. The remaining five selection er- 
rors were made with the one-switch configu- 
ration. Only one failure to detect a “false” 
indicator signal was observed in the 288 trials 
in which “false” signals occurred. 


Experiment III 


Experiment III was primarily concerned 
with the effect, on response speed, of the ori- 
entation of a single bank of switches (either 
row or column) for which the mean reach 
distance was approximately equal. Of addi- 
tional interest was the effect of labeling 
switches with numbers and letters. 

Response time. An analysis of variance for 
total response time, from the display of a 
symbol to the operation of the designated 
switch, is shown in Table 4. 

Both switch orientation and type of symbol 
showed significant effects, although the differ- 
ences associated with these variables were 
very small. Mean response times were 0.78 
seconds for rows versus 0.82 seconds for col- 
umns, and 0.77 seconds for numbers versus 
0.83 seconds for letters. Deininger (1960) 
also found a similar, though not significant, 
difference favoring rows, in an earlier study. 

The significant symbol effect suggests that 
the particular labels, used on the switches in 
Experiment I, did in fact contribute to the 
difference in selection times that was observed 
in that experiment. The symbol effect may 
not, however, be of general importance. It 
could be merely a consequence of the char- 
acteristics of the particular symbols used to 
identify the switches, or it may reflect the 
relative legibility of the numbers and letters 
that appeared on the oscilloscope. 


TABLE 4 

ANALYSIS OF TIME SCORES FOR EXPERIMENT III 

Source                                 df     MS            F 
Rows (subjects and order)               3   371.56        98.04** 
Days                                    2   [illegible]   [illegible] 
Columns (trials)                        3   [illegible]   [illegible] 
Orientation (rows versus columns)       1   [illegible]   [illegible] 
Symbols (numbers versus letters)        1   [illegible]   [illegible] 
Sets                                    3     1.42         - 
Subjects X Days                         6     4.88         1.29 
Days X Trials                           6     4.49         1.18 
Days X Orientation                      2     3.59         - 
Days X Symbols                          2     1.62         - 
Days X Sets                             6     3.41         - 
Orientation X Symbols                   1    14.04         3.70 
Days X Orientation X Symbols            2      .56         - 
Residual (error)                        9     3.79 
Total                                  47 

* p < .05. 
** p < .01. 


Separate analyses performed on the two 
components of the total response showed the 
same significant effects for both symbols and 
switch orientation. The first of these two 
components included the time from the appearance 
of the displayed symbol until S first 
moved his hand from the key switch located 
below the matrix. The second measured component 
included the travel time from the key 
switch to the selected switch, plus the manipulation 
time required to depress the selected 
switch. 


SUMMARY AND CONCLUSIONS 


Three experiments concerned with the op- 
eration of switch matrices are described. In 
the first experiment four different switch- 
matrix configurations were compared with 
regard to their effect on the speed and ac- 
curacy of switch-selection operations. Each 
configuration required two successive switch 
selections, one selection being made from a 
row of switches and the other from a column 
of switches. One half of the experimental Ss 
made their selections in a left-right sequence 
while the remaining Ss made their selections 
in a right-left sequence. Analysis of the first 
experiment showed a significant interaction 
between Configuration and selection Sequence. 
This interaction was attributed, in part, to 
the spatial orientation of the group of switches 
from which the first selection response was 
made. The shortest response times were as- 
sociated with those Configurations and Se- 
quences in which the first selection response 
was made from a row, rather than a column, 
of switches. The numerical labels used with 
the rows of switches may also have contrib- 
uted to the observed differences. Reach dis- 
tance also had a significant effect on response 
speed. The average error rate for the selec- 
tion process was approximately two errors 
per 1,000 switch selections. 

In the second experiment, the “best” con- 
figuration identified in Experiment I was 
compared with a new configuration in which 
a selection was made by pushing a single 
button that represented both a row and a 
column of the matrix. Under these conditions 
the selection process was completed signifi- 
cantly faster when only the single selection 
response was required, but the time advan- 
tage was relatively small. 

The third experiment involved the selection 
and operation of a single switch in either a 
row or a column of eight switches, so ar- 
ranged that the mean reach distance was 
approximately equal for the two configurations. 
Both letters and numbers were used to 
identify the switches on different experi- 
mental trials. Selection times were slightly 
shorter for the row configuration and for the 
numerical symbols. 

In a practical application, the choice of the 
most appropriate switch-matrix configuration 
will depend upon specific requirements. Where 
high speed and/or frequently repeated switch 
selections are required, a one-switch selection 
matrix will permit the fastest operation. 
Where speed is not critical and selection fre- 
quency is low, a two-switch selection matrix 
will be satisfactory. Furthermore, the design 
of the two-switch configuration will be less 
complex and possibly more reliable. 


INFORMATION ASSIMILATION FROM UPDATED 
ALPHA-NUMERIC DISPLAYS 


CHARLES H. HAMMER AND SEYMOUR RINGEL 1 
United States Army Personnel Research Office, Washington, D.C. 


The accuracy with which Ss could locate updated elements of information was 
studied as a function of use of coded vs. uncoded updates, number of elements 
of information presented and number of elements of information updated. 
Selected findings demonstrate the value of coding as an information enhance- 
ment technique and the considerable effects of elements presented and updated. 
With uncoded displays a reduction in the percentage of responses as the number 
of updates increased may reflect a lessening of Ss’ confidence in their ability 
to make correct responses even though their actual performance did not appear 
to suffer. 


In command information processing sys- 
tems being considered for development, mili- 
tary information will be presented to com- 
manders and their staffs to aid in the making 
of tactical decisions. As a preliminary to the 
decision process, information must be rapidly 
and accurately assimilated from displays. 

The Command Systems Task of the United 
States Army Personnel Research Office has 
initiated a series of studies designed to pro- 
vide information on the optimization of hu- 
man performance in command information 
processing systems. Studies have been con- 
cerned with the assimilation of alpha-nu- 
meric information from charts and the assimi- 
lation of symbolic information from maps and 
overlays. 

In a recently completed study on the as- 
similation of alpha-numeric information 
(Hammer & Ringel, 1964), subjects (Ss) 
were required to locate updated information 
by comparing updated charts with hard-copy 
“history.” The updates were size-coded in 
half of the charts presented and uncoded in 
the remaining half. While amount of time 
taken to locate updates was the principal 
dependent variable, the study yielded the fol- 
lowing findings concerning errors: (1) Two 
types of errors were found, errors of omission 
and errors of commission and these errors 
occurred in the ratio of 3:1, respectively; 
(2) errors were reduced by half when 
coded updates were used. The low frequency 
of errors, however, precluded a more detailed 
analysis of the data. It was felt that a more 
stressful experimental task might have yielded 
more information. Accordingly the present experiment 
was designed to provide this information. 


1 Opinions expressed in this paper are those of 
the authors and do not necessarily reflect official 
Department of Army policy. 


METHOD 


The independent variables used were coded versus 
uncoded updated elements, elements of information 
presented, and elements of information updated (an 
element was defined as that word or number which 
appeared in a given row or column of a stimulus 
chart). Four levels each of elements presented and 
elements updated were used: 36, 54, 72, and 90, and 
4, 8, 12, and 16, respectively. Chart format and 
content used in the experiment were adapted from 
charts contemplated for use in an automated infor- 
mation processing system. Figure 1 is an example of 
a coded stimulus chart with a total of 90 elements 
of which 16 are updated coded elements. For each 
coded stimulus chart there was an uncoded stimulus 
chart identical in content. All stimulus charts were 
reproduced as 35mm negative slides. Figure 2 
shows a hard-copy “history” answer chart corresponding 
to the stimulus chart in Figure 1. All answer 
charts were uncoded. The experimental task 
was as follows: S studied an uncoded chart of hard- 
copy “history” placed before him. After 1 minute the 
history chart was removed and an updated chart 
was shown on the screen. In the updated chart 
varying numbers of entries were different from the 
corresponding entries in the hard-copy history. After 
1 minute the updated chart was removed from the 
screen. The S then reviewed the history and was 
allowed 1 minute to locate and cross out the entries 
which had been updated. 

Accuracy and error scores were obtained for each 
trial. Two error scores were obtained, one for errors 
of omission and one for errors of commission. An 
error of omission was defined as a failure to locate 
an update. An error of commission was defined as 
the inaccurate location of an update. Percentages 
for accuracy and errors were obtained according to 
the following formulae: 


A = \frac{R}{R + O + C} \times 100    [1] 

E_o = \frac{O}{R + O + C} \times 100    [2] 

E_c = \frac{C}{R + O + C} \times 100    [3] 


where: A = percent accuracy, E_o = percent omits, 
E_c = percent commits, R = number correct responses, 
O = number omits, C = number commits. 


FIG. 1. Example of coded updated alpha-numeric 
information. 
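
As a worked illustration of the three scores, the sketch below assumes that each percentage is the corresponding count divided by the sum of correct responses, omits, and commits, multiplied by 100. That reading is consistent with the reported means, which sum to 100% within each coding condition. The function name and the example counts are hypothetical.

```python
def percent_scores(correct, omits, commits):
    """Return (percent accuracy, percent omits, percent commits) for one trial.

    Assumes a common denominator R + O + C for all three scores.
    """
    total = correct + omits + commits
    if total == 0:
        raise ValueError("at least one response or omission is required")
    return (
        100.0 * correct / total,
        100.0 * omits / total,
        100.0 * commits / total,
    )


# Hypothetical trial: 8 updates located, 6 missed, 2 entries crossed out in error.
accuracy, pct_omits, pct_commits = percent_scores(8, 6, 2)
```

For these hypothetical counts the scores are 50.0, 37.5, and 12.5, and by construction the three percentages always sum to 100.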

The Ss for this experiment consisted of 30 en- 
listed men of above average intelligence as indicated 
by their scores for the General Technical Aptitude 
Area (110 or higher) on the Army Classification 
Battery. They were randomly assigned to two 
equal-sized groups, one which performed with coded 
updates and the other which performed with un- 
coded updates. Each group took all 16 treatment 
combinations of elements presented and elements 
updated. For purpose of administration each group 
was divided into three equal-sized subgroups. Order 
of administration of trials was randomized for each 
subgroup. 


RESULTS 


Analyses of variance of percentages were 
computed for accuracy, omits, and commits, 
and because of possible skewness of the data, 
for percentages transformed to log scores. 
Results of the analyses were essentially the 
same and showed all main effects to be significant 
at the .01 or .05 level with one notable 
exception,2 and indicated that interpretation 
could be based on the untransformed 
data. Summaries of the analyses and results 
of significance tests on untransformed data 
are shown in Tables 1, 2, and 3. Figures 3, 4, 
5, and 6 show accuracy and error performance 
across levels of the independent variables. 


2 The effect on accuracy of number of uncoded 
elements updated was not significant with untransformed 
scores but was significant with transformed 
scores. 


FIG. 2. Example of hard-copy history answer chart. 

While the Elements Presented × Elements 
Updated interaction was significant for all of 
the analyses, the Fs were not large and in- 
spection of profiles of means for this inter- 
action, with the exception of those for per- 
cent accuracy with uncoded updates, did not 
reveal any meaningful trends. A closer exami- 
nation of the means for percent accuracy with 
uncoded updates for this particular interac- 
tion showed somewhat puzzling results. Per- 
formance accuracy declined or remained sta- 
ble for 36, 54, and 72 elements presented as 
the number of uncoded updates increased 
from 4 to 16. However, at the 90 level per- 
formance accuracy actually increased from 4 
to 17% (Table 4). A reexamination of the 
hard-copy history answer charts and stimulus 
materials did not reveal any scoring errors or 
artifacts which could have produced this un- 
expected result. Consequently, the interpreta- 
tion of the effect of increasing the numbers 
of uncoded updates on accuracy is somewhat 
clouded. 

TABLE 1 

SUMMARY OF ANALYSES OF VARIANCE OF PERCENT ACCURACY 

                                       Coded                  Uncoded 
Source of variation         df       MS       F^a          MS        F 
Between subjects                   2609.30                904.82 
Within subjects 
  Elements presented (P)     3     7078.99   11.32**    10048.98   24.44** 
  Subjects (S) X P          42      625.44                411.16 
  Elements updated (U)       3    11610.15   13.46**      262.18    1.00 
  S X U                     42      862.39                243.20 
  P X U                      9     1334.05    2.55*       785.78    4.01** 
  S X P X U                126      521.62                196.13 

^a df = 14/225. 
* p < .05. 
** p < .01. 


TABLE 2 

SUMMARY OF ANALYSES OF VARIANCE OF PERCENTAGES OF OMITS 

                                       Coded                  Uncoded 
Source of variation         df       MS       F^a          MS        F 
Between subjects                   1116.61               3414.38 
Within subjects 
  Elements presented (P)     3     4137.40   13.31**     4822.13    7.55** 
  Subjects (S) X P          42      310.87                638.98 
  Elements updated (U)       3     8894.47   30.85**     6751.98   14.32** 
  S X U                     42      295.64                471.41 
  P X U                      9      834.01    3.51**      713.33    2.04* 
  S X P X U                126      237.49                349.51 

^a df = 14/225. 
* p < .05. 
** p < .01. 


TABLE 3 

SUMMARY OF ANALYSES OF VARIANCE OF PERCENTAGES OF COMMITS 

                                       Coded                  Uncoded 
Source of variation         df       MS       F^a          MS        F 
Between subjects                    517.11               2270.58 
Within subjects 
  Elements presented (P)     3      769.66    3.60*       936.57    5.98** 
  Subjects (S) X P          42      221.55                156.73 
  Elements updated (U)       3     1509.68    6.03**     4103.42   19.83** 
  S X U                     42      250.44                206.86 
  P X U                      9      328.33    1.97*       356.08    2.12* 
  S X P X U                126      166.93                168.18 

^a df = 14/225. 
* p < .05. 
** p < .01. 


FIG. 3. Effects of coding and elements presented on 
accuracy. 


Since the results of an analysis of variance 
combining coded with uncoded displays were 
difficult to interpret because of heterogeneity 
of variance, differences between performance 
under the coded and uncoded conditions were 
tested for significance by a test which does 
not assume homogeneity of variance (Dixon 
& Massey, 1957, pp. 123-124). All differences 
were significant beyond the .05 level. With 
the same test, differences between mean percentages 
of omits and commits for coded and 
uncoded updates were also found to be significant 
beyond the .05 level. The results of 
this study may be summarized as follows: 

As the number of elements of information 
increased from 36 to 90: (a) accuracy declined 
from 86 to 62% for coded updates and 
from 38 to 11% for uncoded updates (Figure 3); 
(b) errors of omission increased from 
7 to 24% for coded updates and from 49 to 
67% for uncoded updates (Figure 4); (c) 
errors of commission increased from 8 to 14% 
for coded updates and from 13 to 22% for 
uncoded updates (Figure 4). 

As the number of elements of information 
updated increased from 4 to 16: (a) accuracy 
declined from 88 to 56% for coded updates 
but remained approximately stable at 22% 
for uncoded updates (Figure 5); this apparent 
stability of accuracy scores for uncoded 
updates may be somewhat suspect in 
view of the nature of the profile of scores for 
the Elements Presented × Elements Updated 
interaction which was discussed previously; 
(b) errors of omission increased from 5 to 
28% for coded updates and 47 to 67% for 
uncoded updates (Figure 6); (c) errors of 
commission increased from 6 to 16% for 
coded updates and decreased from 27 to 11% 
for uncoded updates (Figure 6). 


FIG. 4. Effects of coding and elements presented on 
types of errors. 

FIG. 5. Effects of coding and elements updated on 
accuracy. (Abscissa: elements updated, 4 to 16; 
curves: coded and uncoded.) 

FIG. 6. Effects of coding and elements updated on 
types of errors. 

The following differences were found as a 
function of the use of coded versus uncoded 
updates, respectively: (a) performance accuracy: 
73% versus 24%; (b) errors of 
omission: 16% versus 58%; (c) errors of 
commission: 11% versus 18%. 

The percentages of omits were signifi- 
cantly larger than the percentages of commits 
for both coded and uncoded updates. 


DISCUSSION 


The enhancement of performance accuracy 
as a function of coding updated information 
supports previous findings for the inclusion of 
a coding capability in proposed information 
processing systems (Hammer & Ringel, 1964). 
For information processing tasks similar to 
that used in the current study the particular 
coding technique used (size-coding) is feasi- 
ble from a standpoint of systems hardware 
capability, and may be as effective as other 
techniques which are more difficult and ex- 
pensive to install. 

The effects on performance of the elements 
presented and elements updated variables sug- 
gest that unless new techniques for presenting 
and assimilating alpha-numeric information 
are developed, limits may need to be set on 
amounts of information presented and up- 
dated for any one chart, graph, figure, over- 
lay, etc. Such limits might create storage 
problems, particularly in those systems which 
carry the information on slides. 

The preponderance of errors of omission 
over errors of commission obtained in this as 
well as in previous studies of both alpha- 
numeric and symbolic information (Hammer 
& Ringel, 1964; Ringel & Vicino, 1964) sug- 
gests a need to determine their respective and 
combined impacts on information processing 
efficiency, particularly with respect to decision 
making in an operational setting. 
TABLE 4 
PERCENT ACCURACY—UNCODED DISPLAYS 

                          Elements updated 
Elements presented     4       8      12      16       M 
        36           46.01   33.85   41.22   31.96   38.64 
        54           26.20   25.42   27.87   25.65   26.79 
        72           29.21   22.25   13.72   18.29   20.63 
        90            3.96   11.41   14.07   16.54   11.01 
        M            26.08   23.97   24.71   23.35   24.77 











The data for 9 of the 15 Ss who worked 
with uncoded displays indicate that as the 
numbers of uncoded updates were increased 
from 4 to 16 the combined proportions for 
accuracy and commits, which represent total 
overt responses, decreased while proportions 
of omits increased. Coupled with the apparent 
stability in performance accuracy (assuming 
that the significant Elements Presented × 
Elements Updated interaction did not mask 
a significant main effect) these findings may 
reflect an unrealistic increase in the perceived 
difficulty of the experimental task which may, 
in turn, relate to a reluctance on the part of 
Ss to risk making errors. A finding which may 
support this conjecture emerged from a study 
on assimilation of symbolic information (An- 
drews & Ringel, 1964) in which confidence 
was actually measured rather than inferred. 
The findings of both studies point out a need 
for research on the relationships among per- 
ceived difficulty, confidence, and performance, 
and their implication for decision making. 
A COMPARATIVE STUDY OF CONVENTIONAL INSTRUCTION 
AND INDIVIDUAL PROGRAMED INSTRUCTION IN 
THE COLLEGE CLASSROOM 


JAMES R. RAWLS, OLIVER PERRY, AND EDWIN O. TIMMONS 


Louisiana State University 


The traditional college classroom teaching method of lecture and assigned 
readings was compared with an individual programed instructional method 
utilizing a programed text. Ss, 21 pairs, matched with regard to sex, age, 
intelligence test score, and hours of formal training in the biological sciences, 
were 1st tested upon completion of the physiological portion of an introduc- 
tory psychology course. They were then retested 6 wk. later. No significant 
differences were found in performance on Test 1. However, the level of per- 
formance on Test 2 was significantly higher for the program-instructed group. 


In light of the ever-increasing number of 
students enrolling in our schools and colleges, 
the overcrowded conditions already existent 
in our classrooms, and the rapidly increasing 
shortage of capable instructors, considerable 
interest has been recently directed toward 
programed instructional methods as one possi- 
ble means for coping with the situation. 
Lumsdaine and Glaser (1960), Hughes and 
McNamara (1961), Roe (1962), McNeil 
(1964), Goldberg, Dawson, and Barrett 
(1964), and Welsh, Antoinetti, and Thayer 
(1965), among others, have reported studies 
comparing conventional methods and pro- 
gramed instruction in school and industry, 
with most encouraging results. 

The purpose of the present study was to 
compare the traditional college classroom 
teaching method of lecture and assigned read- 
ings with an individual programed instruc- 
tional method utilizing a programed textbook. 
The subject matter used in comparing the 
two methods was the physiological portion of 
an introductory psychology course. The cri- 
terion measure employed was performance on 
an examination over the material covered by 
both methods. 


METHOD 
Subjects 


Twenty-one pairs (20 male and 22 female Ss), 
matched with regard to sex, age, intelligence test 
scores, and course work in biology, were selected 
from 85 students enrolled in an introductory psy- 
chology course. 


Procedure 


Initially, all 85 members of the class were admin- 
istered the Thorndike-Gallup Vocabulary Test, a 
short, group-administered estimate of intelligence. 
Every student also filled out a questionnaire con- 
cerning his previous college and high school courses 
in the biological sciences. 

It was possible to closely match 21 pairs from 
the potential 42 pairs in the entire class. Members 
of a given pair were of the same sex, had identical 
scores on the Thorndike-Gallup, and differed in 
amount of former training in biology by no more 
than two semester hours. The average age difference 
between pairs was 0.43 years. Analysis by t tests 
indicated that the groups did not differ significantly 
on any of the matching variables. 

Fourteen of the 21 pairs were selected at random 
and divided into experimental (E) and control (C) 
groups by flipping a coin. The member selected by 
chance to be in the E group did not attend the five 
class meetings during the next week, but studied a 
programed text, ad lib, in a supervised study hall. 
The remaining member of the pair was assigned to 
the C group and continued to attend the classroom 
sessions for the week of the experiment. The remain- 
ing 7 matched pairs, designated as the control- 
control group (CC), were not informed that they 
would be part of the experiment. Furthermore, they 
were not told that they were matched or that their 
performance would be singled out from the remainder 
of the class in any way. The purpose for including 
the CC group was to attempt to assess the effects on 
the C group of knowing that they were Ss in an 
experiment—an experiment with a measure of com- 
petition inherent in the design. In short, it was a 
control for the “Hawthorne effect.” 
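
The assignment procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the subject labels, the `assign_pairs` function, and the use of a seeded pseudo-random generator in place of a physical coin are all hypothetical and are not taken from the study.

```python
import random


def assign_pairs(pairs, n_experimental=14, seed=None):
    """Split matched pairs into E, C, and CC groups.

    n_experimental pairs are drawn at random; within each drawn pair a
    simulated coin flip sends one member to E and the other to C.
    Both members of every remaining pair go to the CC group.
    """
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    chosen = shuffled[:n_experimental]
    remaining = shuffled[n_experimental:]

    e_group, c_group = [], []
    for first, second in chosen:
        if rng.random() < 0.5:  # simulated coin flip
            first, second = second, first
        e_group.append(first)
        c_group.append(second)

    cc_group = [member for pair in remaining for member in pair]
    return e_group, c_group, cc_group


# Hypothetical labels for the 21 matched pairs (42 Ss).
pairs = [(f"S{2 * i + 1:02d}", f"S{2 * i + 2:02d}") for i in range(21)]
e_group, c_group, cc_group = assign_pairs(pairs, seed=1966)
print(len(e_group), len(c_group), len(cc_group))
```

With 21 pairs and 14 drawn for the experiment, each of the three groups contains 14 Ss, which agrees with the group sizes reported in Table 1.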

The C and CC groups attended the regularly 
scheduled class for 1 week or five 50-minute class 
periods. Lectures and assigned readings (41 pages) 
over the physiological unit covered the neuron and 
nerve impulse, structure and function of the brain 
and nervous systems, and the endocrine glands. 
Considerable effort was made toward making the 
content of lecture and readings identical to that in 
the programed text. 

The E group did not attend class; instead, they 
attended a supervised study hall for a period of 1 
week during which they covered the same subject 
matter in the 238-page programed text, Biological 
Basis of Behavior (McGuigan, 1963). A careful 
check was kept on the amount of time each S in 
the E group spent in the study hall. Also, upon 
completion of the program, the study hall supervisor 
questioned each S and recorded his opinion of the 
programed text as an instructional method. 

The criterion measure—a 25-item, 4-option, multi- 
ple-choice examination—was jointly constructed by 
the classroom instructor and the two experimenters 
(Es) who supervised the E group. The professor 
teaching the introductory course submitted 37 items 
over the material covered in his lectures and as- 
signed readings. The Es selected 15 of these items 
that were covered specifically in the programed 
text. The Es submitted 33 items that were covered 
in the programed text from which the classroom 
professor selected 17 items that were specifically 
mentioned in readings or lecture. Considering dupli- 
cations and superiority of some items over others, 
25 items were selected from the 32 to make up the 
exam. Since each of these 25 items was covered 
both in lecture or assigned readings and in the 
programed text, and since each met the approval of 
both the classroom instructor and the other Es, the 
resulting test was accepted as an adequate criterion 
measure. 

Test 1 was administered to the entire class upon 
completion of the physiological unit. Since the class 
customarily had weekly quizzes, the initial test was 
anticipated even before it was announced. However, 
the same test (Test 2) was administered 6 weeks 
later without announcement and, consequently, 
without the students expecting it. 


RESULTS 


A t-test comparison between scores of the 
three groups on Test 1 showed no significant 
differences in level of performance for any 
group. Table 1 presents the mean score and 
standard deviation for each group on both 
administrations of the quiz. 


TABLE 1 
MEAN SCORE FOR THE THREE GROUPS ON TEST 1 AND TEST 2 

                      Test 1                 Test 2 
Group     N        M         SD           M         SD 
E         14     22.07   [illegible]    20.85   [illegible] 
C         14     21.07      2.43        19.23      2.62 
CC(a)     14     20.71      2.42        18.57      3.41 
  cc       7     20.48   [illegible]    18.14      2.39 
  cc       7     20.94      2.54        19.00      4.04 

Note.—Maximum = 25. 
(a) cc + cc = CC. 


TABLE 2 
INTRAGROUP MEAN COMPARISON BETWEEN TEST 1 AND TEST 2 

Group      MD        t 
E         1.46     2.33* 
C         1.96     2.67 [significance marker illegible] 
CC        2.54     2.95** 

*p < .05. 
**p < .02. 

The differences between the means for the 
subpairs (cc, cc) in the CC group were not 
significant on either Test 1 or Test 2. 

A comparison of the performance of each 
group on Test 1 versus its performance on 
Test 2 shows a significant drop in mean 
scores for all three groups. Table 2 gives the 
mean differences and significance levels for 
each. 

Table 3 summarizes the direct comparison 
of each group versus the other two in terms of 
performance on Test 2. Retest scores were 
significantly higher for the E group than for 
both the C and CC groups (p < .01 and .001, 
respectively). The difference between means 
of the C versus CC groups on the second quiz 
failed to reach significance. 

TABLE 3 
INTERGROUP MEAN COMPARISONS ON TEST 2 

Group      MD        t 
E—C       1.62     3.24* 
E—CC      2.28     4.50** 
C—CC       .66      — 

*p < .01. 
**p < .001. 


Rearrangement of Ss within the E and C 
groups enabled a further, more detailed analy- 
sis of the data. Analyzed by two separate 2 × 2 
classification analyses of variance, learning 
under the two instructional methods (esti- 
mated by performance on Test 1) was not 
differentially affected by either the sex of the 
S or high-low intelligence scores. Analyzed by 
three separate 2 × 2 classification analyses of 
variance, retention of the material learned 
under the two instructional methods (esti- 
mated by performance on Test 2) was not 
differentially affected by sex, high-low intelli- 
gence test scores, or high-low scores on Test 
1. Further, three additional analyses indicated 
that there was no differential effect upon the 
proportion of material retained (estimated 
by Test 2/Test 1) under the two methods as 
a function of sex, high-low intelligence test 
scores, or high-low scores on Test 1. 

As shown previously by t tests, as a func- 
tion of instructional method, the E group 
scored significantly higher on Test 2 (p < 
.01) than did the C group. With this excep- 
tion, all other main effects and interactions 
of the eight analyses just mentioned failed to 
reach significance. 

Records kept on the E group by study hall 
supervisors showed the time spent in covering 
the program to range from 3 hours, 52 min- 
utes to 6 hours, 47 minutes, with an average 
time of 4 hours, 25 minutes. An attempt was 
also made to assess S opinion of the pro- 
gramed text. All members of the E group made 
positive statements about the program, and 
all but four Ss stated a definite preference for 
learning this type of material by means of 
programed instruction. 


DISCUSSION 


The results of the present study showed 
no significant differences between the E group 
and the other two groups in learning (score 
on Test 1), but did show retention 6 weeks 
later (score on Test 2), to be significantly 
greater for those studying the programed text. 
These results tend to corroborate the findings 
of earlier investigators (Lumsdaine & Glaser, 
1960; Hughes & McNamara, 1961; Roe, 
1962; McNeil, 1964; Goldberg et al., 1964; 
Welsh, Antoinetti, & Thayer, 1965) that 
individuals utilizing programed instructional 
methods learn as well or better than those 
using conventional procedures. Several of 
these studies also show lower variance on the 
criterion test for the program-instructed 
groups. In the present study the variance on 
Test 1 was definitely smaller for the E group 
than the other groups. On Test 2 the variance 
was even more restricted than on Test 1 
for the E group, whereas, for the other two 
groups, the variance was larger on Test 2. 

In contradiction to Goldberg et al. (1964), 
the present study showed individuals study- 
ing programed texts to retain significantly 
more, that is, score higher on Test 2, than 
those using conventional methods. However, 
this difference could be due to any one or a 
combination of the following: (a) the present 
study measured retention after 6 weeks, 
whereas Goldberg et al. tested retention after 
6 months; (b) the present study measured 
retention by means of recognition, viz, a 
multiple-choice test, rather than recall; and 
(c) the subject matter was entirely different 
for the two studies, that is, physiology as 
opposed to statistics. 

From the analysis of variance data on 
Test 1, it appears that there is no differential 
effect upon the learning of Ss under the two 
instructional methods either as a function of 
sex or high-low intelligence test scores. 

Close scrutiny of the data revealed that, 
within groups, individuals retained their ap- 
proximate rank on Test 1 and Test 2, even 
though the C and CC groups’ retest scores 
were significantly lower than those of the E 
group. Thus, it is more readily understood 
why the analyses of variance showed that sex, 
intelligence test scores, and scores on Test 1 
did not differentially affect either retention 
(score on Test 2) or the proportion of ma- 
terial retained (Test 2/Test 1) under the two 
instructional methods. 

In agreement with the previous findings of 
Hughes and McNamara (1961), Welsh et al. 
(1965), and Goldberg et al. (1964), the pro- 
gram group (E) showed a substantial time- 
saving. Conservatively estimating the time 
required for assigned readings at 2 hours and 
considering the 5 hours of lecture, the C 
and CC groups invested a total of 7 hours 
in covering the material. The same informa- 
tion was covered by the E group in an aver- 
age of 4 hours, 25 minutes—a savings of 
37%. Not a single S in the E group 
spent as much as 7 hours to complete the 
programed text. 
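
The time-saving figure can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function and variable names are illustrative assumptions, while the hours and minutes are those given in the text.

```python
def percent_saved(baseline_minutes, actual_minutes):
    """Percentage of the baseline time that was saved."""
    return 100 * (baseline_minutes - actual_minutes) / baseline_minutes


# C and CC groups: 5 hours of lecture plus an estimated 2 hours of reading.
conventional_minutes = (5 + 2) * 60
# E group: average, shortest, and longest recorded study-hall times.
program_mean_minutes = 4 * 60 + 25
program_min_minutes = 3 * 60 + 52
program_max_minutes = 6 * 60 + 47

saving = percent_saved(conventional_minutes, program_mean_minutes)
print(f"{saving:.1f}% saved on average")
print(program_max_minutes < conventional_minutes)
```

The average works out to about 36.9%, which rounds to the 37% reported, and even the slowest S (6 hours, 47 minutes) finished in less than the 7 hours assumed for the conventional groups.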

In addition to the advantage of programed 
methods already mentioned, both the savings 
in instructional cost and the fact that pro- 
gramed texts can go where the need for train- 
ing exists, rather than necessitating transpor- 
tation expenses for the trainee, should more 
than warrant their future use in school and 
industry. 
RELATIONSHIPS AMONG LEADERSHIP DIMENSIONS 
AND COGNITIVE STYLE 


P. WEISSENBERG AND L. W. GRUENFELD 


New York State School of Industrial and Labor Relations, Cornell University 


Witkin’s differentiation hypothesis served as a basis for the investigation of 3 
propositions: (a) field-dependent supervisors will show the highest “Esteem 
for the Least Preferred Co-worker” (LPC), (b) field-dependent supervisors 
will be more “considerate” (C), and (c) field-independent supervisors will be 
more “structure” (S) oriented. Witkin’s Embedded Figures Test (EFT), 
Fiedler’s Esteem for the Least Preferred Co-worker (LPC) instrument, and 
Fleishman’s Leadership Opinion Questionnaire (LOQ) were administered to 
73 civil service supervisors. The results established the existence of significant 
curvilinear relationships between EFT and LPC (p < .03), and between EFT 
and Consideration (p < .02). Individuals who were intermediate between 
extreme field dependence and extreme field independence discriminated most 
sharply between their most and least preferred co-workers. These findings point 
the way toward further research into leadership behavior using hypotheses 
derived from developmental psychology. 


Leadership studies usually identify two 
dimensions of supervisory behavior. A review 
of the literature has revealed that Fleishman 
(1957, 1960) distinguished between super- 
visors’ “initiation of structure” (S) and 
“consideration” (C). Fiedler (1964) divided 
supervisors by the degree to which they dis- 
criminated between their least and most pre- 
ferred co-workers and then revealed that 
those supervisors who make sharp discrimina- 
tions are more likely to be task oriented. 
Such task-oriented supervisors appear to show 
a preference for structuring the experiences 
of their subordinates. Thus Fiedler’s discrimi- 
nating, task-oriented supervisors appear to be 
similar to Fleishman’s structure group. There 
seems to be (Morris & Fiedler, 1964) some 
overlap between Fleishman’s LOQ Measure 
and Fiedler’s measure of “Esteem for the 
Least Preferred Co-worker” (LPC) and both 
appear to overlap conceptually with dimen- 
sions of employee-oriented and task-oriented 
supervisors defined by Kahn and Katz 
(1962). 

Witkin’s work on dimensions of behavior 
(Witkin, Dyk, Faterson, Goodenough, & 
Karp, 1962) distinguished between field- 
dependent and field-independent persons. He 
found, in developmental work with children 
and young adults, that the field-independent 
person tends to impose structure on his en- 
vironment for he is more able to discriminate 
between a figure and its surrounding field as 
measured by the Embedded Figures Test 
(EFT). The field independent, in addition to 
his analytical perception of the environment, 
also shows greater concern with the mastery 
of his environment. Field-dependent persons, 
on the other hand, appear to be more compli- 
ant in social situations; they are relatively 
more sensitive to the values and opinions 
of others and they appear to have greater 
consideration, that is, empathy with others. 
Although Witkin found some subtle emotional 
differences within his field-independent group, 
the dimensions he identified overlap concep- 
tually with the dimensions of leadership style 
identified by Fiedler and Fleishman. 

A review of Witkin’s work suggested that 
the field-independent person should be higher 
on the “structure” component of Fleishman’s 
scale and should be a “low LPC” type as 
measured by Fiedler’s instrument. It appeared 
likely that the field-dependent person would 
achieve a lower score on the structure scale 
and would be a high LPC type as defined by 
Fiedler. It seemed that the field-dependent 
person would also score higher on the “con- 
sideration” scale than the field-independent 
person. Our problem was to determine whether 
in fact these relationships would hold true if 
put to an empirical test. 

In order for us to perform such a test, the 
following hypotheses were formulated: 
1. The field-dependent person will show 
the highest “Esteem for the Least Preferred 
Co-worker” (LPC) score. 

2. The field-dependent person will be the 
more “considerate” person. 

3. The field-independent person will be 
the more “structure”-oriented person. 

4. In addition to an investigation of the 
three major hypotheses stated above, a sub- 
sidiary investigation was also carried out to 
determine whether the Fleishman LOQ in- 
strument does, in fact, measure the same 
dimensions as Fiedler’s LPC. 


METHODOLOGY 
Subjects 


The subjects (Ss) for this research were 73 male 
civil service supervisors from a state department of 
taxation and finance. They ranged in rank and pay 
from a G-18 classification to the highest grade in 
the department. The average employee was 51 years 
old and had been with the department 24 years, 
serving 11 of those years as a supervisor. 


Procedure 


The instrument used to measure the independent 
variable was the Embedded Figures Test (EFT), 
short-form, a timed performance test developed by 
Witkin et al. (1962). The short-form includes 12 
complex figures, each with a time limit of 3 minutes. 
The test is very reliable; test-retest correlations of 
.89, .90, .92, and .95 have been reported (Witkin 
et al., 1962). The test was administered in accordance 
with standard instructions (Witkin, 1950). 


TABLE 1 
MEANS AND STANDARD DEVIATIONS OF LPC AND LOQ 
SCORES BY TRICHOTOMIZED EFT SCORES 

                          EFT 
Measure         High     Medium     Low 
LPC 
    X̄           80.3      62.0      71.0 
    SD          31.6      17.5      16.8 
    N           23.0      24.0      23.0 
S 
    X̄           43.2      43.2      41.5 
    SD           8.3       7.8       7.5 
    N           25.0      24.0      22.0 
C 
    X̄           56.0      56.0      52.0 
    SD           7.0       6.9       6.6 
    N           25.0      24.0      22.0 

Note.—Variations in Ns are due to incomplete data on the 
part of 5 different subjects; 3 did not complete the LPC and 2 
did not complete the LOQ. 


TABLE 2 
ANALYSIS OF VARIANCE EFT/LPC 

Source of 
variation      df        SS         MS        F 
LPC             2      500.5      250.3     3.51* 
Within         67    4,776.1       71.3 
Total          69    5,276.6 

Note.—Standard scores were used. 
*p < .05. 

The LPC measure used in this study contains 17 
items previously used by Fiedler (1964). Each item 
consists of a pair of adjectives describing the two 
opposite poles of a personality attribute. The scale 
is a modified version of the semantic differential 
(Osgood, Suci, & Tannenbaum, 1957). Each item is 
scored from most to least favorable on a scale 
from 1 to 8 points and a total score is computed by 
summing the responses for all items. This score is 
known as the “Esteem for the Least Preferred 
Co-worker” (LPC). 
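
The scoring rule just described (17 items, each scored 1 to 8, summed) can be sketched as below. The function name, its parameters, and the example response vector are hypothetical; the direction of scoring for individual adjective pairs is not specified here and is not modeled.

```python
def lpc_score(responses, n_items=17, low=1, high=8):
    """Total LPC score: the sum of item scores, each between low and high."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} item scores, got {len(responses)}")
    for value in responses:
        if not low <= value <= high:
            raise ValueError(f"item score {value} outside {low}-{high}")
    return sum(responses)


# Hypothetical response vector for one supervisor.
example = [5, 4, 6, 3, 5, 5, 4, 6, 2, 5, 4, 4, 6, 5, 3, 4, 5]
print(lpc_score(example))
```

With 17 items on a 1-to-8 scale, the possible raw totals run from 17 to 136.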

The other paper and pencil instrument used was 
Fleishman’s (1957, 1960) Leadership Opinion Ques- 
tionnaire (LOQ). This instrument was administered 
in accordance with standard instructions (Fleishman, 
1960). 


RESULTS 


A preliminary inspection of the data re- 
vealed that the relationships among the major 
variables of concern were curvilinear rather 
than linear. The only significant linear cor- 
relations observed were method-specific. 
Consequently the data were divided into three 
categories based on EFT score: low, 1-49 
seconds; intermediate, 50-85 seconds; and 
high, 86-180 seconds. Within-category means, 
shown in Table 1, were then calculated 
for C, S, and LPC scores. 
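
The trichotomization described above amounts to a simple binning rule, sketched here. The function name is hypothetical, and treating fractional scores between the stated bands (for example 49.5 seconds) as belonging to the lower band's upper neighbor is an assumption of this sketch, since the text gives only whole-second boundaries.

```python
def eft_category(seconds):
    """Assign an EFT score (in seconds) to the low, intermediate, or high group."""
    if seconds < 1 or seconds > 180:
        raise ValueError("EFT score must lie between 1 and 180 seconds")
    if seconds < 50:
        return "low"           # 1-49 seconds
    if seconds < 86:
        return "intermediate"  # 50-85 seconds
    return "high"              # 86-180 seconds


for score in (1, 49, 50, 85, 86, 180):
    print(score, eft_category(score))
```

As the later results are worded, the low group corresponds to the field-independent end of the dimension and the high group to the field-dependent end.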

Inspection of Table 1 shows that the rela- 
tionship between LPC and EFT is U shaped, 
the relationship between C and EFT is curvi- 
linear, and both relationships are statistically 
significant by ANOVA¹ (see Tables 2 and 3). 
A Newman-Keuls test (Winer, 1962) applied 
to these data revealed that the difference be- 
tween the LPC mean of the high EFT group 
(80.3) and the medium EFT group (62.0) 
is statistically significant (p < .05). How- 
ever, the medium EFT group does not differ 
significantly from the low EFT group, nor 
does the low EFT group differ on the LPC 
measure from the high group. These data 
therefore indicate that persons intermediate 
between the extremes of field dependence and 
field independence discriminate most sharply 
between their least and most preferred co- 
workers. 


¹ A test for homogeneity of variance in the case 
of LPC data was negative and therefore the data 
were converted to standard scores with a mean of 
50 and a standard deviation of 10. The results of 
both analyses were identical, however. 


TABLE 3 
ANALYSIS OF VARIANCE EFT/LOQ—C 

Source of 
variation      df        SS         MS        F 
LOQ—C           2      312.0      156.0     3.33* 
Within         68    3,185.3       46.8 
Total          70    3,497.3 

*p < .05. 
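
The mean squares and F ratios in the two analyses of variance follow directly from the sums of squares and degrees of freedom reported in Tables 2 and 3. The sketch below recomputes them; the function name and the tuple layout are illustrative assumptions.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a one-way analysis of variance."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Sums of squares and degrees of freedom as reported in Tables 2 and 3.
lpc = f_ratio(500.5, 2, 4776.1, 67)      # EFT/LPC
loq_c = f_ratio(312.0, 2, 3185.3, 68)    # EFT/LOQ-C

for label, (ms_b, ms_w, f) in (("EFT/LPC", lpc), ("EFT/LOQ-C", loq_c)):
    print(f"{label}: MS between = {ms_b:.1f}, MS within = {ms_w:.1f}, F = {f:.2f}")
```

The recomputed mean squares agree with the tabled values (250.3 and 71.3; 156.0 and 46.8), giving F ratios of about 3.51 and 3.33.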

Furthermore, the results of the Newman- 
Keuls test on the differences among means 
of the C scores of the three EFT groups 
revealed that the differences between both the 
high and medium groups (both 56.0) and the 
low group (52.0) are statistically significant 
(p < .05), indicating that extremely field- 
independent persons are less considerate 
than intermediate or highly field-dependent 
persons. 

These findings provide partial support for 
the hypotheses; however, the relationships 
are apparently more complex than was 
anticipated. 

Differences among the three EFT groups 
due to S means were not significant. Nor did 
the expected relationships between scores on 
the LOQ and LPC materialize. 

In order to ascertain the magnitude of the 
curvilinear relationships between EFT/LPC 
and between EFT/C, a correlation ratio (eta) 
for each was computed (Guilford, 1956), 
with the following results: EFT/LPC = .87 
(p < .03); EFT/C = .90 (p < .02). 

The magnitude of these coefficients is very 
high, indicating that there are substantial 
regularities between field dependence-inde- 
pendence and LPC as well as C. 
DISCUSSION 


Contrary to expectations the individuals 
who are intermediate between extreme field 
dependence and extreme field independence 
discriminated most sharply between their 
most and least preferred co-workers and 
therefore, according to Fiedler (1964), they 
would presumably be more task oriented than 
their peers at either extreme. The observa- 
tion that highly field-dependent individuals 
did not, relatively speaking, discriminate 
sharply between their most and least pre- 
ferred co-workers was in line with our hy- 
potheses. However, the fact that extremely 
field-independent persons failed to discrimi- 
nate sharply between their most and least 
preferred co-workers was surprising; in retro- 
spect, though, Witkin’s work provides an ex- 
planation: some extremely field-independent 
individuals appear to shun interpersonal in- 
volvement and may be poorly motivated to 
make distinctions among the performances of 
group members. They appear to avoid leader- 
ship behavior. Although all Ss were classified 
as supervisors by the civil service, the promo- 
tional process in the civil service is based 
primarily on performance on achievement 
tests rather than on effective leadership be- 
havior. Achievement tests are exclusively 
measures of task-relevant information. Thus 
it is possible for the highly field-independent 
supervisory group to be composed of indi- 
viduals who shun the interpersonal aspects of 
leadership behavior, and this may account for 
the results which were obtained. In any 
event, these conditions will require special 
consideration in future research. 

In accordance with our hypothesis it was 
found that the relatively field-independent 
person is less considerate than either the 
intermediate or extremely field-dependent 
person. The failure to demonstrate differences 
between the intermediate and the extremely 
field-dependent group may in part be due to 
the cutoff point which was arbitrarily chosen 
to classify these groups. Witkin has indicated 
that among the extremely field-dependent per- 
sons there are those who “fake” consideration 
although they lack genuine empathy for 
others. The LOQ questionnaire is highly 
transparent and susceptible to faking. As a 
matter of fact, the content of the LOQ, and 
the comments and responses of individuals 
to it, do not generate too much confidence 
in its validity, unless it can be administered 
anonymously. Since in this study the EFT 
instrument had to be administered individu- 
ally, it was impossible to maintain anonymity 
under these conditions. 


CONCLUSION 


This study has shown that two of the more 
popular measures of leadership style are re- 
lated to field dependence-independence as 
measured by the EFT. The relationships, 
however, were curvilinear rather than linear. 
In addition, although LPC and LOQ were 
expected to overlap conceptually, an empirical 
relationship between these two measures could 
not be substantiated in this study. Further- 
more, the “initiation structure” scale of the 
LOQ was not related to field dependence- 
independence. 

Mixed findings regarding the relationship 
between LOQ and EFT are probably an arti- 
fact of the shortcomings of the LOQ measure. 
The LOQ instrument suffers from all the 
shortcomings that are usually associated with 
verbal pencil and paper tests. The statements 
which respondents on the LOQ measure are 
required to endorse or reject have a lot of 
surplus meaning. The “correct answers” are 
quite apparent depending on the stance that 
the respondent wants to take. 

The curvilinear relationship between LPC 
and EFT, two measures which are much less 
likely to be affected by social desirability, 
suggests several interesting avenues for future 
research. If the behavioral consequences of 
field dependence-independence in leadership 
situations can be identified, it will be possible 
to link leadership studies to a considerable 
body of knowledge in developmental psychol- 
ogy. Since previous research, for example, has 
shown that field dependence is a relatively 
unmodifiable cognitive style (Elliott & Mc- 
Michael, 1963; Witkin et al., 1962; Wolf, 
1965) traditional training techniques may be 
impotent in bringing about change. The EFT 
instrument may then be useful for selection 
and placement purposes. Moreover, in com- 
parison with other so-called personality meas- 
ures it does not suffer from the usual dif- 
ficulties that are encountered with paper and 
pencil self-report questionnaires. 


REFERENCES 


Elliott, R., & McMichael, R. E. Effects of scien- 
tific training of frame dependence. Perceptual & 
Motor Skills, 1963, 17, 363-367. 

Fiedler, F. E. A contingency model of leadership 
effectiveness. In L. Berkowitz (Ed.), Advances in 
experimental social psychology. Vol. 1. New York: 
Academic Press, 1964. Pp. 149-190. 

Fleishman, E. A. The Leadership Opinion Question- 
naire. In R. M. Stogdill & A. E. Coons (Eds.), 
Leader behavior: Its description and measurement. 
Columbus: The Bureau of Business Research, The 
Ohio State University, 1957. Pp. 120-130. 

Fleishman, E. A. Manual for Leadership Opinion 
Questionnaire. Chicago: Science Research Associ- 
ates, 1960. 

Guilford, J. P. Fundamental statistics in psychology 
and education. (3rd ed.) New York: McGraw- 
Hill, 1956. 

Kahn, R. L., & Katz, D. Leadership practices in 
relation to productivity and morale. In D. Cart- 
wright & A. Zander (Eds.), Group dynamics. 
(2nd ed.) Evanston, Ill.: Row, Peterson, 1962. 
Pp. 554-570. 

Morris, C. G., & Fiedler, F. E. Application of a 
new system of interaction analysis to the rela- 
tionships between leader attitudes and behavior in 
problem solving groups. Urbana: Department of 
Psychology, University of Illinois, 1964, No. 14. 
(Tech. Reprint) 

Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. 
The measurement of meaning. Urbana: University 
of Illinois Press, 1957. 

Winer, B. J. Statistical principles in experimental 
design. New York: McGraw-Hill, 1962. 

Witkin, H. A. Individual differences in ease of per- 
ception of embedded figures. Journal of Personality, 
1950, 19, 1-15. 

Witkin, H. A., Dyk, R. B., Faterson, H. F., Good- 
enough, D. R., & Karp, S. A. Psychological dif- 
ferentiation: Studies of development. New York: 
Wiley, 1962. 

Wolf, A. Body rotation and the stability of field 
dependency. Journal of Psychology, 1965, 59, 211- 
[illegible]. 


(Received July 29, 1965) 


Journal of Applied Psychology 
1966, Vol. 50, No. 5, 396-399 


EFFECTS OF TUITION PAYMENT AND INVOLVEMENT 
ON BENEFIT FROM A MANAGEMENT 
DEVELOPMENT PROGRAM 


L. W. GRUENFELD 
This study investigated the effects of participants’ investments of tuition, time, 
and effort on benefit from a management development program. Measures of 
benefit consisted of a summated rating scale and the economic scale of the 
AVSV. Ss were 99 industrial executives in a 5-yr program. Those who paid 
part of their tuition, spent relatively more time in the program, and found 
the program difficult, benefited more. It is concluded that programs which do 
not require commitment and effort are not likely to achieve their immediate 
value objectives. 


A review of research conducted to evaluate 
the effectiveness of a variety of management 
development programs reveals that several 
moderating variables intervene between expo- 
sure to such a program and changes in atti- 
tudes, skills, and knowledge. Individual dif- 
ferences in ability and personality, organiza- 
tional climate, and social support by superiors 
and peers account for considerable variance 
in effectiveness (House, 1963). This study 
was designed to evaluate the effect of partici- 
pants’ investments in a management develop- 
ment program on their perceived benefit from 
it. It is possible that the benefit from such a 
program is related to the magnitude of invest- 
ments made by the individual’s decision to 
participate. 

It is hypothesized that those individuals 
who pay part of the tuition for a program 
will be more concerned with their investment 
than those who are sent by an employer who 
pays all the tuition for them. One way to 
reduce this concern is to overvalue the experi- 
ence by emphasizing its positive aspects 
(Festinger & Aronson, 1960). 

Another cost of participation in an activity 
is the amount of effort required to complete 
it. Effort can be measured in terms of time 
or of difficulty. In either case the general 
hypothesis is that individuals are likely to 
value that for which they labor. 

An opportunity to test the above hypotheses 
availed itself in a series of studies designed 
to evaluate the effectiveness of a liberal arts 
institute for personal development, a 5-year 
program for business and industrial executives 
fashioned after the ATT Institute for 
Humanistic Studies at the University of 
Pennsylvania, described by Viteles (1959). 

Participating organizations in the program 
were encouraged to ask the candidates whom 
they selected to make a personal contribution 
of $500 toward defraying the tuition expenses 
for this program. This suggestion was meant 
to insure commitment of participants to the 
program. Organizations varied in their will- 
ingness to ask their members to pay part of 
the tuition. Forty percent of a group of 99 
executives contributed $500 toward tuition 
payment; for the remainder the employer de- 
frayed total tuition expense. Participants at- 
tended sessions over five consecutive sum- 
mers, and those who paid did so in five yearly 
installments. 

The program under study was intellectually 
quite demanding. The capacity to benefit 
from it as measured by faculty ratings of 
participants correlated substantially (.64) 
with intelligence (Gruenfeld, 1961). There- 
fore, participants whose intelligence test scores 
are relatively low should have found the 
program more difficult. 


Description of the Program 


The program was organized in 1955 as a 
liberal arts program for executives in busi- 
ness and industry, in response to executives’ 
recommendations that the subject matter and 
educational processes of the liberal arts 
would be appropriate vehicles for the develop- 
ment of executives already proficient in tech- 
nical skills (Goldwin, 1957). Participants in 
this program received personal assistance in 
planning a program of readings and other 
developmental activities designed to overcome 
personal deficiencies in administrative and 
interpersonal skills. Subject matter was 
selected from the physical sciences, social sci- 
ences, and the humanities. The program pre- 
sented ideas, assumptions, and dilemmas of 
intra- and interpersonal relationships as well 
as social and organizational conditions that 
influence behavior. In addition, each partici- 
pant took part in a personal assessment and 
developmental counseling program not unlike 
the procedures used by a number of consult- 
ing firms which specialize in executive de- 
velopment and assessment by means of clini- 
cal interviews. The counseling program was 
designed to help the participant assess his 
developmental needs and to plan and institute 
remedial activities to improve his adminis- 
trative and interpersonal effectiveness. Oral 
and written communications exercises were 
also provided to enhance skill and confidence 
in these activities. 


METHOD 
The Sample 


Three groups of participants were used: (1) a 
group of graduates (N = 13) who had completed the 
institute 1 year before the administration of the 
measuring instruments combined with a graduating 
group (N = 19) who had completed the program 
at the time of measurement, (2) a fourth-year group 
(N = 23) and a third-year group (N = 17), and 
(3) a second-year group (N = 13) and a first-year 
group (N = 14) who had completed its first 
summer session at the time of measurement. 

Participants had varied backgrounds: ages ranged 
from 22 to 48 years with a median of 35; 24% had 
no college experience other than extension courses; 
17% had 1-2 years of college work; 59% had 
college degrees. Most were in middle management 
with a few each in lower and upper management. 
Sponsoring organizations represented manufacturing, 
banking and finance, insurance, public utilities, 
mining, and retailing. There were no statistically 
significant age differences among the groups. 

The attrition rate among participants was very 
low. At the time of this study four members had 
dropped out, primarily because they had been transferred 
or had terminated employment. Dropout 
was not systematically related to the tuition-payment 
variable. 


The Evaluation Questionnaire 


The evaluation questionnaire was a Likert-type, 
summated rating scale consisting of 71 items. Each 
item represented a statement of an objective that 
could be obtained by this program and the respond- 
ents were asked to indicate the extent of related 
improvement or deterioration that could be attrib- 
uted to participation in this program. The instru- 
ment therefore constituted a measure of perceived 
benefit from the program. The estimated split-half 
reliability (odd-even) of the questionnaire was .97. 
The reliability of responses for a group of 60 
respondents over a 1-year period was .73. 
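
The odd-even split-half estimate reported above can be illustrated with a short sketch. The item responses below are invented for illustration, the function names are hypothetical, and the Spearman-Brown step-up is an assumption: the article does not state which correction, if any, was applied to the half-test correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(responses):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    responses: one list of item scores per respondent (hypothetical layout).
    """
    odd_totals = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, ...
    even_totals = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    r_half = pearson(odd_totals, even_totals)
    # Step the half-test correlation up to full test length (assumed here).
    return 2 * r_half / (1 + r_half)


# Invented responses: 5 respondents, 6 Likert-type items each.
responses = [
    [4, 5, 4, 5, 4, 4],
    [2, 2, 3, 2, 2, 3],
    [3, 3, 3, 4, 3, 3],
    [5, 5, 4, 5, 5, 4],
    [1, 2, 1, 1, 2, 2],
]
reliability = split_half_reliability(responses)
print(f"split-half reliability = {reliability:.2f}")
```

The 1-year figure of .73 is a different quantity, a test-retest correlation, which would be computed by applying `pearson` to the same respondents' total scores on the two occasions.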

A principal components analysis of the total questionnaire 
revealed the existence of a substantial general 
factor which prior to Quartimax rotation accounted 
for over 50% of the total variance of the 
questionnaire content. The factors which constituted 
the questionnaire were as follows: (a) open-mindedness, 
(b) interest in the content of the program, 
(c) communication skills, (d) interpersonal 
skills, and (e) satisfaction with counseling and 
programing. 


Analyses of the data using the five factors sepa- 
rately yielded the same results in each case that 
were obtained when the total questionnaire score 
was used. 


The Values Measure 


The Allport Vernon Study of Values (AVSV) 
was completed by 73% of the participants. Viteles 
(1959), in a similar management development pro- 
gram, has shown that economic values of partici- 
pants as measured by the AVSV declined. This study 
relates these findings to the investment hypotheses. 


RESULTS 


A preliminary analysis of the data showed 
no significant differences due to type of or 
level or function in organization which could 
differentiate those individuals who paid tui- 
tion from those for whom the employer paid 
the entire fee. In addition personality vari- 
ables, which in a previous study were found 
to be significantly related to benefit from 
this program, were unrelated to the tuition- 
commitment variable. Consequently, it was 
deemed legitimate to relate the tuition- 
payment variable directly to the effectiveness 
measures. 

Table 1 shows the means and standard 
deviations on the evaluation questionnaire for 
each of the three groups (by time), which 
are in turn divided by the pay versus no-pay 
variable. Table 2 shows the 2 × 3 factorial 
ANOV summary table for the perceived 
benefit questionnaire data. The respective F 
statistics for the pay and the time conditions 
are statistically significant. It can be noted 
by inspection of Table 1 that the mean perceived 
benefit score for each of the tuition 
payment groups is larger than that of the 
corresponding no-pay group in all three cases, 
and that the magnitude of the differences 
between tuition payment and nonpayment groups 
increases linearly with time spent in the program. The 
linear trend within pay by time and no-pay 
by time is significant beyond the .05 level 
for each (t = 3.4 and 2.6, respectively). 
Moreover, the magnitude of benefit over time 
for the pay groups is considerably larger than 
that of the no-pay groups. The results also 
show that there is no interaction between the 
time and the payment variables, thus allowing 
a clear interpretation of the findings: those 
individuals who defrayed part of the tuition 
for this program perceived significantly more 
benefit. In addition, those individuals who 
spent relatively more time in the program 
also perceived more benefit. 


TABLE 1 

EVALUATION QUESTIONNAIRE MEANS FOR GROUPS IN 
WIPD BY TIME AND PAY VERSUS NO-PAY 

                1- and 2-     3- and 4-     Graduates and 
Variable        Year Group    Year Group    Graduating Groups 

Pay 
  M               129           168           205 
  SD               42.4          47.1          38.7 
  N                 7            19            13 
No-Pay 
  M               118           139           159 
  SD               45.1          54.8          36.8 
  N                20            21            19 


TABLE 2 

ANOV SUMMARY TABLE: PERCEIVED BENEFIT DATA 

Source of variation        SS      df      MS       F 

Pay/No-Pay               18002      1    18002     8.22* 
Time                     49976      2    24988    11.41* 
Pay/No-Pay × Time          306      2      153      .07 
Within cell             203695     93     2190 

* p < .01. 


Intelligence (Adaptability Test) and perceived 
benefit correlated -.41. This finding 
is based on one combined group (3 and 4) 
only; for the remaining groups intelligence 
measures were incomplete. Based on 39 degrees 
of freedom this r statistic is significant 
at the .05 level and in the predicted direction, 
indicating that those participants who find the 
program more difficult perceive relatively 
more benefit than their more intelligent peers. 


TABLE 3 

AVSV: ECONOMIC SCALE MEANS FOR GROUPS IN 
WIPD BY TIME AND PAY VERSUS NO-PAY 

                1- and 2-     3- and 4-     Graduates and 
Variable        Year Group    Year Group    Graduating Groups 

Pay 
  M               44.6          45.7          [illegible] 
  N                7            13            1[?] 
No-Pay 
  M               46.6          41.3          43.0 
  N               16            15            10 

Note. Pooled within-group √variance = 7.7. 


The means and standard deviations on the 
economic scale of the AVSV by pay and time 
spent in the program are shown in Table 3. 
The 2 × 3 factorial ANOV did not yield any 
significant F ratios. However, inspection of 
Table 3 shows the trend that was expected 
for the group that completed the program. 
Apparently it takes some time for the pre- 
dicted effects to show on the values measure. 
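
The statistics reported in this section can be cross-checked with a few lines of arithmetic. The sketch below recomputes each F ratio in Table 2 as the effect mean square divided by the within-cell mean square, and converts the intelligence correlation to a t statistic using the standard identity t = r * sqrt(df / (1 - r^2)). The variable and function names are illustrative, and the 39 degrees of freedom are taken from the text as given.

```python
from math import sqrt

# Mean squares as printed in Table 2.
ms_within = 2190
mean_squares = {
    "Pay/No-Pay": 18002,
    "Time": 24988,
    "Pay/No-Pay x Time": 153,
}

# Each F ratio is the effect mean square over the within-cell mean square.
f_ratios = {source: ms / ms_within for source, ms in mean_squares.items()}


def t_from_r(r, df):
    """t statistic for testing a Pearson r against zero with df degrees of freedom."""
    return r * sqrt(df / (1 - r ** 2))


# Correlation between intelligence and perceived benefit, df as stated in the text.
t_intelligence = t_from_r(-0.41, 39)

for source, f in f_ratios.items():
    print(f"{source}: F = {f:.2f}")
print(f"intelligence vs. benefit: t(39) = {t_intelligence:.2f}")
```

The absolute value of the resulting t exceeds the two-tailed .05 critical value of about 2.02 for 39 degrees of freedom, which agrees with the significance level stated for the correlation.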


DISCUSSION 


Investment of time and effort has increased 
perceived benefit from the program. Partici- 
pants who spent more time in the program 
reported more benefit from it; participants 
who had lower intelligence scores and who, 
therefore, found the program more difficult 
reported more benefit from it. Financial investment 
in the program also increased 
perceived benefit: those individuals who 
paid part of their own tuition reported that 
they derived more benefit from the program. 
The data make it safe to assert that those 
individuals who were involved and committed 
to the program valued it more and were rela- 
tively more satisfied with it than those for 
whom the program was free and relatively 
effortless. From an institutional point of view 
these findings indicate that management de- 
velopment programs are more likely to reach 
their immediate value objectives if they exact 
some cost and effort from the individuals 
participating in the program. 


RELIABILITY OF PEACE CORPS SELECTION BOARDS: 
A STUDY OF INTERJUDGE AGREEMENT BEFORE 
AND AFTER BOARD DISCUSSIONS 


LEWIS R. GOLDBERG ¹ 


University of Oregon and Oregon Research Institute 


The most crucial link in the Peace Corps selection process is the Advisory Selection 
Board, where a comprehensive pool of assessment data on each Peace Corps 
trainee is evaluated and discussed. In an effort to better understand this important 
process of collective clinical judgment, 9 Peace Corps Selection Boards were studied. 
Agreement between Board participants on the overall suitability of each Peace 
Corps trainee prior to Board discussion was compared with that reached after 
Board discussion. In general, the findings from 9 Selection Boards appeared re- 
markably similar, indicating that Board discussions tend to (a) decrease the suit- 
ability ratings for the average trainee, (b) increase the average dispersion of ratings 
for the group of trainees, and (c) increase quite dramatically the degree of con- 
sensus among Board participants. The unusually high consensus among participants 
after Board discussions attests to the rationality, though not necessarily the 
validity, of the Peace Corps selection process. 


Of all of the personnel selection procedures 
now in existence, one of the most compre- 
hensive is that utilized by the United States 
Peace Corps to select Peace Corps Volunteers. 
In an effort to be fair to each applicant and to 
minimize Volunteer failures overseas, the 
Peace Corps has instituted a 2-stage selection 
system. In the first stage, applicants are 
appraised on the basis of a detailed ques- 
tionnaire, from 6 to 16 references, and a set of 
aptitude test scores; promising applicants are 
invited to participate in a 2 to 4-month period 
of intensive Peace Corps training. 

During this training period, the second stage 
of selection takes place. Each Peace Corps 
trainee is evaluated by a host of people who 
have observed some aspect of his functioning; 
these viewpoints about each trainee stem from 
four major sources: (a) A Civil Service full- 
field background investigation gathers evalua- 
tions of the trainee’s past performance in his 
schooling, his work, and his recreational 
activities. (b) Training instructors provide 
information on the trainee’s participation and 
achievement in Peace Corps courses, as well 
as furnishing their impressions of each trainee’s 
potentiality as a Peace Corps Volunteer. 
(c) Fellow trainees furnish extensive peer 
evaluations, pooled judgments providing a 
portrait of each trainee as his peers have viewed 
him during the intensive training period. 
And, (d) staff psychologists and psychiatrists 
assess each trainee on the basis of their own 
professional tools: psychological tests and 
clinical interviews. Peace Corps assessment 
provides an attempt to understand each Peace 
Corps trainee through an integration of these 
diverse views of his behavior and effectiveness. 
One or more PhD psychologists, called 
Assessment Officers, serve at each training 
site. It is their responsibility to integrate 
the viewpoints of those who have observed 
the trainee’s behavior during the training 
period and to provide a unified picture of the 
trainee’s strengths and weaknesses for Peace 
Corps service overseas. 


¹ The author wishes to express his appreciation to the 
Selection Board members who participated in this 
project, and to the United States Peace Corps for 
providing such an important laboratory for naturalistic 
research. This study was supported, in part, by 
Grant MH-04439 from the National Institute of Mental 
Health, United States Public Health Service. 


In an effort to reach the wisest possible 
selection decisions, all of this rich and varied 
material is reviewed in two staff conferences, 
called Advisory Selection Boards. The first, 
or Intermediate Selection Board, takes place 
approximately halfway through training; the 
Final Selection Board is held at the end of the 
training period. The Project Director, the 
Assessment Officers and their psychiatric 
consultants, all participate on these Boards, 
as do representatives from the Selection, 
Training, and Program-Development-and-Op- 
erations Divisions of the Peace Corps. These 
Boards serve much the same function as a 
presidential cabinet, seeking to advise the 
Peace Corps official—the Selection Officer— 
who has been entrusted with the final respon- 
sibility for making an objective and fair 
decision regarding the makeup of the Peace 
Corps group sent overseas. 

Certainly the most critical component of the 
entire Peace Corps selection process is the 
Advisory Selection Board, since it is at this 
point that selection decisions are made. Con- 
sequently, knowledge of the reliability of 
judgments made during these Boards is of 
crucial importance to the Peace Corps. 
Moreover, these Boards provide an important 
natural laboratory for the study of decision 
making in small group settings, and the find- 
ings from empirical studies of these Boards 
may be generalizable to other small groups 
which are less tractable to scientific study. 

The present report summarizes studies of 
nine Peace Corps Selection Boards, focusing 
on the degree of agreement between Board 
participants prior to, and after, Board discus- 
sion of each trainee. Consequently, this report 
provides evidence of the effects upon con- 
formity in judgments of small group discus- 
sions in a natural problem-solving situation. 


METHOD 
Training Projects 


Five different Peace Corps training projects at the 
Peace Corps Center of the University of Hawaii at 
Hilo were studied. The first project was composed of 
individuals training to go to Thailand, primarily as 
teachers of English. Pre- and postdiscussion ratings at 
an Intermediate Selection Board (Board A) held after 
6 weeks of training, and pre-Board, prediscussion, and 
postdiscussion ratings at a Final Board (Board C) were 
studied. The second project was composed of indi- 
viduals training for service in Malaysia; the group 
included nurses, secretaries, laboratory technicians, 
and three types of Rural Community Action workers 
(primarily individuals with agricultural backgrounds). 
Pre- and postdiscussion ratings at an Intermediate 
Board (Board B), and postdiscussion ratings at a Final 
Board (Board D) were studied. The third and fourth 
projects also were composed of individuals training to 
go to Thailand as English teachers. Postdiscussion 
ratings from a Final Board (Board E) of one project, 
and postdiscussion ratings from an Intermediate Board 
(Board F) and Final Board (Board H) of the other 
project, were studied. The fifth project was also bound 
for Thailand, as village health and sanitation workers; 
postdiscussion ratings from an Intermediate Board 
(Board G) and a Final Board (Board I) were studied. 


Board Participants 


The composition of Peace Corps Selection Boards 
differs slightly, depending at least in part upon (a) 
the Selection Officer’s choice of persons to whom he 
wants to turn for advice, and (b) the availability of 
these individuals to meet together at a common time 
at the training site. The Selection Officer, who ulti- 
mately has the responsibility for Peace Corps selection 
decisions, is always present at these Boards. For all 
nine of the Boards under study, the same Selection 
Officer (the author) was involved. 

Five different Assessment Officers participated on 
one or more of these Boards. In addition, three Assess- 
ment Associates, graduate students in psychology 
taking time out from their studies to work full time 
for the assessment staff, participated on one or more of 
the Boards, as did two psychiatrists. The Project 
Director—the administrator of the training project— 
participated on all nine Boards, and two of his assistants 
were also involved on some of the Boards. Three 
Coordinators—members of the instructional staff 
having responsibility for supervising the technical 
training in the project—also participated on one or 
more Boards. From the Peace Corps Washington 
Staff, two Training Officers participated, each on a 
different Board. In addition, four Desk Officers from 
the Program-Development-and-Operations Division 
of the Peace Corps each participated on a Board. Three 
members of the Peace Corps overseas staff were also 
involved on one or more of the Boards. 


Rating Scale 


All Board participants rated each trainee on an 11- 
point rating scale, where “0” indicated that the trainee 
should be separated from the project at the time of the 
Board and “10” indicated an “ideal Peace Corps 
Volunteer.” Participants were free to use the intermediate 
points on the scale as they wished. 


Procedure 


All Board participants were presented with a set of 
mimeographed material summarizing the data col- 
lected during training. Approximately two typewritten 
pages of information, summarizing the trainee’s grades, 
staff evaluations, peer ratings on 6-12 traits, psychiatric 
notes, and the assessment staff’s summary evaluation, 
were presented for each trainee. In addition, many of 
the participants had some information not shared by 
others; for example, only the Selection Officer had read 
the background reports, while members of the ad- 
ministrative and assessment staff had had personal 
contact with most of the trainees. While the MMPI 
and a few less-structured assessment instruments 
(e.g., a sentence-completion form) were administered 
to all trainees in each of the five projects, these data 
were seen only by the assessment staff. The only test 
scores available to all Board participants were scores 
on a modern language aptitude test and scores on a 
test of general verbal ability. 


TABLE 1 

AGREEMENT AMONG SEVEN SELECTION BOARD PARTICIPANTS BEFORE AND AFTER BOARD DISCUSSION 
(BOARD A; 57 TRAINEES) 

Selection officer | Assessment officer-1 | Assessment officer-2 | Psychiatrist-1 | Project director | Coordinator-1 | PDO officer-1 | Prediscussion M | Prediscussion σ | Prediscussion r̄ 

Selection officer       718 .69 65 18 59 .63 5.0 2.6 .69 
Assessment officer-1    91 83 76 78 ai 59 5.3 1.9 72 
Assessment officer-2    92 89 .69 ail .62 .62 6.0 1.6 .69 
Psychiatrist-1          80 83 2 we .70 54 5.2 2.4 .68 
Project director        .86 86 84 87 .65 .63 ee 1.6 Bie 
Coordinator-1           9.91 alm £00) wine. O1 ans mE neg 39 |) 63° | 75 aaa 
PDO officer-1           74 .68 72 .69 76 71 4.8 1.4 57 
Postdiscussion M        4.8 5.0 Saas 5.0 5.0 5,8 4.6 
Postdiscussion σ        Bull 2.5 2.4 Dai DS 2.1 2.0 
Postdiscussion r̄        .86 84 .83 719 84 85 ahi 

Note. All 57 trainees were rated on a 0-10 scale. Correlations among ratings prior to Board discussion are presented above the 
diagonal; correlations among ratings after the Board discussion are presented below the diagonal. Diagonal entries (circled) present 
the pre- versus postdiscussion correlation for each participant. 

For Boards at which prediscussion ratings were 
collected (Boards A and B), Board participants were 
requested to read the assessment material of the first 
trainee and on the basis of this information, plus 
any other idiosyncratic information they might possess, 
to rate the trainee’s overall suitability for service as a 
Peace Corps Volunteer. After all ratings were com- 
pleted for the first trainee, the Board discussed the 
trainee and any idiosyncratic information was shared 
and evaluated. When the discussion was completed, 
but before turning to the next trainee, postdiscussion 
ratings were made. The same procedure was followed 
for each trainee, in turn. The ratings were made indi- 
vidually, and the raters did not compare their ratings 
with each other. The procedures for Board C were the 
same as for Boards A and B, with one exception: for 
Board C, all participants (except the Selection Officer) 
rated each trainee prior to the actual start of the Board, 
before any of the mimeographed assessment material 
was distributed. These pre-Board ratings, based upon 
highly variable amounts of personal observation of 
the trainees during the training period, were then fol- 
lowed by the same prediscussion and postdiscussion 
rating procedures used with Boards A and B. The 
procedures for the other six Boards differed only in 
that no prediscussion (and no pre-Board) ratings were 
made. 


RESULTS 


Table 1 summarizes the findings for Board 
A. In Table 1 the prediscussion correlations 
among ratings are presented above the diag- 
onal, while the corresponding postdiscussion 
correlations are presented below the diagonal. 
Note that for every pair of Board participants, 
postdiscussion correlations were higher than 
prediscussion ones, suggesting that some 
appreciable consensus in ratings occurred as a 
function of Board discussions. The mean 
prediscussion correlation was .66; the mean 
postdiscussion correlation was .82. Moreover, 
all of the standard deviations of the ratings 
were slightly higher after discussion (σ = 2.4) 
than before discussion (σ = 1.9), indicating 
that discussion served to increase judgmental 
differentiation among the trainees being rated. 
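
As a rough illustration of the two summary indices used here, the sketch below computes the mean pairwise correlation among judges (interjudge agreement) and the mean standard deviation of each judge's ratings (judgmental differentiation), before and after discussion. The ratings are invented, the function names are hypothetical, and the use of the population standard deviation is an assumption, since the article does not say which form was used.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_agreement(ratings):
    """Average correlation over all pairs of judges.

    ratings: dict mapping judge name to a list of ratings, one per trainee,
    with trainees in the same order for every judge.
    """
    return mean(
        pearson(ratings[a], ratings[b]) for a, b in combinations(ratings, 2)
    )


def mean_dispersion(ratings):
    """Average (population) standard deviation of each judge's ratings."""
    return mean(pstdev(r) for r in ratings.values())


# Invented 0-10 suitability ratings of six trainees by three judges.
pre = {
    "selection_officer": [5, 7, 3, 8, 4, 6],
    "assessment_officer": [6, 6, 5, 7, 5, 6],
    "project_director": [7, 6, 4, 6, 6, 5],
}
post = {
    "selection_officer": [5, 8, 2, 8, 3, 6],
    "assessment_officer": [5, 7, 3, 8, 4, 6],
    "project_director": [6, 7, 3, 7, 4, 5],
}

for label, ratings in (("prediscussion", pre), ("postdiscussion", post)):
    print(label, round(mean_agreement(ratings), 2), round(mean_dispersion(ratings), 2))
```

In this invented example both indices rise after discussion, which mirrors the direction of the Board A findings but says nothing about their size.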

TABLE 2 

AGREEMENT AMONG SEVEN SELECTION BOARD PARTICIPANTS BEFORE AND AFTER BOARD DISCUSSION 
(BOARD B; 88 TRAINEES) 

Selection officer | Assessment officer-3 | Assessment associate-1 | Psychiatrist-1 | Project director | Coordinator-2 | Training officer-1 | Prediscussion M | Prediscussion σ | Prediscussion r̄ 

Selection officer         82 83 79 .80 .67 718 Sel os 78 
Assessment officer-3      91 82 78 82 70 83 6.2 ea .80 
Assessment associate-1    .92 88 .80 .80 70 82 5G D5) 80 
Psychiatrist-1            85 83 86 73 70 76 5.0 2.2 76 
Project director          .86 88 86 82 73 Tia Sis iets 
Coordinator-2             .80 84 .80 .80 83 61 5.5 les .68 
Training officer-1        88 87 85 83 81 71 (86) St7n <3 30s ens 
Postdiscussion M          4.9 5.8 Suk 52 SES 5.6 5 
Postdiscussion σ          2.6 2.4 eal ae ee 1.9 Sl 
Postdiscussion r̄          87 .87 .86 .83 84 .80 .82 

Note. All 88 trainees were rated on a 0-10 scale. Correlations among ratings prior to Board discussion are presented above the 
diagonal; correlations among ratings after the Board discussion are presented below the diagonal. Diagonal entries (circled) present 
the pre- versus postdiscussion correlations for each participant. 


Of some interest is the marked stability 
of prediscussion versus postdiscussion ratings 
for individual Board participants. While the 
average of these correlations was .87, the 
Selection Officer seemed unusually resistant 
to change (r = .97). One explanation for both 
the apparent convergence onto the Selection 
Officer’s ratings and his own seeming stub- 
bornness may stem from the fact that the 
Selection Officer had access to the background 
reports, the tenor of which he conveyed to the 
Board during the discussion period. 

A common observation among Selection 
Board participants is that a great deal of the 
discussion of each trainee focuses on un- 
desirable or derogatory aspects of his behavior. 
An indirect confirmation of this impression 
can be seen by comparing the prediscussion 
versus postdiscussion mean ratings. While 
none of these changes was great, there was a 
consistent tendency for the prediscussion mean 
ratings (M = 5.4) to be higher than the corresponding 
postdiscussion means (M = 5.1), 
suggesting a slight devaluation of the suitability 
of the average trainee after Board 
discussion. 

Table 2 summarizes the corresponding findings 
for Board B. Virtually all of the findings 
from the analyses of Board A were replicated 
in Board B, though the correlation coefficients 
tended to be somewhat higher. The mean pre- 
discussion correlation was .76, while the mean 
postdiscussion correlation rose to .84. Again, 
the average postdiscussion standard deviation 
(σ = 2.5) was higher than the corresponding 
prediscussion one (σ = 2.0). Differences in 
mean ratings were slight, though again the 
average postdiscussion mean (5.5) was higher 
than the average prediscussion mean (5.4). 
Intrajudge correlations, pre- versus postdiscussion, 
were again remarkably high (r̄ = .90), 
with the Selection Officer again being the most 
recalcitrant to change (r = .96). 

TABLE 3 

AGREEMENT AMONG EIGHT BOARD PARTICIPANTS PRIOR TO RECEIVING FINAL BOARD INFORMATION 
(BOARD C; 46 TRAINEES) 

Assessment officer-1 | Assessment officer-2 | Assessment associate-1 | Psychiatrist-1 | Project director | Assistant director-2 | Coordinator-1 | Overseas representative 

Assessment officer-2                      54 
Assessment associate-1                    3 67 
Psychiatrist-1                            20 67 43 
Project director                          22 38 .16 stl if 
Assistant director-2                      48 54 os) 36 10) 
Coordinator-1                             S51 70 58 48 45 42 
Overseas representative                   16 .60 Al 34 39 i 50 
M                                         Slt 4.8 5.7 570 6.6 Sat 6.9 Son 
σ                                         1.6 2.8 1.9 2a eS 2.0 1.4 Veli 
Average r                                 BOD) 59 40 38 30 Al Fou 38 
Correlation with prediscussion rating     84 98 88 84 38 78 718 56 
Correlation with postdiscussion rating    64 88 77 63 A8 64 713 BOIL 


In order to check whether the consensus 
among Board participants would be as great 
when the amount of information available 
for each trainee was increased, five Final 
Selection Boards were studied. Tables 3 and 4 
summarize the findings from the first of these, 
Board C. Table 3 presents the correlations 
among eight Board participants prior to the 
start of the Board, before Board materials 
were made available. Consequently the agree- 
ment coefficients in Table 3 represent the 
amount of consensus among individuals after 
highly variable amounts of personal interac- 
tion with the trainees. Since many of the 
participants had had no personal contact with 
some of the trainees (e.g., the psychiatrist 
had only interviewed 28 of the 46 trainees), the 
correlations presented in Table 3 were com- 
puted on all of the trainees rated by both 
members of each pair of raters. The number 
of cases used to compute each correlation 
varied from 12 to 46; in general, most of the 
correlations in Table 3 were based on 30 to 40 
cases. 
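
The pairwise treatment of missing ratings described above can be sketched as follows: each correlation is computed only over the trainees whom both raters actually rated, so the number of cases differs from pair to pair. The ratings and the function name are invented for illustration; `None` marks a trainee the rater did not rate, and the minimum of three cases is an arbitrary safeguard rather than a rule taken from the article.

```python
from math import sqrt


def pairwise_pearson(x, y, min_cases=3):
    """Pearson r over the cases rated by both raters.

    x, y: ratings in the same trainee order, with None for "not rated".
    Returns (r, n), where n is the number of jointly rated trainees.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < min_cases:
        raise ValueError(f"only {n} jointly rated cases")
    xs, ys = zip(*pairs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy), n


# Invented pre-Board ratings of eight trainees; each rater skipped some trainees.
psychiatrist = [6, None, 4, 7, None, 5, 3, None]
coordinator = [7, 5, 6, 8, 6, None, 4, 6]

r, n = pairwise_pearson(psychiatrist, coordinator)
print(f"r = {r:.2f} based on {n} jointly rated trainees")
```

Because each coefficient rests on a different subset of trainees, averages of such correlations are best read as descriptive summaries.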

Comparing Table 3 with Table 4, one can 
see an orderly progression in judgmental 
convergence as more information became 
available. The average agreement correlation 
increased from .41 (pre-Board) to .68 (pre- 
discussion) and then again to .83 (postdiscus- 
sion), illustrating rather dramatically the 
impact of the Board materials and Board 
discussion upon these suitability judgments. 
As before, the mean ratings tended to decrease, 
from 5.6 (pre-Board) to 5.1 (prediscussion) 
to 4.9 (postdiscussion). Judgmental differen- 
tiation among individual trainees again showed 
an orderly increase: from a @ of 1.8 (pre- 
Board) to 2.2 (prediscussion) to 2.8 (post- 
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TABLE 4 


AGREEMENT AMONG NINE FINAL SELECTION BOARD PARTICIPANTS BEFORE AND AFTER BOARD 
Discussion (BOARD C; 46 TRAINEES) 





Selec- Assess- Assess- Assess- Assist- Over- f ? ; 
tion ment ment ment Psychia- Project ant Coordi- 828 | Predis- Predis- Predis- 
officer officer- officer- asso-  trist-1 director direc- nator-1 eae CNSR) GEESHOyY Ceueeen 
1 2 ciate-1 tor-1 Se % A 
lve 
Selection 
officer (sr) 85 ao .76 adil -60 -60 «74 -61 4.3 3.0 CGE 
Assessment 
officer-1 91 71 iD Shs -68 57 «74 58 4,9 ae -70 
Assessment 
officer—2 OL 90 (#9) Ok nae, -68 68 74 .80 4.9 2.8 nie 
Assessment 
asso- 
ciate-1 .86 92 .89 67 Rao) 67 oie .68 4.8 2.6 ate 
Psychia- 
trist-1 .83 .88 .87 .87 58 .70 rod 58 5.0 2a2 .66 
Project 
director 69 aah er iis wu 79 (83) 58 67 61 5.4 2.0 .64 
Assistant 
direc- 
tor-1 .76 .78 81 81 .80 we (86) 58 -66 4.8 2.0 63 
Coordi- 
nator-1 .83 86 .86 .87 83 78 80 (87) .69 6.7 1.7 68 
Overseas 
represen- 
tive .82 .86 .86 .86 .87 .76 88 81 (74) 5.2 1.6 .65 
Postdis- ae 
cussion 
M 4.2 4.5 Sl 4.5 4.9 5.1 4.5 6.3 4.8 
Postdis- 
cussion g} 3.1 3.3 3.3 207: 2.4 2.9 200) 2.3 2.3 
Postdis- 
cussion 7 83 .86 .86 85 84 ate 80 .83 84 





See Note at bottom of Table 2. 


discussion). And, as before, the pre- versus 
postdiscussion correlations for individual par-
ticipants were high (r̄ = .87), with the Selec-
tion Officer again showing the least change
(r = .97). 

The correlations among Board participants 
reported in Tables 1, 2, and 4 appear unusually 
high compared to those previously reported 
in the literature on clinical judgment. Since 
it is possible that the prediscussion ratings 
tended to influence the participants and there- 
by confound the postdiscussion ratings, it is 
of theoretical interest to compare these ratings 
with some made after Board discussions but 
without prediscussion ratings. Consequently, 
six other Boards—two Intermediate (Boards 
F and G) and four Final (Boards D, E, H, 
and I)—where prediscussion ratings were
not carried out, were studied. While there 
were some slight differences between the 
findings from these six Boards, on the average
their results replicated rather precisely the
postdiscussion findings from the other three
Boards. Table 5 presents a tabular summary
of the data from all nine Boards, thus per-
mitting an easy comparison. In general, the
findings look remarkably consistent across the 
nine Boards. 

As a check to see whether there were any 
differences in the degree of agreement between 
pairs of Board participants holding different 
roles (e.g., to ascertain whether psychiatrists 
tended to agree more with Assessment Officers, 
for example, than they did with the Project 
Director), Board participants were classified 
into eight role-types (Selection Officer, Assess- 
ment Officers, Assessment Associates, Psychia- 
trists, Project Director, Assistant Directors 
and/or Coordinators, PDO Officers, and Over- 
seas Representatives) and the agreement cor- 
relations among all individuals falling into each 
of the resulting 28 role pairs were averaged 
TABLE 5


SUMMARY OF THE ANALYSES FOR NINE SELECTION BOARDS


Variable | Board A | Board B | Board C | Board D | Board E | Board F | Board G | Board H | Board I | Average
Type of Board | Intermediate | Intermediate | Final | Final | Final | Intermediate | Intermediate | Final | Final |
No. participants | 7 | 7 | 9 | 8 | 9 | 7 | 6 | 8 | 7 |
No. trainees | 57 | 88 | 46 | 76 | 82 | 81 | 32 | 73 | 29 |
Pre-Board M | - | - | 5.6 | - | - | - | - | - | - | 5.6
Prediscussion M | 5.4 | [illegible] | 5.1 | - | - | - | - | - | - | [illegible]
Postdiscussion M | [illegible] | 5.4 | 4.9 | [illegible] | 4.4 | 5.3 | 4.6 | [illegible] | 5.6 | [illegible]
Pre-Board σ | - | - | 1.8 | - | - | - | - | - | - | 1.8
Prediscussion σ | 1.9 | 2.0 | 2.2 | - | - | - | - | - | - | 2.0
Postdiscussion σ | 2.4 | [illegible] | 2.8 | [illegible] | 2.8 | [illegible] | [illegible] | 2.4 | [illegible] | [illegible]
Pre-Board r̄ | - | - | .41 | - | - | - | - | - | - | .41
Prediscussion r̄ | .66 | .76 | .68 | - | - | - | - | - | - | .70
Postdiscussion r̄ | .82 | .84 | .83 | .87 | .76 | .78 | .81 | .79 | .75 | .81
Pre- versus Post-r | .87 | .90 | .87 | - | - | - | - | - | - | .88

across all nine Boards. In general, role differ- 
ences in degree of agreement were greater 
for prediscussion ratings than for postdiscus- 
sion ratings. The highest postdiscussion aver- 
age agreement correlations occurred between 
the Selection Officer and the Assessment 
Officers (r̄ = .87), followed closely by that
between the Selection Officer and the Psy-
chiatrists (r̄ = .86); this finding may have
quite limited generality, however, since only 
one Selection Officer was studied. The lowest 


TABLE 6


INTERMEDIATE BOARD VERSUS FINAL BOARD RATINGS


Variable | Board A versus Board C | Board B versus Board D | Board F versus Board H
No. trainees at Final Board | 46 | 76 | 73
No. overlapping Board participants | 6 | 6 | 5
M (Intermediate Board) | 5.8 | [illegible] | 5.4
M (Final Board) | 5.0 | [illegible] | 5.4
σ (Intermediate Board) | 2.1 | 2.0 | [illegible]
σ (Final Board) | 2.9 | 2.8 | [illegible]
r: range | [illegible]-.78 | .55-.77 | .49-.77
r̄ | .69 | .65 | .65

postdiscussion average correlations were found
between PDO Officers and Coordinators
(r̄ = .73), and between the Assessment Asso-
ciates and the Project Director (r̄ = .75).

While the preceding analyses have all 
focused upon ratings made during a relatively 
brief interval of time, of equal interest are the 
correlations between suitability ratings made 
by the same Board participants (a) at an 
Intermediate Board and (b) 6 weeks later at a
Final Board. Since the present study includes 
both Intermediate and Final Board ratings 
from each of four projects, it was possible 
to compute the correlations between the two 
sets of ratings for those individuals who par- 
ticipated on both Boards. While making their 
Final Board ratings, Board participants did 
not have access to their Intermediate Board 
ratings. 

Table 6 summarizes these findings from the 
three largest projects, for the subset of trainees 
who remained in each project until the Final 
Board. Included in Table 6 are the number of 
trainees rated at each of the Final Boards, as 
well as the number of Board participants who 
were involved in both Boards. The mean rat- 
ings and dispersions for each of these Board 
participants were computed across the subset 
of trainees they rated at both Boards; the 
averages of these means and standard devia-
tions are presented in Table 6, for both Inter-
mediate and Final Board ratings. Finally,
the correlations between Intermediate and 
Final Board ratings were computed for each 
of the Board participants across the subset 
of trainees common to both Boards; the ranges 
of these correlations are presented in Table 6, 
along with the mean correlations. 

The correlations between ratings made at 
Intermediate and Final Boards averaged .69, 
.65, and .65 for the three projects, indicating 
considerable—though far from perfect—judg- 
mental stability over time. The correlations 
were of approximately the same magnitude 
as those found among participants prior to 
Board discussion (see Table 5). 

In general, the mean ratings of the same 
trainees by the same Board participants were 
lower at the Final Board than at the Inter- 
mediate Board, while the dispersions of the 
ratings were larger at the Final than at the 
Intermediate Boards. Since some of the lowest 
rated trainees had been separated from the 
projects at the Intermediate Boards, the mean 
Intermediate Board ratings were higher for 
the subgroup which remained in the project 
(Table 6) than for the original groups (Table 
5); conversely, the dispersions of the ratings 
were smaller for the Final Board subgroups 
than for the larger groups. At the Final Boards, 
however, the values of the means and standard 
deviations for the smaller groups approached 
those from the larger groups, indicating that 
Board participants tended to evaluate trainees 
relatively (e.g., “on a curve”), rather than
absolutely. Since the evaluation of a particular 
trainee appears to be highly dependent on his 
relative status in his group, the validity of these 
Selection Board ratings could be severely 
attenuated by any interproject differences in 
the “quality” of the average Peace Corps 
trainee. 


Discussion 


By far the most remarkable finding from the 
present study was the substantial degree of 
interjudge agreement among Peace Corps Se- 
lection Board participants, even prior to any 
Board discussion. Moreover, the increased con- 
sensus as a result of Board discussion certainly 
lends credence to the rationality (Goldberg, 
1963) of this stage of the Peace Corps selection
process. In addition, since most of the Assess- 
ment Officers on these Boards could have 
served as Selection Officers on other projects, 
the present study supplies some indirect evi- 
dence that the evaluations from one Selection 
Officer may show a substantial degree of agree- 
ment with those of others. This does not imply, 
however, that individual Selection Officers 
may not differ appreciably in their subjective 
cutoff points between persons selected as 
Volunteers and those not sent overseas. In- 
formal discussions among Selection Officers 
have convinced the author that the degree 
of agreement on the overall ranking of a set 
of trainees by different Selection Officers 
would probably be quite great, but that Selec- 
tion Officers would differ significantly in the 
percentage of the group they would select as 
Volunteers. 

The degree of interjudge agreement among 
members of decision-making teams in other 
settings has never been adequately explored 
and, consequently, any comparison of the post- 
discussion correlations among Peace Corps 
Selection Board participants with those from 
other groups must await similar studies in 
different contexts. However, the prediscussion 
correlations from this study can be compared 
with the numerous studies of consensus in 
clinical judgments more generally. Such studies 
have indicated a vast range of reliability 
coefficients, depending on the nature of the 
judgmental task. For example, in an early 
inferential reliability study, Bendig (1955)
asked 40 graduate students in psychology 
to rate each of 10 abstracted clinical case 
histories on a 7-point scale of global adjust- 
ment level. Bendig reported average inter- 
judge reliability coefficients around .84. On the 
other hand, Howard (1962) had seven clinical 
psychologists rank order 10 needs for each of 
10 patients on the basis of Rorschach,
TAT, and sentence-completion test protocols.
For this task, Howard found interjudge agree-
ment correlations, for the same projective test,
to average only .19. Clearly, the findings from 
the present study resemble those of Bendig 
more than those of Howard. 

One possible explanation for the rather high 
consensus among judgments in the present 
study may stem from the fact that the overall 
suitability ratings made by participants in
Peace Corps Selection Boards are, by and large,
evaluative in nature (as were, of course,
the ratings in Bendig’s study). Were more 
specific predictions demanded of the Board 
participants, it would not be surprising if the 
resulting interjudge correlations would de- 
crease appreciably. 

Finally, the findings from this study should 
in no way be construed as reflecting on the 
validity of these Board ratings, since even 
the most reliable of clinical judgments may be 
badly misaligned with reality. Only through 
further research on the validity of Peace
Corps selection procedures—including studies 
in which all trainees are sent overseas—can this 
latter (and more important) question be
answered definitively. 


REFERENCES 


BENDIG, A. W. Rater experience and case history
judgments of adjustment. Journal of Clinical Psy-
chology, 1955, 11, 127-132.

GOLDBERG, L. R. Peace Corps selection as an assess-
ment strategy. Proceedings of the Peace Corps-NIMH
conference: “The Peace Corps and the be-
havioral sciences.” Washington, D. C.: Peace Corps,
1963.

HOWARD, K. I. The convergent and discriminant vali-
dation of ipsative ratings from three projective in-
struments. Journal of Clinical Psychology, 1962,
18, 183-188.


(Received August 2, 1965) 


Journal of Applied Psychology 
1966, Vol. 50, No. 5, 409-411 


MILLER ANALOGIES TEST: 
A NOTE ON PERMISSIVE RETESTING 


ROBERT G. LANE,1 NOLAN E. PENN, AND ROBERT F. FISCHER


Student Counseling Center, University of Wisconsin 


Mean scores on the Miller Analogies Test (MAT) were computed for 84 
graduate students (UW group) who took the MAT twice—Form K followed 
by Form J. Retest scores were significantly higher (p <.001). When Equiva- 
lence study (ES) data reported in the MAT manual were analyzed, retest 
scores on Form J were also found to be significantly higher (p < .001) than 
initial scores on Form K. However, the difference for the UW group was 
significantly greater (p<.05) than the corresponding difference in the ES 
sample. The greater difference for the UW group may be explained partially 
as a regression phenomenon; however, some questions were raised as to 
practice effects and the reliability of the 2 forms. 


A recent survey of Miller Analogies Test 
(MAT) records revealed that some applicants 
for graduate study had retaken the MAT 
from two to four times. Usually, these re- 
peaters were students who had not reached 
their department’s criterion scores and were 
attempting to raise their scores in order to 
qualify for the graduate program. In some 
cases the MAT had been repeated within a 
few days after the initial testing, although the 
Controlled Testing Centers (CTC) Guide 
(1964) published by The Psychological Cor-
poration states that “retesting after two years
is considered normal and desirable [p. 8].”
The CTC Guide also recommends that 


if the elapsed time is short, say, six months or less, 
stress to the examinee that all previous scores will 
be reported and try to discourage retesting unless 
acceptable motives are presented. Feel free to refuse 
to retest a person if you think “practice” is the 
primary motive. Otherwise, retesting is acceptable 
providing the examinee reports his purposes [p. 8]. 


The test-retest data reported in the MAT 
Manual (1960) fail to state the time lapse 
between administrations of Forms K and J 
(or J and K). It would seem that more in- 
formation and clarification is needed to 
develop a more definitive policy statement. 
In previous test-retest studies of the MAT, 
Spielberger (1959) and Coladarci (1960) 
found evidence of practice effects. Coladarci 
concluded that retest performance was sub- 
stantially higher than performance on an 


1 Now at Winnebago State Hospital, Winnebago, 
Wisconsin. 





initial test even when initial test scores were 
corrected for errors of measurement due to 
regression. Spielberger found that the effects 
of practice on the MAT scores were most 
facilitative for subjects (Ss) with low initial 
scores. Coladarci found no relationship be- 
tween improvement and level of initial scores. 

The MAT Manual (1960) reports the re- 
sults of an equivalence study (ES) of Forms 
K and J in which it was found 


that Form K is highly correlated with Form J and 
the reliability of each form is very satisfactory. Of 
particular interest are the means and standard devia- 
tions of Forms K and J when each was adminis- 
tered as the first test. The difference between stand- 
ard deviations was trivial (0.2); the difference 
between means was less than one point (0.8) 
[p. 18]. 


The preceding statements led to the follow- 
ing hypotheses: (a) There are no significant 
differences between the mean scores of MAT 
Forms K and J administered to the University
of Wisconsin (UW) groups at various time 
intervals. (b) There are no significant dif- 
ferences between the corresponding mean 
scores in the ES sample. And (c) if differ- 
ences are found between the mean scores of 
MAT Forms K and J, the differences will 
be of the same magnitude in the UW sample 
and the ES sample. 


METHOD 


Subjects 


The Ss in this study were graduates, seniors, or 
non-UW students who were administered the MAT 
at UW from 1960 through 1964. Test-retest scores 
were collected on a total of 84 Ss, of whom
71 were males and 13 were females.

The MAT Manual reported that a total of 312 
Ss were administered Form K on initial testing and 
Form J on the retest. The Ss for this group were 
identified only as seniors and graduate students at 
six institutions. 

All Ss had been administered Form K initially
and Form J on repeat testing. This particular 
sequence of test forms was used because Form K 
was the most recent edition of the MAT (1959) 
and covered the data-collection period, while the 
previously developed Form J (1952) served as an 
alternate form. 


Analyses 


Scores for the UW sample were divided into three 
groups: (1) total sample (N = 84); (2) those Ss who
had taken Form J within 1 year after initial testing 
with Form K (N = 69); (3) those Ss who had taken 
Form J more than 1 year after initial testing with 
Form K (N=15). 

To test the first two hypotheses, differences be- 
tween the means of Forms K and J were computed 
for each UW group and for the ES sample. The t
test was employed to determine the significance of 
differences. Results are presented in Table 1. 

To test the third hypothesis, the differences be- 
tween the means of Forms K and J in the UW 
sample were compared with the corresponding mean 
differences in the ES sample. Each of the UW 
groups was compared with the total ES sample. 
Results are presented in Table 2. 


RESULTS 


The mean of MAT Form J was signifi- 
cantly greater than the mean of Form K in 
each of the UW groups and also in the ES 
sample (Table 1). The first two hypotheses 
may be rejected. The difference between the 
means of Forms K and J in the total UW 
sample (Group 1) was significantly greater 
than the corresponding difference in the ES 
sample (Table 2). Comparisons based on 
subsamples of the UW group failed to show 
significant differences. The third hypothesis 
may be tentatively rejected on the basis of 
the Group 1 results. 


DISCUSSION 


Inspection of Table 1 reveals the UW 
sample means to be considerably lower than 
means reported for the ES sample. The first- 
test mean of the UW sample is lower than 
the means of 16 of the 18 graduate and pro- 
fessional school groups reported in the MAT 
TABLE 1


MEANS AND DIFFERENCES BETWEEN MEANS OF MAT
FORMS K AND J IN UW AND ES SAMPLES


Group | N | Form K M | Form K SD | Form J M | Form J SD | Diff. | t
UW 1 | 84 | 42.38 | 11.15 | 50.89 | 12.14 | 8.51 | 5.16**
UW 2 | 69 | 41.56 | 11.50 | 50.09 | 12.01 | 8.53 | 4.18**
UW 3 | 15 | 46.13 | 8.74 | 54.60 | 12.43 | 8.47 | 2.18*
ES | 312 | 51.70 | 16.40 | 56.30 | 15.40 | 4.60 | 3.61**
* p < .05.
** p < .001.


Manual. The above comparisons indicate that 
the UW sample is a relatively low-scoring one 
—that its mean score appears lower than the 
mean of the population of graduate and pro- 
fessional students for whom the test was de- 
signed. This seems reasonable, because low 
scorers would be most likely to apply for re- 
testing in order to meet graduate program 
criterion scores. It may be expected, there- 
fore, that the scores of the UW group will 
regress toward the population mean upon re- 
testing. Consequently the occurrence of a 
greater discrepancy between test and retest 
scores for the UW group than for the ES 
sample may be at least partly explained as 
an effect of regression. 
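
The regression argument above can be made concrete with the standard expectation that, absent any practice effect, a group's retest mean moves toward the population mean by a fraction (1 - reliability) of its initial distance from that mean. The sketch below is illustrative only: the reliability value is hypothetical, and using the ES Form K mean as a stand-in for the population mean is an assumption, since the text reports neither figure for this purpose.

```python
def expected_regression_gain(group_mean, population_mean, reliability):
    """Expected retest gain from regression alone (no practice effect)."""
    return (1.0 - reliability) * (population_mean - group_mean)


uw_form_k_mean = 42.38       # Table 1, total UW sample, initial test
es_form_k_mean = 51.70       # Table 1, ES sample; stand-in for the population mean (assumption)
assumed_reliability = 0.85   # hypothetical value, not taken from the MAT Manual

gain = expected_regression_gain(uw_form_k_mean, es_form_k_mean, assumed_reliability)
```

Under these assumed inputs the expected gain from regression is about 1.4 points; a lower assumed reliability or a higher population mean would yield a larger figure.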

However, the significant differences be- 
tween test and retest for both the UW group 
and the ES sample would also seem to suggest 
one or both of the following possibilities: 
(a) That a substantial practice effect has 


TABLE 2


COMPARISON OF DIFFERENCES BETWEEN MEANS OF
MAT FORMS K AND J IN UW GROUPS
AND ES SAMPLE


UW Group | N (UW) | N (ES) | Mean difference (UW) | Mean difference (ES) | Diff. | t
1 | 84 | 312 | 8.51 | 4.60 | 3.91 | 2.37*
2 | 69 | 312 | 8.53 | 4.60 | 3.93 | 1.93**
3 | 15 | 312 | 8.45 | 4.60 | 3.85 | 0.90**
* p < .05.
** p > .05.




occurred, or (b) That Form K is more
difficult than Form J. 

There is no question but that Ss in this 
study, by taking the MAT a second time, 
not only improved their scores but at the 
same time their chances of meeting the cri- 
terion scores set by their respective depart- 
ments. It should also be noted that of the 84 
Ss, 15 increased their raw scores more than 
15 points or one standard deviation of the 
ES sample. 

It would appear then that a permissive 
MAT retesting policy directed toward gradu- 
ate school applicants raises some questions 




as to its being a very effective use of the 
instrument. 
PERCEIVED MONETARY VALUE OF JOB TYPE,
COMPANY SIZE, AND LOCATION AMONG 
COLLEGE SENIORS 


LOUIS A. RICHARDSON, JR. 
University of North Carolina at Chapel Hill 


The influence of job location, starting salary, type of work, and company 
size in the job choices of 113 business administration students was studied. 
The method of factorial paired comparisons and a scaling technique which 
assigned a monetary value to each factor were used. Although an $800 salary 
differential influenced the job choices more than location, type of work, or 
company size, all the factors had a highly significant (p < .01) influence and 


had considerable “trade-off” value. 


Many of today’s college seniors have their 
choice among several job offers. With the 
demand for college-educated employees out- 
running the supply, the graduate’s job hunt 
is becoming more a matter of finding the right 
job than of finding employment per se. 

This choice of a first job after college may 
be one of the most important decisions the 
graduate ever will make. The rate at which 
he develops and advances will, to some ex- 
tent, be determined by the job he chooses. 
His experiences on his first job probably 
will have a lasting effect on his attitudes 
toward work and on his work habits. 

Competition among employers for qualified 
graduates is intense. As a result the decisions 
of sought-after graduates regarding job offers 
are becoming increasingly important to the 
employers involved. 

Very few definitive studies of the college 
graduate’s job choice have been undertaken. 
Other than the few studies which have been 
conducted by university researchers (Jaeger, 
1955; Odiorne & Hann, 1961), most of the 
existing research has been done by employers 
themselves (Barmeier & Kellar, 1957; 
Fernow, 1958; Lester & Wright, 1957; Rose, 
1962; Walton, 1960). They have, quite natu- 
rally, been interested primarily in finding 
out how graduates react to job offers from 
their own companies rather than in finding 
out what affects the job choice of graduates 


1 This study is based on a master’s thesis, com- 
pleted under Thomas H. Jerdee of the University of 
North Carolina at Chapel Hill. Appreciation is 
extended to R. Darrell Bock for his advice regarding 
design and analysis. 


in general. While the results of such studies 
are useful to the companies involved, they 
are of limited generality because of the 
extent to which the subjects’ (Ss’) responses
are influenced by conditions unique to the 
sponsoring firms. 

The present research was a study of the 
influence of four factors (job location, start- 
ing salary, type of work, and company size) 
in the job choice of seniors in The University 
of North Carolina (UNC) School of Busi- 
ness Administration. There were 16 hypo- 
thetical job offers constructed, each describ- 
ing a job in terms of the four factors under 
study. A sample of randomly chosen seniors 
in business administration was presented with 
a series of situations in which they were 
required to choose between pairs of these 
hypothetical job offers. A psychological scal- 
ing method described by Bock and Jones 
(1966) was used to determine the relative 
preference of Ss for the 16 alternative “job 
offers” and the relative importance of job 
location, starting salary, type of work, and 
company size in Ss’ job choices. 


METHOD 
Subjects 


A random sample of 113 senior men majoring 
in business administration at UNC served as Ss. 
Accounting majors were excluded from the sample, 
because the hypothetical job situations presented in 
the study could not realistically be applied to them. 


Stimuli 


There were 16 hypothetical “job offers” which 
served as experimental stimuli. The “job offers” 
were constructed by establishing two alternative 
states for job location, starting salary, type of
work, and company size, and combining these factor 
states in all possible ways. Each “job offer” con- 
tained one state selected from each of the four 
general factors. In establishing the factor states 
an effort was made to make them realistic and to 
provide contrasts which would permit measurement 
of the desired effects. The four factors and their 
corresponding states appear in Table 1. 

The two cities chosen as the states of the job- 
location factor permitted measurement of the rela- 
tive preference of Ss for a location near the uni- 
versity and, in most cases, near their homes, 
Winston-Salem, N. C., as opposed to another loca- 
tion geographically distant from the school and 
their homes, South Bend, Indiana. 

The states of the salary factor were based on 
data compiled by the UNC Placement Service. 
According to a survey conducted during the aca- 
demic year 1963-64, $5,400 is an average starting 
salary for business administration graduates accept- 
ing nonaccounting jobs, while $6,200 is a relatively 
high starting salary for these graduates. Therefore, 
these two salary states permitted measurement of 
the extent to which an above-average salary offer 
influences the job choice of business administration 
seniors. Several preliminary tests were conducted to 
determine a suitable differential. On the basis of 
these tests, the differential was set at $800. 

The jobs which ordinarily would be of interest 
to business administration graduates may be grouped 
into the two general categories of management 
trainee and sales trainee. Almost all business gradu- 
ates must decide whether they want to begin their 
careers in sales or as management trainees; there- 
fore, this factor presents a realistic situation. Ex- 
planations of the duties involved in these types of 
work were given on a directions sheet which was 
given to each S. However, these explanations were 
made brief so that Ss would have to base their 
decisions primarily on their own stereotyped
concepts. 

Often business graduates must decide whether they 
want to work for a large, established company or 
for a small company with growth potential. The 
size-of-company factor was designed to measure the 
relative preference of Ss for these two types of 
companies. Again only brief explanations of these 
two types of companies were given so that Ss would 
have to base their decisions primarily on their own 
stereotypes. 


Design 


The method of factorial paired comparisons was 
used to scale the “job offers.”2 In a complete 
paired-comparison design, each “job offer” would be 
paired with all other “job offers.” This would mean 
that 120 pairs would have to be judged by each 
S, a rather formidable task. In the present research 








2 A detailed model, and the methods for its sta-
tistical evaluation, are presented by Bock and Jones
(1966, Ch. 7). Also, see Jones and Jeffrey (1964). 




TABLE 1
THE 4 JOB FACTORS AND THEIR STATES


Factor | State 1 | State 2
A. Location | Winston-Salem, N. C. | South Bend, Ind.
B. Starting salary | $6,200 per year | $5,400 per year
C. Type of work | Management trainee | Sales trainee
D. Company size | Small company with growth potential | Large, established company


this difficulty was circumvented by utilizing a bal- 
anced block design (Bock & Jones, 1966; Jones & 
Jeffrey, 1964), where not all possible pairs are pre- 
sented. The Ss were required to judge 57 pairs of 
“job offers.” 


Procedure 


The Ss were presented with two “job offers” at 
a time and were required to choose the one which 
they preferred. The pairs of “job offers” were pre- 
sented to Ss in the form of a booklet, with one 
item (i.e., one pair of “job offers”) on each page. 
The pages of each booklet were shuffled before being 
stapled to randomize the order in which the vari- 
ous combinations appeared. The Ss indicated their 
preferences by placing an X above the preferred 
job. A directions sheet, which contained brief 
explanations of the factor states, accompanied each 
booklet. 


Analysis 


Estimates of the “affective values” of the stimuli 
were obtained through a method described by Bock 
and Jones (1966). The proportion of Ss who pre- 
ferred each “job offer” to each of the other “job 
offers” was first determined. Estimates of the dif- 
ference between pairs of stimuli were then obtained 
by looking up the arc sine deviates corresponding 
to the observed proportions of preference for the 
first member of each pair of stimuli. For the stimu-
lus pair j, k in which stimulus j has the job features
$A_j B_j C_j D_j$ and stimulus k the features $A_k B_k C_k D_k$, the
arc sine deviate is

$$y_{jk} = 2\sin^{-1}\sqrt{p_{jk}} - \frac{\pi}{2}. \qquad [1]$$



The large sample variance of these deviates is 1/N 
when the arc sines are expressed in radian measure. 
N is the number of Ss in the sample. 
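
As a small worked illustration of the arc sine deviate, the sketch below computes the deviate for an observed preference proportion and the large-sample variance 1/N. The function names are hypothetical, chosen only for this example.

```python
import math


def arcsine_deviate(p):
    """Arc sine deviate, in radians, for a preference proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return 2.0 * math.asin(math.sqrt(p)) - math.pi / 2.0


def deviate_variance(n_subjects):
    """Large-sample variance of the deviate when angles are in radians."""
    return 1.0 / n_subjects


even_split = arcsine_deviate(0.5)    # no preference gives a deviate of zero
unanimous = arcsine_deviate(1.0)     # unanimous preference gives pi / 2
variance = deviate_variance(113)     # N = 113 Ss in the present sample
```

The deviate is antisymmetric: a proportion p and its complement 1 - p give values of equal size and opposite sign, so reversing the order of a pair only flips the sign.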

On the basis of Thurstone’s Law of Comparative 
Judgment, the composition of the deviates is as- 
sumed to be 
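$$\begin{aligned}
y_{jk} &= \alpha_j - \alpha_k + \beta_j - \beta_k + \gamma_j - \gamma_k + \delta_j - \delta_k + \epsilon_{jk} \qquad [2] \\
       &= (\alpha_j + \beta_j + \gamma_j + \delta_j) - (\alpha_k + \beta_k + \gamma_k + \delta_k) + \epsilon_{jk}.
\end{aligned}$$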
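TABLE 2


THE 16 “JOB OFFERS” RANKED BY ESTIMATED
AFFECTIVE VALUES


Job offer | A | B | C | D | Estimated affective value
1 | 1 | 1 | 1 | 1 | .6674
2 | 1 | 1 | 1 | 2 | .4980
3 | 1 | 1 | 2 | 1 | [illegible]
4 | 1 | 1 | 2 | 2 | [illegible]
5 | 2 | 1 | 1 | 1 | .2343
6 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
7 | 2 | 1 | 1 | 2 | .1677
8 | 1 | 2 | 1 | 2 | .0406
9 | 2 | 1 | 2 | 1 | -.0894
10 | 2 | 1 | 2 | 2 | -.1207
11 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
12 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
13 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
14 | 2 | 2 | 1 | 2 | -.3836
15 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
16 | 2 | 2 | 2 | 2 | -.5843

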
The parameters α, β, γ, and δ represent main ef-
fects of the four job factors. The error ε is assumed
to be normally distributed with mean zero and
variance 1/N.

The affective value of stimulus j, namely $\alpha_j + \beta_j + \gamma_j + \delta_j$,
is estimated by $v_j$, say:

$$v_j = \frac{r}{\lambda n}\Bigl(y_{j\cdot} + \sum_{h=1}^{n} n_{hj}\, y_{h\cdot}/r\Bigr), \qquad [3]$$

where

$$y_{j\cdot} = \sum_{k=1}^{n} n_{jk}\, y_{jk};$$

r is the number of other stimuli with which each
stimulus is paired, n is the number of stimuli, and
λ is the number of pairs in which each of two
selected stimuli are paired with the same stimulus.
$n_{hj}$ and $n_{jk}$ are the so-called incidence numbers of
the incomplete design. They equal 1 if the sub-
scripted comparison is included in the design and 0
if it is omitted. The incidence plan of this design
can be seen in Table 2 of Jones and Jeffrey (1964).
For this design r = 6, n = 16, and λ = 2.
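
The estimator can be checked numerically with a short sketch. The actual incidence plan is the one in Table 2 of Jones and Jeffrey (1964), which is not reproduced here; the plan below is a hypothetical stand-in (stimuli laid out on a 4 x 4 grid and paired when they share a row or column) chosen only because it also has r = 6, n = 16, and λ = 2. The function names and the "true" values are likewise illustrative assumptions.

```python
def grid_incidence(side=4):
    """Hypothetical incidence plan: pair two stimuli when they share a grid row or column."""
    n = side * side
    inc = [[0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            same_row = j // side == k // side
            same_col = j % side == k % side
            if j != k and (same_row or same_col):
                inc[j][k] = 1
    return inc


def affective_values(y, inc, r, lam):
    """Estimate v_j = (r / (lam * n)) * (y_j. + sum_h n_hj * y_h. / r)."""
    n = len(inc)
    # y_j. : sum of the deviates over the comparisons actually included for stimulus j
    totals = [sum(inc[j][k] * y[j][k] for k in range(n)) for j in range(n)]
    return [
        (r / (lam * n)) * (totals[j] + sum(inc[h][j] * totals[h] for h in range(n)) / r)
        for j in range(n)
    ]


inc = grid_incidence()
# Illustrative error-free case: known values summing to zero, deviates y_jk = v_j - v_k.
true_values = [0.1 * (i - 7.5) for i in range(16)]
y = [[true_values[j] - true_values[k] for k in range(16)] for j in range(16)]
estimates = affective_values(y, inc, r=6, lam=2)
```

With error-free deviates the estimates reproduce the values used to generate them, which is a convenient check that the incidence numbers and the constants r, n, and λ are mutually consistent.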

Associated with these estimates is an analysis of 
variance which tests the goodness of fit of the 
paired-comparison model. This analysis is shown in 
Table 3 along with the analysis of variance for the 
factorial design of the stimuli. 

The main effect differences, that is, the difference 
between the two states of each factor, were pre- 
sented in terms of “main effect contrasts.” The 
contrasts for the location, type of work, and size- 
of-company factors were expressed as proportions of 
the salary “contrast.” Since it was known that the 
difference between the two states of the salary factor 
was $800, it was possible to obtain a monetary
measure of the difference between the two states 
of the other factors by taking the appropriate pro- 
portion of $800. This permitted direct comparisons 
of the relative values of the preferred states of the 
various factors. 


RESULTS 


The estimated ‘affective values” of the 
stimuli were first obtained. These values are 
presented in Table 2. 

An analysis of variance was performed to 
provide both tests of significance and esti- 
mates of the relative size of the “contrasts” 
between the states of each significant factor 
effect. A summary of this analysis appears 
in Table 3. All four factors (starting salary, 
job location, type of work, and company size) 
proved to have a significant effect on the job 
choices of Ss. There were no significant 
interactions.

The magnitudes of the various significant 
factor effects were assessed by estimating 
the “main effect contrasts” between the states 
of these factors. These contrasts appear in 
Table 4. The contrasts of location, type-of- 
work, and size-of-company factors were each 
expressed as a proportion of the salary con- 
trast. The differences between the two states 
of these factors were then expressed in 
monetary terms by taking the appropriate 
proportions of the $800 salary differential. 

The contrast between the two states of the 
salary factor, .486925, was the largest of the 
main effect contrasts. Of course, the higher 
salary state was preferred. The contrast be- 
tween the Winston-Salem and South Bend 
locations was next largest. The preferred 
state of this factor, Winston-Salem, was 
worth about $639 per year. The contrast be- 
tween the two states of the type-of-work 
factor was third largest. Management trainee, 
the preferred type of work, was worth about 
$468 per year. The contrast between the two 
states of the size-of-company factor was the 
smallest of the main effect contrasts. The Ss 
preferred the small company with growth 
potential. Working in a small company was 
worth about $165 per year. 
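
The conversion from contrasts to dollar values can be sketched as follows, assuming only what the text states: each contrast is divided by the salary contrast, and the resulting proportion is multiplied by the $800 salary differential. The contrast values are those reported in Table 4; the dictionary keys and the function name are illustrative labels, not the authors' notation.

```python
# Express each main-effect contrast as a proportion of the salary contrast,
# then convert it to dollars using the known $800 salary differential.
SALARY_DIFFERENTIAL = 800.0

contrasts = {
    "salary": 0.486925,
    "location": 0.389225,
    "type_of_work": 0.284875,
    "company_size": 0.100499,
}


def monetary_value(contrast, salary_contrast, differential=SALARY_DIFFERENTIAL):
    """Dollar value of a factor's preferred state, per year."""
    proportion = contrast / salary_contrast
    return proportion * differential


values = {
    name: monetary_value(c, contrasts["salary"]) for name, c in contrasts.items()
}

for name, dollars in values.items():
    print(f"{name}: about ${dollars:.0f} per year")
```

Rounded to the nearest dollar, this reproduces the figures quoted in the text (about $639, $468, and $165); the cents shown in Table 4 differ slightly from a direct recomputation.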


DISCUSSION 


The salary differential of $800 had more 
influence on the job choices of business 
administration seniors than any of the other 
three factors as they were defined in this 
study. It would seem that a company seeking 
high caliber graduates would greatly enhance 
its chances of success by offering high 
starting salaries, and that high salaries would do 
much to offset a company’s less favorable 
position with regard to location, type of 
work, or company size. 


TABLE 3 
ANALYSIS OF VARIANCE FOR FACTORIAL AND PAIRED-COMPARISON MODEL 

Source              df   Conventional SS   Adjusted SS   χ²ᵇ       Fᶜ 
                         (adjusted)ᵃ 

A                    1   .605984 × 8         4.8479       547.81   143.855* 
B                    1   .948381 × 8         7.5870       857.33   225.134* 
C                    1   .324612 × 8         2.5969       293.45    77.059* 
D                    1   .040320 × 8          .3226        36.45     9.573* 
AB                   1   .000012 × 4          .0000          .00 
AC                   1   .001004 × 6          .0060          .68 
AD                   1   .003404 × 4          .0136         1.54 
BC                   1   .000566 × 8          .0045          .50 
BD                   1   .000632 × 8          .0051          .58 
CD                   1   .002139 × 4          .0086          .97 

Residual             5                        .0337         3.8081 

Paired-comparison   15                      15.4259      1743.13 
  model 
Error               33                        .7938        89.70* 
Total               48                      16.2197 

Note. N = 113. 
ᵃ These adjustments are necessary when the affective values are estimated 
using the balanced incomplete paired-comparison design of this study 
(see Bock & Jones, 1966, Ch. 7). 
ᵇ N × Adjusted SS. 
ᶜ The residual mean-square is the error term of these F statistics. 
* p < .05. 

The Ss showed a significant preference for 
a job location near their homes and the 
University. Their preference for this location 
was stronger than their preferences for both 
the management-trainee type of work and 
the small company with growth potential. 
The students’ preference for Winston-Salem 
may be indicative of a preference for “the 
Southern way of life.” This factor might not 
have proven to be as important if the factor 
states had been a city in North Carolina and 
another southern city the same distance from 
the Piedmont, North Carolina area as South 
Bend. 

The students’ use of starting salary and 
job location as the two most important 
criteria in assessing the “job offers” contradicts 
the usual advice of experienced businessmen 
and college-placement personnel concerning 
the choice of a job. Such advice usually 
stresses the importance of finding work for 
which one is suited and the importance of 
opportunity and challenge in a job. At this 
point in their careers, students seem to be 
able to comprehend and appreciate the $800 
salary differential and the difference between 
living near home and living geographically 
distant from home more fully than the 
difference between the two types of work or the 
difference between working for a large or 
small company. This may be due in part to 
their limited work experience. 


TABLE 4 
CONTRASTS OF LOCATION, TYPE OF WORK, AND SIZE OF COMPANY FACTORS, EXPRESSED 
AS PROPORTION, AND IN MONETARY VALUE 

                                                        Proportion of           Monetary 
Factor   Contrast   State                               monetary differential   value 
A        .389225    Winston-Salem vs. South Bend         .799353                $639.20 
B        .486925    $800 salary differential            1.000000                $800.00 
C        .284875    Management trainee vs. Sales trainee .585049                $468.00 
D        .100499    Small company vs. Large company      .206191                $164.80 

Although a sales job has the possible ad- 
vantages of greater independence and more 
rapid career progression, Ss in this study saw 
the management-trainee position to be de- 
cidedly more desirable than the sales job. 
This finding and the reports of company re- 
cruiters (Ricklefs, 1964) seem to indicate 
that the sales field has a tarnished image 
in the minds of many of today’s college 
seniors. 

The lesser importance of type of work in 
this research may have been due in part to 
the way in which this factor was defined. 
The two states of this factor were both in 
the general field of business administration. 
If one of the states had not been in the 
business field, this factor undoubtedly would 
have been of greater importance. 

According to Habbe’s (1962) survey, per- 
sonnel directors feel that small companies can 
recruit successfully if they stress the special 
advantages of their firms. The results of the 
present study strongly support this conten- 
tion. In fact, the smaller companies seem to 
have a slight advantage in recruiting UNC 
business graduates. 

The results of this study could enable 
companies planning to recruit UNC business 
graduates to assess the strength of their re- 
cruiting position. Companies which are in an 
unfavorable position in regard to one or more 
of the job factors could attempt to offset this 
disadvantage by increasing their salary offers 
according to the estimated value of the favor- 
able states of the factors involved. 

The present research concentrated on the 
job preferences of a small segment of the 
student population. The criteria used by dif- 
ferent types of students in assessing a job 
may differ substantially. The same research 
technique used here could be used in further 
research to determine the extent and direction 
of these differences. 

It might be interesting to study the extent 
of individual differences in job preferences. 
If these differences proved to be significant, 
any generalizations as to the job preferences 
of large categories of students might be mis- 
leading. Perhaps it would be more practical 
for recruiters to obtain measures of the job 
preferences of individual students. Knowl- 
edge of these individual preferences would 
enable companies to narrow their recruiting 
efforts to those students who hold the appro- 
priate job values. This would save the time 
and energy of both students and recruiters. 
A CLOSER LOOK AT LEVEL OF ASPIRATION AS A 
TRAINING PROCEDURE: 


A REANALYSIS OF FRYER’S DATA¹ 


EDWIN A. LOCKE 


American Institutes for Research, Washington Office 


On the basis of a study of Morse Code learning, Fryer (1964) claimed sup- 
port for his hypothesis that having Ss set levels of aspiration would lead to 
a higher performance level than giving knowledge of score alone. The present 
writer reanalyzed Fryer’s data to test the hypothesis that the superiority of 
the level-of-aspiration procedure would depend upon the level at which the 
goals were set. In 3 out of 4 comparisons it was found that Ss who set high 
goals performed better than Ss who set low goals and better than Ss given 
knowledge of score alone. There were no significant differences between Ss 
who set low goals and Ss given knowledge of score alone. A qualification of 
Fryer’s hypothesis, taking account of these facts, was therefore proposed. 


In a study of Morse Code learning, Fryer 
(1964) found that requiring subjects (Ss) to 
set levels of aspiration before each trial en- 
hanced their performance as compared with 
Ss given knowledge of their scores but not 
required to set aspiration levels. This effect 
was found only for Ss working on high 
difficulty code letters. Fryer thus claimed 
support for his hypothesis that: “The mean 
performance scores of the group exposed to the 
procedure of level of aspiration will be 
significantly greater than the group having 
knowledge of results alone [p. 16].” However, 
the present writer felt that this interpretation 
lacked one important qualification: that this 
effect would depend upon the level at which 
the goals were set. For instance, if the level- 
of-aspiration (LA) Ss set very easy goals, 
the superiority of their performance to that 
of the knowledge-of-results-alone group might 
be negligible. On the other hand, if they 
set their goals at a very high level, their 
superiority in performance might be consider- 
able, both for hard and easy tasks. Fryer’s 
implicit premise was apparently that the 
goals would be set at a reasonably high level 
in the LA group, but the present writer felt 
that such a premise was not necessarily 
warranted. In order to illustrate the importance 
of the level at which the goals are set in 
determining performance level, the writer 
reanalyzed Fryer’s raw data (presented in 
Appendix C of his book) in such a way that 
the effect of the level of the goals could be 
assessed. 


1 This research was supported by Contract Nonr 
4792(00) between the Office of Naval Research and 
the American Institutes for Research. The opinions 
expressed are not necessarily those of the Department 
of the Navy. 


METHOD 
Original Design 


Fryer’s (1964) original design included three 
major variables dichotomized as follows: high- 
versus low-code letter difficulty (i.e., hard versus 
easy task); “hope” versus “expect” instructions in 
setting LA; and “public” versus “private” expression 
of LA. There were 10 Ss per cell. In addition, 10 Ss 
were given the hard task and knowledge of score, 
and 10 Ss were given the easy task and knowledge 
of score, but neither of the latter groups set LAs. 


Reanalysis 


In the reanalysis by the present writer, mean 
improvement scores were used as the dependent vari- 
able. They were calculated by subtracting each S’s 
mean score on the first two trials (which were 
practice trials) from his mean score on the last 13 
trials. (No LAs were set by the goal-setting groups 
until Trial 3.) 

There were no noteworthy effects of the “public” 
versus “private” condition so the public and private 
groups were combined for all analyses. This yielded 
six groups (the letters correspond to Fryer’s, 1964, 
labels in Appendix C): Group A-B (N = 20): expect 
instructions, easy task; Group C-D (N=20): ex- 
pect instructions, hard task; Group E-F (N = 20): 
hope instructions, easy task; Group G-H (N = 20): 
hope instructions, hard task; Group I (N=10): 
knowledge of score only, easy task; Group J 
(N = 10): knowledge of score only, hard task. 
TABLE 1 
MEAN IMPROVEMENT SCORES BY CONDITION AND LEVEL OF ASPIRATION 

                                      Mean improvement score 
Group   N    Condition             All        High LA    Low LA     t (high 
                                   subjects   subjects   subjects   versus low) 
A-B     20   Expect, easy task     4.26       6.31       1.94       5.13* 
E-F     20   Hope, easy task       [illegible] 5.62      3.16       [illegible]72 
I       10   KR only, easy task    3.72       -          -          - 
C-D     20   Expect, hard task     4.16       4.68       1.42       4.30* 
G-H     20   Hope, hard task       5.27       7.16       3.38       4.80* 
J       10   KR only, hard task    1.74       -          -          - 

*p < .001. 


Each of the first four groups above was dichoto- 
mized into a “high level of aspiration” (High 
LA) group and a “low level of aspiration” (Low 
LA) group as follows: for each S his mean practice 
score (on the first two trials) was subtracted from 
his mean LA on the last 13 trials. In each group the 
10 Ss having the highest positive goal discrepancies 
(i.e., the largest positive difference between their 
mean aspiration level and their mean practice score) 
were classified as High LA Ss and the 10 Ss having 
the lowest positive goal discrepancies were classified 
as Low LA Ss. 
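
The scoring and classification rules described above can be sketched as follows. The sketch assumes 15 trials per subject (2 practice trials followed by 13 goal-setting trials) and one LA per goal-setting trial; the data layout, the function names, and the example values are hypothetical and are not Fryer's data.

```python
from statistics import mean

PRACTICE_TRIALS = 2  # Trials 1 and 2 were practice; LAs began on Trial 3.


def improvement(scores):
    """Mean score on the last 13 trials minus mean score on the practice trials."""
    return mean(scores[PRACTICE_TRIALS:]) - mean(scores[:PRACTICE_TRIALS])


def goal_discrepancy(scores, aspirations):
    """Mean LA on the last 13 trials minus mean practice score."""
    return mean(aspirations) - mean(scores[:PRACTICE_TRIALS])


def split_high_low(subjects):
    """Rank subjects by goal discrepancy; top half is High LA, bottom half Low LA."""
    ranked = sorted(
        subjects,
        key=lambda s: goal_discrepancy(s["scores"], s["aspirations"]),
        reverse=True,
    )
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


def make_subject(name, practice, later, la):
    """Build a made-up subject with constant practice, later, and LA values."""
    return {
        "name": name,
        "scores": [practice] * PRACTICE_TRIALS + [later] * 13,
        "aspirations": [la] * 13,
    }


subjects = [
    make_subject("s1", 10, 14, 18),
    make_subject("s2", 10, 11, 11),
    make_subject("s3", 8, 15, 20),
    make_subject("s4", 9, 10, 10),
]

high_la, low_la = split_high_low(subjects)
print([s["name"] for s in high_la], [s["name"] for s in low_la])
```

In the reanalysis each group of 20 Ss was split 10 and 10; the four-subject example only illustrates the ranking step.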


RESULTS 


The fourth column of Table 1 shows the 
mean improvement scores of each of the 
groups described above before the breakdown 
by aspiration level. It can be seen that the 
mean improvement score for all hard task, 
LA Ss was significantly greater than that of 
the corresponding knowledge-of-results group 
(Groups C-D and G-H versus Group J). 
However, the corresponding difference 
(Groups A-B and E-F versus Group I) for 
the easy task Ss was not significant. This 
replicates Fryer’s (1964) original finding (in 
which total scores controlled for initial level 
were used) that the LA procedure worked for 
the hard task but not for the easy task. 

The fifth and sixth columns of Table 1 
show the improvement means for each High 
LA and Low LA subgroup. In all cases the 
High LA means are substantially higher than 
the Low LA means, and in three out of 
four cases these mean differences are signifi- 
cant at the .001 level (see last column of 
Table 1). These mean differences were not 
an artifact of differences in initial (practice) 


scores, since these had very little relationship 
to mean improvement scores. In addition, 
even these small corrections were made on 
the means before computing the t values. 

Alternatively it might be argued that the 
mean differences between High and Low LA 
groups were spurious, in that the aspiration 
level differences could be a result rather than 
a cause of performance level differences. In 
other words, one might argue that Ss who im- 
proved the most set higher aspiration levels 
as a result of their high performance rather 
than performing at a high level as a result 
of setting high aspiration levels. However, 
this possibility could also be tested. For each 
pair of trials, beginning with Trials 2 and 3, 
the difference between each S’s LA on the 
succeeding trial and his score on the pre- 
ceding trial was computed, as well as the 
difference between his score on the succeeding 
and preceding trials. If changes in LA pre- 
ceded changes in performance, when the LA 
exceeded the score on the previous trial, per- 
formance should have increased over the pre- 
vious trial. When the LA dropped below the 
score on the previous trial, performance 
should have decreased from the previous 
trial. Thus for each pair of trials a mean 
improvement score was calculated for Ss 
whose LA exceeded their previous score and 
for Ss whose LA was less than their previous 
score. The mean improvement score for the 
former group should have been greater than 
the mean improvement score of the latter 
group, if LA was functioning as a determinant 
of performance. 
TABLE 2 
t RATIOS FOR IMPROVEMENT MEANS OF KNOWLEDGE-OF-RESULTS GROUPS VERSUS 
CORRESPONDING LEVEL-OF-ASPIRATION SUBGROUPS 

Knowledge-of-score-           Level-of-aspiration group 
only group 
                       A-B (High LA)   A-B (Low LA)   E-F (High LA)   E-F (Low LA) 
I (easy task)     t    2.61*           -1.62ᵃ         1.60            -0.52 

                       C-D (High LA)   C-D (Low LA)   G-H (High LA)   G-H (Low LA) 
J (hard task)     t    [illegible]     -0.30          6.67**          1.58 

ᵃ Minus sign indicates LA group mean was lower than that of corresponding 
knowledge-of-score group. See Table 1 for actual means. 
* [significance levels illegible in source] 

The mean improvement scores were in the 
predicted direction for 11 of the 13 pairs of 
trials in Group A-B, for 13 of 13 pairs in 
Group C-D, and for 10 of 13 pairs in Group 
G-H (no comparisons were made for Group 
E-F, since the effect of LA was not signifi- 
cant). This would seem to support the as- 
sumption that LA was a cause rather than 
a result of level of performance. 
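
The trial-by-trial check described above can be sketched for a single pair of trials as follows. The sketch assumes three aligned lists for one group of Ss: each S's score on the preceding trial, his LA for the succeeding trial, and his score on the succeeding trial. The function name and the example numbers are hypothetical; Ss whose LA equals the previous score are left out, a choice the source does not specify.

```python
from statistics import mean


def lead_lag_means(prev_scores, next_las, next_scores):
    """Mean score change for Ss whose LA exceeded vs. fell below the previous score."""
    raised, lowered = [], []
    for prev, la, nxt in zip(prev_scores, next_las, next_scores):
        change = nxt - prev  # improvement from preceding to succeeding trial
        if la > prev:
            raised.append(change)
        elif la < prev:
            lowered.append(change)
        # la == prev: excluded (assumption, not stated in the source)
    raised_mean = mean(raised) if raised else None
    lowered_mean = mean(lowered) if lowered else None
    return raised_mean, lowered_mean


# Made-up scores for four Ss on one pair of trials.
prev_scores = [10, 12, 8, 9]
next_las = [12, 10, 11, 7]
next_scores = [13, 11, 10, 9]

raised_mean, lowered_mean = lead_lag_means(prev_scores, next_las, next_scores)
in_predicted_direction = raised_mean > lowered_mean
print(raised_mean, lowered_mean, in_predicted_direction)
```

Repeating this for each of the 13 pairs of trials and counting the pairs in the predicted direction gives tallies of the kind reported for Groups A-B, C-D, and G-H.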

Table 2 shows the results of t tests on the 
improvement means of each knowledge-of-score 
group versus the mean of each High and 
Low LA subgroup that worked on a task of 
the same difficulty. The mean for Group I 
(easy task) is significantly lower than that 
of the High LA Ss from Group A-B, but 
the corresponding difference for the High LA 
Ss from Group E-F does not reach signifi- 
cance. In neither case is there a significant 
difference between Group I and the two Low 
LA subgroups. The mean for Group J (hard 
task) was significantly lower than the means 
of the High LA Ss from both Groups C-D 
and G-H. In neither case was the correspond- 
ing difference significant for the Low LA 
subgroups. 

To summarize, the knowledge-of-score 
group means did not differ significantly from 
the means of any of the Low LA subgroups 
but were significantly lower than the means 
of three of the four High LA subgroups. 


DISCUSSION 


The hypothesis that the effects of goal 
setting as compared to knowledge of score 
alone would be dependent upon the level at 
which the goals were set was strongly sup- 
ported by the reanalysis of Fryer’s data. The 
higher the level of the goals in relation to 
the individual’s initial ability, the greater the 
improvement. Thus, a blanket statement to 
the effect that goal setting is more effective 
than knowledge of score alone would not seem 
to be justified unless the level at which the 
goals are set is specified. In addition, it would 
be useful to determine what the goals of the 
knowledge-of-score Ss are in cases where the 
two types of groups are compared. Knowl- 
edge-of-score groups could not, of course, 
be asked to set levels of aspiration, but they 
might be given postexperimental goal ques- 
tionnaires (as was done by Locke & Bryan, 
1966, with highly successful results), in order 
to determine what their “implicit” goals were. 

These results also suggest a possible reason 
why Fryer did not obtain significant perform- 
ance effects for the easy code Ss. Looking 
at Table 1 it can be seen that the improve- 
ment mean of the knowledge-of-score group 
given the easy code was greater than that of 
the corresponding group given the hard code 
(3.72 versus 1.74). However, the mean im- 
provement scores of the LA Ss were about the 
same for the easy and hard codes (4.26 and 
4.16, respectively); thus, it is possible that 
the differential effects of goal setting were 
a result of differences in the motivation of 
the control (knowledge-of-score) groups 
rather than in the differential effectiveness of 
the goal-setting procedure on hard and easy 
tasks. This interpretation seems plausible 
also in view of the fact that goal setting was 
found to have a positive effect among one 
of the two easy code groups, suggesting that 
there is no intrinsic reason why goal setting 
cannot be effective on both hard and easy 
tasks. 

In closing it should be noted that the ef- 
fects of LA on level of performance as demon- 
strated here are quite congruent with recent 
findings in the area. Four studies by Locke 
(1966) and Locke and Bryan (1966) have 
found strong relationships between the level 
of the goal and performance level with high 
goals leading to a higher level of performance 
than low goals. The present analysis was a 
further demonstration of the importance of 
determining the level at which goals are set 
when studying the effects of goal setting. 
RELATIONSHIP OF VARIOUS COLLEGE GRADUATE 
CHARACTERISTICS TO RECRUITING DECISIONS 


STEPHEN J. CARROLL, JR. 


Department of Business Administration, University of Maryland 


19 personal and biographical characteristics of business school graduates of 
the University of Minnesota in 1961 were related to several criteria repre- 
senting success in the campus-recruiting process. Of the characteristics studied, 
only appearance rank (handsomeness), marital status, and office experience 
were found to be significantly related (p< .05) to any of the 5 criteria rep- 
resenting student job-seeking success. The findings of the study are contrary to 
the findings of several surveys of campus recruiters and firms with respect to 
the relative weight assigned to various student characteristics in selection 
decisions. 


A number of studies have attempted to 
identify the characteristics of college gradu- 
ates that are emphasized by campus recruiters 
and companies in selecting graduates for jobs 
at the entry level of management. Informa- 
tion resulting from such studies should be of 
value in advising students, in suggesting the 
assumptions that campus recruiters and other 
company personnel have about the predictors 
of managerial effectiveness, and in evaluating 
college recruiting itself. 

Most of these studies consisted of mail 
surveys and were focused on the company. 
Dickinson (1955), in a study of this type, 
received a return of 66% from 639 employ- 
ers. He found that intelligence was ranked 
highest and physical traits (looks and bear- 
ing) lowest in a ranking of the importance 
of seven factors to success in several fields 
of management. These results were quite simi- 
lar to those found by Habbe (1948, 1956) 
in surveys of 126 and 240 companies, respec- 
tively. Habbe also found in his two studies 
that interview impression and grades were 
ranked highest and employment experience 
and test scores lowest in rankings of various 
ways of evaluating college graduate job ap- 
plicants. Johnson (1956), who received a 
64% return from 341 companies, reported 
that 92% of his respondents indicated that 
high grades and participation in 
extracurricular activities were important 
considerations in selecting college graduates. A survey 
of 21 companies by Sullivan (1961) also 
indicated that academic grades and 
extracurricular activities were important considerations in 
the selection of college graduates. 

A few studies were directed at the campus 
recruiter instead of the company. Johnson 
(1956) compared interviewer ratings of 
college senior job applicants on several traits 
(verbal fluency, tact, initiative, etc.) to over- 
all rankings of candidate suitability. No defi- 
nite relationships emerged from the study but 
there were indications that his interviewers 
did not rate the applicants carefully. Odiorne 
and Hann (1961) studied interviewers at the 
University of Michigan and found that they 
emphasized grades, age, and maturity in their 
evaluations of seniors. Appearance, extra- 
curricular activity participation, and work 
experience were less important. 

The present study described in this report 
was conducted at the University of Minnesota 
in 1961 and was directed at the student seek- 
ing the job rather than the campus inter- 
viewer or company as in previous studies. 
In this study, various personal and biographi- 
cal characteristics of graduating seniors were 
correlated with several criteria representing 
their success in obtaining visit and job offers 
from companies. Since these correlations 
reflect actual student experience, it was felt 
that they would be a more accurate indicator 
of the weights given to various student 
characteristics than surveys of companies and 
campus interviewers. 


METHOD 


Subjects. The group studied consisted of all 211 
male seniors in the School of Business Administration 
of the University of Minnesota who graduated in 
December of 1960 and in March, June, July, and 
August of 1961. Criteria data for this group were 
obtained by means of postcard questionnaires sent 
to their homes after graduation. A previous pilot 
study had indicated that these graduates could 
remember all their interviews, visit offers, and job 
offers. Two follow-up questionnaires were also used. 
Of the 158 questionnaires returned, 24 reported no 
interviews taken and 3 were incorrectly filled out 
for a 75% overall response and a 70% usable 
response. While approximately 30% of the graduates 
had one or no visit offers, 25% had six or more. 
About 25% of the graduates received no job offers, 
and 25% had three or more. 


TABLE 1 
RELATIONSHIP OF VARIOUS PERSONAL AND BIOGRAPHICAL CHARACTERISTICS TO CRITERIA 
REPRESENTING THE JOB-SEEKING SUCCESS OF MINNESOTA BUSINESS SCHOOL GRADUATES 

Criteria (column order as printed): No. visit offers; No. job offers; 
Visit-Interview ratio; Job-Visit ratio; Combined criterion 

Characteristic 
Appearance rank (handsome is +)   +.26   +.30   +.27 
Marital status (married is +)     +.09 
Office experience (yes is +)      +.20   +.18 

A comparison of respondents and nonrespondents 
indicated that these two groups did not differ signifi- 
cantly (chi-square, p > .05) on any characteristic 
found to be related to a criterion of job-seeking 
success. It was also found that 11 nonrespondents 
contacted by telephone did not differ significantly 
(chi-square, p > .05) from the respondents on any 
of the five criteria used in the study. 

Preselection was not a problem in this study 
except for major field. An investigation indicated 
that only 4 out of almost 200 companies preselected 
interviewees on the basis of grades. This involved a 
total of about two dozen interviews. 


Characteristics Studied. The characteristics of 
college seniors investigated in this study were: 
1. Age. 
2. Appearance rank (handsomeness). 
3. Grades. 
4. Height. 
5. Major field. 
6. Marital status. 
7. Membership in a college fraternity. 
8. Military status (1A, 4F, etc.). 
9. Nonretail-sales experience (direct selling). 
10. Number of leadership positions in college. 
11. Number of memberships in college organizations. 
12. Office-work experience. 
13. Production-line experience. 
14. Retail-sales experience. 
15. Supervisory experience. 
16. Type of work preference. 
17. Wearing of glasses. 
18. Weight. 
19. Weight-height ratio (fat-thin). 


Data pertaining to each of these characteristics 
were taken from placement data sheets on file in 
the School of Business Administration. These place- 
ment data sheets are filled in by each person gradu- 
ating in a given year. 

Appearance in this study was defined as relative 
handsomeness as evaluated by three judges from 
photographs of the graduates on their placement 
data sheets. The judges ranked the photographs from 
most to least handsome using the Bittner and Rund- 
quist (1950) ranking method. The average ranking 
of these judges determined the final appearance 
score. Interrater reliability was .73 as tested by the 
Horst (1949) method. Test-retest reliability coef- 
ficients for the appearance rankings were calculated 
for a sample of 20 graduates judged on two oc- 
casions 6 weeks apart. These reliability coefficients 
were +.97, +.92, and +.87 (rank order). 

Criteria of Job-Seeking Success. The criteria used 
in this study were: number of job offers received, 
number of visit offers received, number of job offers 
received as a proportion of number of visit offers 
accepted, number of visit offers received as a propor- 
tion of number of interviews taken, and these four 
criteria combined into one measure. 

A distinction was made between visit offers re- 
ceived and job offers received under the assumption 
that campus recruiters mainly influence visit offers 
and other company managers influence job offers. 
Each student has to impress both the recruiter who 
comes to the university and other managers back 
at the company. 

Five criteria of job-seeking success were used in 
the study because of certain difficulties involved in 
measuring the relative success of students in the 
campus recruiting process. Some students take many 
interviews and accept many visits in order to maxi- 
mize the total number of visit and job offers they 
receive. For this reason, visit offers as a proportion 
of the number of interviews taken and job offers 
as a proportion of the number of visit offers ac- 
cepted were used as criteria. On the other hand, 
students who only interview and accept visits from 
companies who are very likely to make them a job 
offer or visit offer would score very high on these 
latter two criteria. Thus no single criterion was en- 
tirely satisfactory. A combined criterion was con- 
structed by comparing a student’s success on each 
criterion with the median success of all students 
on that criterion and scoring above the median 
as 1 and below the median as zero. Scores on the 
combined criterion measure could range from 0 to 4. 
Intercorrelations among the various criteria used 
ranged from .34 to .68 (Pearsonian). 
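
The construction of the combined criterion can be sketched as follows, assuming the four component criteria are available for each student. The field names and the example values are hypothetical, and the treatment of a score exactly at the median (scored 0 here) is an assumption, since the text specifies only "above" and "below."

```python
from statistics import median

# Hypothetical field names for the four component criteria.
CRITERIA = (
    "job_offers",
    "visit_offers",
    "job_visit_ratio",
    "visit_interview_ratio",
)


def combined_criterion(students):
    """Score each student 0 to 4: one point per criterion above the group median."""
    medians = {c: median([s[c] for s in students]) for c in CRITERIA}
    return [
        sum(1 for c in CRITERIA if s[c] > medians[c])  # ties count as 0 (assumption)
        for s in students
    ]


# Made-up values for three students.
students = [
    {"job_offers": 0, "visit_offers": 1, "job_visit_ratio": 0.0, "visit_interview_ratio": 0.1},
    {"job_offers": 2, "visit_offers": 4, "job_visit_ratio": 0.5, "visit_interview_ratio": 0.4},
    {"job_offers": 4, "visit_offers": 8, "job_visit_ratio": 0.6, "visit_interview_ratio": 0.9},
]

scores = combined_criterion(students)
print(scores)
```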


RESULTS 


Of the 19 characteristics of seniors studied, 
only 3 were significantly related (chi-square, 
p < .05) to any of the five criteria used. Ap- 
pearance rank was related to three criteria, 
marital status to one, and office experience to 
two. A correlation ratio (eta) was calculated 
for each significant relationship and these are 
presented in Table 1. The direction of 
each of these relationships (determined by 
inspection) is also indicated. 

An investigation was made of the inter- 
relationships among these three character- 
istics since the relationship of one to a 
criterion could be a function of its relation- 
ship to another characteristic. None of the 
interrelationships was significant (chi-square, 
p > .05). 


DISCUSSION 


Only 3 of the 19 characteristics studied 
were significantly related to any criterion of 
job-seeking success and these characteristics 
did not have a strong relationship to any 
criterion. Thus the variables studied ac- 
counted for only a small amount of the vari- 
ance in the campus job-seeking success of 
Minnesota business school graduates in 1961 
in spite of the fact that a fairly large number 
of obvious and often mentioned character- 
istics were studied. 

Previous surveys of companies and recruit- 
ers indicated that intelligence and grades 
were very important determinants of job- 
seeking success. These findings were not sup- 
ported in the present study (which focused 
on correlates of decisions actually made 
instead of surveys) since grades were not 
significantly related to any criterion of 
job-seeking success.¹ In addition, contrary to at 
least one previous study, extracurricular ac- 
tivity participation, leadership activities, and 
major field were not related to job-seeking 
success. In several previous studies appear- 
ance was found to be relatively unimportant 
in college recruiting decisions. In this study, 
appearance turned out to be the most 
important characteristic of those studied. 





1 It is quite possible that recruiters and 
companies, when mentioning the importance of high grades, 
are referring to grades above 3.00 and do not take 
into consideration the differences between grade- 
point averages below this figure. There is some 
indication of this in the data collected in this study. 
Since only a small proportion of business school 
graduates receive a 3.00 grade-point average or 
higher and since many of these go on to graduate 
school, few students with very high grades become 
involved in the college recruiting process at the 
undergraduate level. 
This study investigated the job attitudes of lower-middle managers in relation 
to their scores on a self-perception personality instrument. 456 managers 
from 3 companies filled out both a job-attitude questionnaire and a forced- 
choice self-description questionnaire. The attitudes of the 89 respondents 
(“Highs”) who described themselves most like top managers were compared 
with the 89 respondents (“Lows”) who described themselves most like 
lower-level managers. Results showed that the Highs were significantly more satisfied 
and also that they placed significantly more emphasis on the necessity for 
inner-directed behavior in their jobs. Results were compared with previous 
job-attitude studies of managers. 


In a recent publication, Vroom (1965) 
made the following statements: 


The most serious omission in studies of managerial 
satisfaction is represented in the neglect of person- 
ality variables in existing research .... It would 
seem that, if we are to make real progress in under- 
standing differences in managerial satisfaction, we 
must reject the assumption that these differences 
are attributable solely to job content or work en- 
vironment and start looking at, and incorporating 
into our research, individual differences among man- 
agers in motives and abilities . . . . Explanations of 
managerial satisfaction must take into account both 
the characteristics of the role occupied by the 
manager and the personality which he brings to it. 
It seems likely that we will find the effects of 
dimensions of the managerial role on satisfaction 
will vary markedly with differences in the personal- 
ity of its occupant. Similarly, the relationship be- 
tween personality variables and job satisfaction will 
vary with the characteristics of the job being 
performed [p. 57]. 


The purpose of the present study was to 
investigate one of Vroom’s assumptions, 
namely that the managerial role will have 
different effects on job attitudes depending 
on the personality traits of the managers. 
If the assumption is true, it follows that 
managers at one specific level of management 
who differ from each other in personality 
traits should be differentially satisfied by 
their jobs.

1 This paper is based on a doctoral dissertation
submitted to the Graduate Division of the University
of California, Berkeley. The author wishes to
express his appreciation to Lyman W. Porter, chairman
of the dissertation committee, for his invaluable
assistance at every stage of the study.


In a series of studies, Porter (1961, 1962, 
1963, 1964; Porter & Henry, 1964) has 
demonstrated the existence of a relationship 
between the level of management and certain 
attitudes toward the managerial job. The 
first in the series (Porter, 1961) was a pilot 
study done on middle- and lower-level mana- 
gerial positions. The results indicated that 
level of management was related to the 
amount of perceived satisfaction of psycho- 
logical needs in the managerial position. 
Middle managers reported higher satisfaction 
of the needs for esteem, security, and 
autonomy than did lower managers. 

A second study (Porter, 1962, 1964) was 
done on a national sample of nearly 2,000 
managers representing all managerial levels. 
Differences among the various levels were 
found, both in the amount of the perceived 
fulfillment of psychological needs, and in the 
perceived opportunity to satisfy these needs. 
Esteem, autonomy, and self-actualization
were considered to be more fulfilled and more 
satisfied by higher than by lower managers, 
and this trend was consistent from presidents 
down to first-line supervisors. 

On the same sample, the perceived impor- 
tance of needs was reported (Porter, 1963). 
The higher the management level, the more 
important the needs were considered to be 
for the job, and in particular the needs for 
autonomy and self-actualization. Finally, the 
perceived importance of certain personality 
traits was investigated (Porter & Henry, 
1964). The higher the level, the more that 
inner-directed traits and the less that 
“organization man” traits were regarded as
important for success in the job. 

The results of the Porter studies have 
shown that when dealing with job attitudes 
of managers, one must be aware of differ- 
ences within the managerial class. Level of 
management appears to be one variable which 
differentiates individuals in their perception 
of the environment. 

The present study goes one step further. 
The problem to be investigated is the follow- 
ing: Are environmental factors the only ex- 
planation for differences in managerial atti- 
tudes, or do other factors exist to account 
for these differences? Can the perceived dif- 
ferences in need fulfillment and satisfaction, 
the importance of needs and the importance 
of personality traits be accounted for merely 
by differences in positions in the managerial 
hierarchy? If the answer were affirmative, 
then it could be assumed that the differences 
in job attitudes are a function of organiza- 
tional variables alone. If, on the other hand, 
systematic differences in attitude can be 
found within a certain managerial level, and 
these differences can be related to some other 
factor, for example, self-perceived personality 
traits, then additional explanations will have 
to be suggested. 

The basic technique of the study was as 
follows: in a sample of lower-middle man- 
agers, individuals were divided into three 
groups by their self-perceived personality 
traits: a “High” group, consisting of indi- 
viduals whose self-perceived personality traits 
are similar to those of high-level managers; 
a “Low” group, whose self-perceived person- 
ality traits are comparable to self-perceived 
traits of low-level managers; and a middle 
group which is between the two extremes as 
far as self-perceived traits are concerned. 

It was hypothesized that the job attitudes 
of those managers who personality-wise re- 
semble high-level managers will differ from 
the job attitudes of managers who are similar 
to low-level managers in their self-perceived 
traits. The differences between the job atti- 
tudes of the two groups will be of the same 
kind as the differences between the job atti- 
tudes of high-level and low-level managers. 
Specifically, the following four hypotheses 
were tested: 




Hypothesis I. Managers who score high on 
a scale for the measurement of personality 
traits (Highs) will tend to perceive a higher 
degree of fulfillment of psychological needs 
than do low-scoring managers in the same 
level (Lows). The difference will be es- 
pecially pronounced in the needs for esteem, 
autonomy, and self-actualization. 

Hypothesis II. The Highs will tend to 
perceive more job satisfaction than the 
Lows, particularly in the needs for esteem, 
autonomy, and self-actualization. 

Hypothesis III. The Highs will tend to
perceive psychological needs more important 
than the Lows, especially so for the needs of 
autonomy and self-actualization. 

Hypothesis IV. The Highs will tend to 
place more emphasis on inner-directed traits 
as important in their jobs, than will the 
Lows. 


METHOD 
Measurement Devices 


Personality traits were measured by the Self- 
Description Inventory (SDI) developed by Ghiselli 
(1954, 1955, 1956a, 1956b, 1959, 1963a, 1963b). The 
SDI consists of 64 pairs of personally descriptive 
adjectives presented in forced-choice form. Half 
of the pairs describe socially desirable traits, and
the subject (S) is requested to choose the word 
which he thinks most describes him. The other 
half are socially undesirable traits, and S in each 
pair checks the word which he thinks least describes
him. Of the eight scales developed to date, four 
have consistently shown a relationship with various 
occupational levels: Intelligence, Supervisory Ability, 
Perceived Occupational Level, and Decision Making 
Approach. 

Job attitudes were measured by Parts I and 
II(a) of the Porter Management Position Question- 
naire (MPQ). The MPQ has been described in 
detail in the Porter studies. Briefly, Part I contains 
16 items, 13 of which were classifiable into a 
Maslow-type need hierarchy system (Porter, 1962). 
The 13 items are categorized into five needs, namely, 
Security, Social, Esteem, Autonomy, and Self-Actualization
needs. In addition, three items deal
with specific questions concerning pay, feeling of 
being informed, and feeling of pressure in the job 
(this last will not be dealt with here). On each of 
the 16 items three questions were asked: 


a. How much of the characteristic is there now 
connected with your management position? 
b. How much of the characteristic do you think 
should be connected with your management 
position? 

c. How important is this position characteristic 
to you? [p. 376] 
The answer to each question can be rated by S on
a scale from 1 (low) to 7 (high). Question a (“How 
much is there now?”) measures perceived fulfill- 
ment, Question b (“How much should there be?”) 
—perceived expectation, and Question c (“How im- 
portant is this to me?”)—perceived importance for 
each of the specific characteristics. A satisfaction 
score can be computed for each S by subtracting 
the score on the perceived fulfillment from the 
score on the amount of expectation for the same 
item. Thus, a 0 score would indicate the highest
satisfaction (fulfillment equals expectation), and the 
lowest satisfaction would be expressed by a score 
of 6 (lowest fulfillment with highest expectation). 
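
The scoring rule just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's scoring procedure: the function name `dissatisfaction_score` and the range check are introduced here for the example, and the ratings in the usage line are hypothetical. Only the subtraction itself (expectation minus fulfillment on the 1 to 7 scale) comes from the text.

```python
def dissatisfaction_score(fulfillment: int, expectation: int) -> int:
    """Return expectation minus fulfillment for one MPQ item.

    Both ratings use the 1 (low) to 7 (high) scale described in the text.
    A result of 0 means fulfillment equals expectation (highest satisfaction);
    6 is the largest possible deficit (fulfillment 1, expectation 7).
    """
    for rating in (fulfillment, expectation):
        if not 1 <= rating <= 7:
            raise ValueError("ratings must lie between 1 and 7")
    return expectation - fulfillment


# Hypothetical ratings for a single item (not data from the study).
print(dissatisfaction_score(fulfillment=4, expectation=6))  # 2
```

The sketch leaves negative values (fulfillment above expectation) as they are, since the text does not say how such cases were handled.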

In Part II (Porter & Henry, 1964), S is re- 
quested to rank a list of 12 traits, according to the 
importance of each trait for success in his present 
management position. Two of the adjectives, Ef- 
ficient and Intelligent, are irrelevant camouflage 
items, and the remaining 10, although presented in 
a random order, can be categorized under two 
headings: 5 indicate “inner-directed” traits (Forceful,
Imaginative, Independent, Self-Confident, Decisive), 
and the remaining 5 are “other-directed” (Coopera- 
tive, Adaptable, Cautious, Agreeable, Tactful). 

In addition to these various questions, each S was 
asked a series of personal data questions (e.g., job 
title, department, age, education, etc.) at the end 
of the questionnaire. 


Procedure and Sample 


The data on which the study is based were 
obtained from a sample of lower-middle manage- 
ment personnel employed by three large, nationally 
known, manufacturing companies. Each manager 
was asked to fill out both the SDI and the MPQ. 
An interval of at least 2 weeks separated the com- 
pletion of the two forms. All forms were sent to 
the respondents through United States Mail, and 
returned in an attached self-addressed and prestamped
envelope. Half of the managers received
the MPQ first, the other half received the SDI 
first. Out of 561 respondents 501, or 89.3%, returned 
at least one of the forms, and 456, or 81.3%, filled
out both of them. 
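
The return rates quoted above can be checked with simple arithmetic. The short sketch below does so; the helper name `percent` is an assumption made for this example, while the three counts are taken from the text.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


mailed = 561                 # managers who were sent the forms
returned_at_least_one = 501  # returned the SDI, the MPQ, or both
returned_both = 456          # returned both forms

print(percent(returned_at_least_one, mailed))  # 89.3
print(percent(returned_both, mailed))          # 81.3
```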

In constituting the two groups of Highs and 
Lows, two requirements were considered as neces- 
sary: (a) the personality traits which differentiated 
between the two groups should be related to mana- 
gerial success, as measured in other samples; (b) the 
two groups should be similar in demographic aspects 
such as age, education, and income and also in 
certain organizational variables, that is, line and 
staff positions and employing companies. 


RESULTS 


The variable on which the two groups were
differentiated was the Decision Making Approach
(DMA) scale from the SDI. Ghiselli² has found
that the mean score for top management personnel
was significantly different from the mean score of
middle managers. Also, satisfactory validity
coefficients in several samples have been found
between the DMA scores and ratings by superiors:
.36 for 88 insurance supervisors and agents, .65
for 10 middle managers, and .58 for another
sample of 12 middle managers.


2 Personal communication.

Out of the 456 managers who completed 
both the MPQ and the SDI, 3 considered 
themselves to be members of top manage- 
ment, and were deleted from the sample. Of 
the remaining 453 Ss, 338 were found to 
represent the lower-middle layer of management
of the three companies.³

They were all at a level above first-line 
supervisors and below the midpoint of the 
managerial hierarchy. Of the 338 lower- 
middle managers 89 or about 26% had a 
score of 25 or above on the DMA scale, and 
the same number of managers scored 18 or 
below on the same scale. These cutoff points 
are equivalent to the seventy-first and 
twenty-sixth percentiles, respectively, accord- 
ing to Ghiselli’s management norms. No sig- 
nificant differences between the two groups 
were found for the pertinent demographic 
variables of age, education, and income. 
Also, the two environmental variables of em- 
ploying company and type of position (line 
and staff) were not significantly different for 
the two groups. 
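
The grouping rule in this paragraph can be written out as a small sketch. The cutoffs (25 or above, 18 or below) and the counts (89 of 338) come from the text; the function name `classify_dma`, the label "Middle" for the remaining managers, and the example scores are assumptions made for illustration.

```python
HIGH_CUTOFF = 25  # DMA score of 25 or above places a manager among the Highs
LOW_CUTOFF = 18   # DMA score of 18 or below places a manager among the Lows


def classify_dma(score: int) -> str:
    """Assign a manager to a group from his DMA scale score."""
    if score >= HIGH_CUTOFF:
        return "High"
    if score <= LOW_CUTOFF:
        return "Low"
    return "Middle"


# Hypothetical DMA scores (not data from the study).
for score in (27, 25, 21, 18, 12):
    print(score, classify_dma(score))

# Share of the 338 lower-middle managers falling in each extreme group.
print(round(100 * 89 / 338, 1))  # 26.3, i.e., "about 26%"
```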

Table 1 presents the means, standard 
deviations, and levels of significance for the 
perceived fulfillment, satisfaction, and im- 
portance of needs and the perceived impor- 
tance of inner-directed traits for the two 
groups. 

Overall, one sees from Table 1 that the 
Highs on the DMA scale feel that they re- 
ceive more fulfillment and satisfaction from 
their jobs and they attach more importance 
to psychological needs. In addition they at- 
tach more importance to inner-directed per- 
sonality traits as being necessary for success 
in their job positions. 








3 An Index of Management Level was used to
define a manager’s position in the hierarchy. By 
this index, employees can be grouped on the basis 
of their relative distance up the management 
hierarchy, regardless of size of company (Porter, 
1962). 
In a more detailed fashion, lower-middle
managers who are high on the DMA scale 
are characterized by the following job atti- 
tudes relative to those who are low on this 
scale: (a) They feel that they receive signifi- 
cantly more fulfillment of security, esteem, 
autonomy, and self-actualization needs. (b) 
They are significantly more satisfied than the 
Lows in the needs for security, autonomy, 
and self-actualization; they are also some- 
what (but not significantly) more satisfied 
in the esteem need. (c) They perceive esteem,
autonomy, and self-actualization needs as
being significantly more important to themselves
than the Lows do. (d) They perceive
their own positions to require significantly 
greater inner-directed behavior when com- 
pared to the Lows. 

Of the 16 comparisons for which hypothe- 
ses were made, 5 for each of the areas of 
fulfillment, satisfaction, and importance of 
needs, and 1 for the inner-other directed 
dimension, 12 were predicted by the 
hypotheses.

The two groups consist of managers in the 
lower-middle echelon of management in three 
nationally known companies. By the Index 
of Management Level, they represent a con- 
fined layer of management, starting one 
position above first-line supervisors, yet they 
describe themselves and their jobs quite 
differently from each other. The Highs are 
managers who perceive themselves, as measured
by the DMA scale, as being relatively more
capable, determined, industrious, resourceful,
and enterprising than do the Lows. They consider
themselves to be more sharp-witted
and sincere but also more sociable, pleasant,
dignified, and sympathetic. They perceive
their own job as demanding more inner-directed
traits than their Low counterparts
do. Specifically, they feel that forcefulness,
imagination, independence, self-confidence,
and decisiveness are especially pertinent, and
in this aspect they differ (relatively speaking)
from what the Lows feel toward their
jobs. The Highs feel that their jobs give
them more of a sense of fulfillment of needs,
and they also feel more satisfied in their
jobs, especially in their needs for security,
autonomy, and self-actualization. As to the
importance they attach to psychological
TABLE 1
MEANS AND STANDARD DEVIATIONS FOR “HIGHS” AND “LOWS”

                        “Highs” (N = 89)   “Lows” (N = 89)   Level of significanceᵈ
Item                       M       SD         M       SD        (one-tailed)

Fulfillment
  Security                4.61    1.37       3.90    1.44        .001
  Social                  5.05     .89       4.81    1.08        ns
  Esteem                  4.84     .90       4.37    1.00        .001
  Autonomy                4.98     .84       4.46     .96        .001
  Self-Actualization      4.97     .95       4.35    1.17        .001
  Composite Scoreᵃ        4.93     .71       4.42     .83        .001
  Pay                     4.54    1.11       4.25    1.27        ns
  Information             4.67    1.26       4.07    1.51        .01
Satisfactionᵇ
  Security                [values illegible in source]
  Social                  [values illegible in source]           ns
  Esteem                  [values illegible in source]           ns
  Autonomy                [values illegible in source]
  Self-Actualization      1.04     .86       1.34     .97        .05
  Composite Score          .74     .61        .97     .68        .01
  Pay                     [values illegible in source]           ns
  Information             1.45    1.19       1.97    1.55        .05
Importance
  Security                5.28    1.54       5.13    1.39        ns
  Social                  5.19    1.00       4.91    1.12        ns
  Esteem                  [values illegible in source]
  Autonomy                [values illegible in source]
  Self-Actualization      6.21     .64       5.89     .73        .01
  Composite Score         5.64     .69       5.35     .68        .01
  Pay                     5.71     .98       5.28    1.09        .01
  Information             6.19     .85       6.06     .84        ns
Inner-Directedᶜ           5.32     .89       4.83    1.04        .01

a Composite Score: the average score for each individual of
the 13 Maslow-type items. Computed separately for Fulfillment,
Importance, and Satisfaction.

b Satisfaction: in all items, the higher the score the higher
the dissatisfaction.

c Inner-Directed: the mean score for the ranking of five
adjectives which compose the inner-directed traits; the higher
the score the higher the ranking on importance.

d The tests for significance were performed on the full-score
distributions of the two groups by the Mann-Whitney U test.


needs, they perceive most of them to be more 
important than the Lows do, and especially 
so for the higher-order needs. 

Shifting attention to the Lows, they per- 
ceive themselves as being relatively more 
practical, deliberate, courageous, discreet, 
planful, and intelligent, compared to the 
Highs. They see themselves as relatively 
calm, steady, modest, civilized, and patient 
individuals. Their jobs, as perceived by them- 
selves, require more other-directed traits than 
the Highs think are required by their jobs. 
Cooperation, adaptability, caution, agreeability,
and tact are considered by the Lows to
be more important for their jobs than they 
are by the Highs for their jobs. Their per- 
ceived fulfillment, and satisfaction of psycho- 
logical needs, and the importance they attach 
to these needs, are generally lower than those 
of the Highs. 


DISCUSSION 


The basic hypothesis of the present study 
was that variations in job attitudes at a 
specific managerial level will be related to 
differences in personality traits. Furthermore, 
it was hypothesized that the observed dif- 
ferences in attitudes between lower-middle 
managers who differ in their self-perceived 
personality traits would be similar in quality 
to differences in job attitudes among dif- 
ferent levels of managers. That such system- 
atic differences of attitudes within a single 
level of management can be found has been 
demonstrated by the results of this study. 
Thus, it can be concluded with some confi- 
dence that some of the variation in mana- 
gerial job attitudes can be accounted for by 
personality-trait variables. 

Decision making is one of the requirements 
in any managerial job. This requirement 
tends to vary with the level of management, 
the most difficult and far reaching decisions 
generally being made at the top echelon. But 
even at a comparatively low level, that of 
lower-middle management, some decisions 
very often have to be made. For some man- 
agers the discrepancy between their self- 
perception and role-perception will be small 
or minimal, whereas others may experience 
a significant discrepancy between these two 
percepts. 

Let us look at our Low group first. Objec- 
tively they fulfill the requirements for their 
positions as do most of the other managers 
of the same level: They are intelligent, edu- 
cated people and are at an age level similar 
to other managers in comparable posi- 
tions. In Adams’ (1963) terminology, their 
“inputs,” at least in terms of certain demographic
variables, are similar to those of the
other managers at their own level. However, 
they may sense a feeling of incongruity be- 
tween their self-perceived personality traits 
and the objective requirements of the managerial
role: they are required to behave “as
if” they were determined, resourceful, ambitious,
and enterprising. This incongruity
leads them to believe—possibly rightly so— 
that the environment does not reward them 
as much as it should. For example, they feel 
less sense of security than others do; they 
feel that not enough information is given 
to them. Their expectations are high in rela- 
tion to what the environment gives them, and 
consequently they sense a feeling of relatively 
high dissatisfaction. It is, of course, open 
to speculation whether the environment in 
reality does reward them less than it does 
their fellow-managers—it is conceivable that 
this is the real state of affairs. Even if this 
were not the case, for our purpose it is im- 
portant to understand the perception of the 
situation on behalf of those people who are 
likely to sense an incongruity between 
their own traits and those demanded by the 
environment. 

To summarize, the Lows, as well as other 
individuals working in groups, supposedly 
tend to evaluate their own ability in the 
light of the abilities of other co-workers 
(Festinger, 1954). They see themselves as 
fulfilling the “input” requirements of their 
positions, in terms of general ability. They 
may act in accordance with their under- 
standing of themselves, but it can be assumed 
that their jobs confront them with objective 
requirements that are incongruent with their 
self-perceptions. The environment (subordinates,
peers, and superiors) reacts to the role
enactment of the Lows in a predictable way: 
it apparently does not reward them in the 
same way as it does the other workers at the 
same level, and thus they feel a sense of low 
fulfillment of their needs. Their expectations 
as junior members of the managerial staff are 
still high, and thus a feeling of dissatisfaction 
is apparent. 

Turning now to the Highs, it is easy to see 
why these people are more satisfied in most 
needs—their abilities are compatible with the 
traits required by their jobs, both as these 
traits are perceived by them and as they are 
objectively necessary in the management 
positions. Although they sense a high level 
of fulfillment of their needs, their expecta- 
tions are relatively high. They are not 
“completely” satisfied, and their expectations 
are always a step ahead of what they get. 


PERSONALITY AND JOB ATTITUDES 


It is not known how these managers actually 
behave in their job situations, but there is 
evidence from a laboratory study that people 
tend to exhibit a significantly higher amount 
of verbal interaction when they score higher 
on the DMA scale than do the lower scorers 
on this scale. Also, the self-description scores 
on the DMA scale correlated with the peer 
rankings made by fellow group members 
(Porter & Kaufman, 1959). 

Managers who score high on the DMA 
scale feel that the environment bestows upon 
them some rewards which they expect to 
receive. Although they feel that they would 
like to get more information than they 
actually get, they are more satisfied in this 
respect than their Low counterparts. 

It is interesting to observe that both groups 
of managers are not concerned with social 
needs as measured by the MPQ: These needs 
do not seem important to them and they are 
fulfilled and also satisfied more than any 
other need. Moreover, in no case is there 
any difference in this respect between the 
Lows and the Highs. This finding is con- 
sistent with Porter’s (1961, 1962, 1963, 
1964) results. He did not find any significant 
difference among managerial levels as to their 
attitudes toward the social needs—neither in 
fulfillment nor in satisfaction nor in the im- 
portance of these needs. It can be safely 
concluded, then, that the social need (as 
measured by the MPQ) is of little relevance 
within the framework of the study of manage- 
rial job attitudes, at least in the United States. 

The major results of the present study 
can now be analyzed in the light of the series 
of studies by Porter, mentioned above. In the 
summary to this series (Porter, 1964), it was 
explicitly stressed that the level of manage- 
ment should be considered as the main inde- 
pendent variable affecting managerial job 
attitudes: 


The findings of the present study confirm the ex- 
pectation that the managerial-level variable would 
have a definite relationship to satisfaction attitudes. 
First of all, level has a very strong relationship 
to the amount of need fulfillment a manager gets 
in his job; secondly, level also relates consistently 
to the manager’s satisfaction with the amount of 
need fulfillment he sees himself obtaining. In other 
words, the level of the manager’s job is a key element 
in affecting his attitudes toward his work environ- 
ment [p. 18]. 
Fig. 1. Dissatisfaction of needs, comparing “Highs”
and “Lows” to Vice Presidents and Lower Managers. 
(Data for latter two groups are from Porter, 1964.) 


The data on the importance of needs were 
considered primarily to “be thought of as 
reflections of personality characteristics rather 
than as perceptions of the job environment 
[p. 22],” and were thus assumed to be 
more consistent across the levels of manage- 
ment than were data for fulfillment and 
satisfaction. 

In the present study managers at a rather 
narrow level of the management hierarchy, 
namely the lower-middle level, have been 
separated into two distinct groups: Those 
who score high on a self-descriptive scale 
and those who score low on this scale. The 
results show that the Highs of the present 
sample are less satisfied than Porter’s Vice
Presidents, whereas the Lows are more satis- 
fied than the lower, first-line managers. (See 
Figure 1—A similar picture could also be 
drawn for the results for fulfillment.) 

It can thus be concluded with considerable 
confidence that neither of the two variables— 
job situation nor perceived personality traits 
—can explain by itself the variations in the 
perception of fulfillment and satisfaction of 
psychological needs. If only job level were 
responsible for differences in job attitudes, 
then no sizable systematic differences in atti- 
tudes should have been found when job level 
Fig. 2. Importance of needs, comparing “Highs”
and “Lows” to Vice Presidents and Lower Managers. 
(Data for latter two groups are from Porter, 1964.) 


was held more or less constant (or at least 
these differences should not have come out 
so consistently in the expected direction). If 
only perceived personality traits were related 
to attitudes without any influence of job 
level, then the magnitude of differences 
within one level should have been similar to 
the magnitude of differences in attitudes held 
by managers in different levels. 

This conclusion, however, has to be modi- 
fied with regard to the perception of the 
importance of needs. The predicted differ- 
ences have been found in the present study, 
but they are very similar to those found (by 
Porter) between high-level and low-level 
managers (Figure 2). Thus it must be said 
that the perception of the importance of 
needs is less influenced by environmental 
factors than the perception of the fulfillment 
and the satisfaction of the same needs. This 
conclusion is consistent with Porter’s hy- 
pothesis quoted above about the influence 
of personality relative to the influence of 
environment on the perception of the 
importance of psychological needs. 

In summary, viewing the present study in 
perspective, it appears that more than one 
factor must be held responsible for shaping 
job attitudes in management. At this stage
it is known that at least two factors are 
strongly related to attitudes—the environ- 
ment as indicated by the level of manage- 
ment, and personality as measured by self- 
perception of psychological traits. 
WORK-GROUP VERSUS INDIVIDUAL DIFFERENCES
IN ATTITUDE 


THOMAS H. JERDEE 


School of Business Administration, University of North Carolina 


The objective was to determine the relative magnitude of group and individual 
differences in job attitudes. Responses to a 20-item Likert-type attitude scale 
were obtained from 190 employees, sampled from 38 work groups in 3 manu- 
facturing plants. The hypothesis that the work groups did not differ in job 
attitudes was tested by an analysis of variance. The observed work-group 
differences in attitudes were not significant, and the lowest and highest work- 
group means in each of the 3 plants were not significantly far apart. In these 
3 plants, at least, the more appropriate unit for administrative action or 
for research study on employee attitudes seems to be the individual, not
the work group.


Although approximately 2,000 writings on 
job attitudes are covered in the reviews by 
Brayfield and Crockett (1955), Child (1941), 
Herzberg, Mausner, Peterson, and Capwell 
(1957), Scott, Dawis, England, and Lofquist 
(1958), and Yuzuk (1961), few if any 
studies provide information on the relative 
magnitude of work-group differences in atti- 
tudes, as opposed to individual differences. 
Both types of differences may be worthy of 
research interest and administrative concern, 
but the relative size of work-group differences 
in attitudes, as opposed to individual dif- 
ferences, may have implications for both the 
researcher and the administrator. If differ- 
ences among work groups tend to be large 
relative to differences among individual group 
members, the research study or administra- 
tive manipulation of factors associated with 
group attitudes might be more promising. On 
the other hand, if variability among groups 
is small relative to individual variability, then 
greater emphasis might be placed on factors 
associated with individual attitudes. 

Brayfield and Crockett (1955) and Child 
(1941) do consider research on group atti- 
tudes and individual attitudes separately, the 
former pointing out that “a relationship 
which exists at the individual level between 
satisfaction and productivity may be ob- 
scured when the individuals are lumped to- 
gether [p. 415].” But they provide no in- 
formation on the relative magnitude of group 
and individual variability. In a study of air- 
craft workers, Bernberg (1952) concludes 
that morale is related to performance when 
groups are compared, but not when individuals
are compared. However, Bernberg’s
“groups” actually were five large departments
containing a total of about 1,000
workers, and he compared all five depart- 
ments even though they were located in three 
different plants. Furthermore, he treated his 
four measures of morale and one measure 
of performance as if they were different 
levels of a single independent variable in a 
two-way analysis of variance, and the re- 
sulting variance estimates and F ratios are 
meaningless. 

Other researchers apparently have gone 
ahead with investigations of the correlates 
of employee attitudes without pausing to 
examine whether variability in attitudes is 
primarily a group or an individual phenome- 
non. This question of the relative magnitude 
of group and individual differences in job 
attitudes was studied in the present research. 


METHOD


Data on job attitudes were obtained for 190 
employees sampled randomly, 5 per group, from 38 
work groups in 3 manufacturing plants. The em- 
ployees, most of whom were skilled or semiskilled 
machine operators, averaged 48 years of age and 17 
years of service with their employer. In some work 
groups the individual members were highly inter- 
dependent, and conditions usually said to give rise 
to group cohesiveness were present. In other work
groups, the individual members worked at tasks
requiring little or no interaction among group 
members. 


The measure of attitudes was a 5-step Likert-type
scale containing 20 items from the Triple
Audit Employee Attitude Scale developed by the
Industrial Relations Center of the University of
Minnesota. Total score on these 20 items correlates
approximately .96 with total score on the complete
54-item scale, which has an odd-even reliability of
.93 (Yoder, Heneman, & Fox, 1954). The attitude
scale was administered to the employees in small
groups (minimum size of 5), under conditions of
assured individual anonymity. The maximum possible
individual score on the attitude scale was 80.
Observed scores ranged from 24 through 70, with
a mean of 48 and a within-groups standard deviation
(pooled from the 38 groups) of 9.64.


TABLE 1

WORK-GROUP MEAN ATTITUDE SCORES

Plant A    Plant B    Plant C
   54         57         57
   53         55         54
   50         53         52
   50         52         50
   50         52         50
   50         51         49
   46         50         49
   44         49         48
   43         49         47
   42         49         47
   41         47         44
   40         46         40
              45         38

Note. Group n’s = 5; pooled within-group SD = 9.64.

The hypothesis that the work groups did not 
differ in attitude was tested by an analysis of vari- 
ance. Rejection of this hypothesis would indicate 
that work-group differences in attitude were larger 
than would be expected on the basis of individual 
variability in attitude. 


RESULTS 


The work-group means are shown in 
Table 1, arranged from highest through 
lowest mean in each plant. From a practical 
viewpoint, the differences of 14, 12, and 19 
points between high and low means in the 
three plants might appear large. Manage- 
ments in these three plants might be es- 
pecially concerned about improving the atti- 
tudes of the extremely low groups, and since 
the group averages are so low, their first 
inclination might be to look for the cause 
in group-related factors, such as type of 
work, work environment, supervisory behav- 
ior, and social relations within the group. 

However, a group-oriented attack on the 
problem of the low-attitude groups could end 
in frustration. As the analysis of variance 
in Table 2 shows, the observed work-group
differences in attitude are not significant,
that is, they are no larger than would be 
expected on the basis of individual variability 
in attitudes. With 35 and 152 degrees of 
freedom, this analysis would be sensitive to 
even a slight work-group effect. Furthermore, 
the lowest and highest means in each plant 
are not significantly far apart at the 5% 
level (Dixon & Massey, 1951, pp. 146-147). 
Thus the supervisors of the low-attitude 
groups might be innocent victims of the luck 
of the draw, saddled with several individual 
attitude problems rather than one group at- 
titude problem. In these three plants, the 
more appropriate unit for administrative 
action or for research study on employee 
attitudes seems to be the individual, not the 
work group. 
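
As a rough check on the analysis of variance, the F ratios can be recomputed from the mean squares reported in Table 2. The sketch below also derives a between-group variance component and an intraclass correlation; these two quantities are not reported in the study and are added here only as an illustration, assuming a balanced one-way random-effects layout with 5 employees per group. The function names are hypothetical.

```python
# Mean squares as reported in Table 2; group size from the Method section.
MS_PLANTS = 219.9
MS_GROUPS = 104.1   # work groups within plants
MS_ERROR = 92.9     # individuals within work groups
GROUP_SIZE = 5


def f_ratio(ms_effect, ms_error):
    """F ratio of an effect mean square to the error mean square."""
    return ms_effect / ms_error


def variance_components(ms_groups, ms_error, n_per_group):
    """Estimate the between-group variance and intraclass correlation.

    Assumes a balanced one-way random-effects model (an assumption of
    this sketch, not a computation reported in the study).
    """
    group_var = max((ms_groups - ms_error) / n_per_group, 0.0)
    icc = group_var / (group_var + ms_error)
    return group_var, icc


f_plants = f_ratio(MS_PLANTS, MS_ERROR)
f_groups = f_ratio(MS_GROUPS, MS_ERROR)
group_var, icc = variance_components(MS_GROUPS, MS_ERROR, GROUP_SIZE)

print(f"F (plants) = {f_plants:.2f}")
print(f"F (work groups) = {f_groups:.2f}")
print(f"between-group variance = {group_var:.2f}, ICC = {icc:.3f}")
```

Under these assumptions, work-group membership would account for only about 2 percent of the variance in individual attitude scores, which is in line with the conclusion that the individual is the more appropriate unit of study.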


DISCUSSION


These conclusions obviously cannot be 
generalized to all manufacturing plants or to 
all types of work groups. In other circum- 
stances the differences between groups might 
be much larger. Furthermore, the composi- 
tion of the attitude scale would affect results; 
a carefully selected list of items might reveal 
larger group differences in attitudes. How- 
ever, most of the items used in this study 
did pertain to presumably group-related 
factors such as supervision and the work 
itself, rather than to strictly individual or 
company-wide factors. 

The question raised here seems to be of 
fundamental importance. A basic task of the 
behavioral scientist in industry is to deter- 
mine the causes of variability in employee 
behavior. His quest for causal relations is 
more likely to be fruitful if he has first 
determined whether the behavior he is studying
is a group phenomenon or an individual
phenomenon. In research on job attitudes, the
more promising unit of study appears to be
the individual employee rather than the work
group as a whole.


TABLE 2

ANALYSIS OF VARIANCE OF ATTITUDE SCORES

Source of variation     df      MS       F      F.95
Between plants           2    219.9    2.37    3.06
Within plants
  Work groups           35    104.1    1.12    1.50
  Error                152     92.9


REFERENCES 


Bernberg, R. E. Socio-psychological factors in industrial
morale: I. The prediction of specific
indicators. Journal of Social Psychology, 1952,
36, 73-82.

Brayfield, A. H., & Crockett, W. H. Employee
attitudes and employee performance. Psychological
Bulletin, 1955, 52, 396-424.

Child, I. L. Morale: A bibliographical review.
Psychological Bulletin, 1941, 38, 393-420.

Dixon, W. J., & Massey, F. J., Jr. Introduction to
statistical analysis. New York: McGraw-Hill, 1951.

Herzberg, F., Mausner, B., Peterson, R. O., &
Capwell, D. F. Job attitudes: Review of research
and opinion. Pittsburgh: Psychological
Services of Pittsburgh, 1957.

Scott, T. B., Dawis, R. V., England, G. W., &
Lofquist, L. H. A definition of work adjustment.
Bulletin No. 30. Minneapolis: University of Minnesota,
Industrial Relations Center, 1958.

Yoder, D., Heneman, H. G., Jr., & Fox, H. Auditing
your manpower management. Bulletin No. 13.
Minneapolis: University of Minnesota, Industrial
Relations Center, 1954.

Yuzuk, R. P. The assessment of employee morale.
Bureau of Business Research Monograph No. 99.
Columbus: Ohio State University Press, 1961.


(Received September 20, 1965) 


Journal of Applied Psychology 
1966, Vol. 50, No. 5, 434-436 


FAILURE TO IMPROVE READABILITY WITH 
A VERTICAL TYPOGRAPHY 


E. B. COLEMAN
AND


S. C. HAHN 


New Mexico State University 


3 experiments found conventional horizontal typography to be superior to 
vertical. 1 experiment presented the stimulus tachistoscopically in a procedure 
quite similar to the procedure used in an earlier experiment that found 
vertical typography to be superior to conventional even with unpracticed Ss. 
2 of the experiments used Ss who had been given practice reading 8,000 words
printed in vertical typography.


Coleman and Kim (1961) reported a number 
of studies that investigated four experimental 
typographies. Their most promising style was 
a vertical arrangement. This arrangement 
may enable the reader to better utilize the 
vertical part of his eye span. Although they 
found, as had Tinker (1955a), that it was 
read slower than conventional typography 
when long passages were being read, even 
unpracticed subjects (Ss) were able to read 
short sentences presented tachistoscopically 
in vertical typography more accurately than 
the same sentences presented in the conven- 
tional horizontal style. The arguments in 
favor of a vertical typography appear quite 
plausible. Because they appear plausible and 
because Coleman and Kim obtained positive 
results even with unpracticed Ss, it seems 
worthwhile to report very briefly the fol- 
lowing three studies that failed to find any 
advantage for the vertical typography. 


In 
the 
vertical 
typography 
the 
fixations 
overlap 
and 
maximally 
exploit 
peripheral 
vision. 


EXPERIMENT I 


Coleman and Kim (1961) blamed reading 
habits for the relative inefficiency of vertical 
typography when their Ss were reading long
passages. Since Ss were college students, they
had been reading material in a horizontal 
typography for many years. According to the 
explanation of Coleman and Kim, S had 
learned to suppress the cues from the vertical 
span of his vision. Perhaps beginning readers 
would be less subject to such habits of sup- 
pression. In Experiment I, children just 
beginning to read were used as Ss. 


Method 


Experimental Design. Vertical and conventional 
typography were compared in a Lindquist Type V 
design (Lindquist, 1953, p. 288). This design uses 
several Greco-Latin squares in a way that allows Ss, 
passages, and order of presentation to be counter- 
balanced. The essential point of the design is that 
each child read both typographies, and each passage 
was printed in both typographies. Each child read 
five passages in a vertical typography and he read 
another five in the conventional horizontal typogra- 
phy. Half the children read vertical typography 
first and half read conventional first. 

Subjects. The Ss were 16 second and third 
graders. 

Reading Materials. The materials were 10 short 
passages from Book A of the Standard Test Lessons 
in Reading (McCall & Crabbs, 1950). The passages 
were short narratives that ranged in length from 
130 to 140 words. Immediately following the nar- 
rative section were 8-10 multiple-choice questions. 
If he liked, the child could refer back to the 
narrative as he answered the questions. Each pas- 
sage with its questions was typed on a single sheet 
of 8½ X 13-inch paper. Each of the 10 passages was
typed in both typographies. Thus, one child read 
half of his passages in one typography and another 
child read those same 5 passages in the other typog- 
raphy. Each child read all 10 passages, 5 in the 
vertical typography and 5 in the horizontal. 

Presentation. The child first read instructions and 
completed a practice passage, both typed in the 
vertical typography. Then he was given 3 minutes, 
timed by a stop watch, to read each passage and 
answer its questions.


Measure. The measure was mean number of 
questions answered correctly per passage. 


Results 


Conventional horizontal typography was 
read more efficiently by beginning readers. 
A mean of 6.8125 questions per passage was 
answered for conventional typography and 
5.9625 for vertical. This difference is signifi- 
cant beyond the .02 level by a two-tailed 
Wilcoxon matched-pairs T test (T = 21.5,
N = 16).


EXPERIMENT II


Tinker’s Speed of Reading Test (1955b) 
has been widely used in studies of legibility, 
and it appears to be quite sensitive. Experi- 
ment II used Form I of this test in a Greco- 
Latin square design identical to that of 
Experiment I. Before the experiment, Ss 
were given approximately 2 hours practice in 
reading vertical typography. In that 2 hours, 
S read 10 short selections (a total of 8,000 
words) typed in the vertical typography. 
After reading each selection he filled in a 
multiple-choice test on it. 


Method 


Subjects. The Ss were 52 students from New Mex- 
ico State University who had been given 2 hours 
practice reading the vertical typography. 

Reading Materials. The reading materials were 
Form I of Tinker’s Speed of Reading Test (1955b). 
It consists of 450 short paragraphs. In the latter 
part of each paragraph one word spoils the meaning. 
The S must find this word and cross it out. An 
example is: “Ned straightened the whole house 
Saturday morning because he wished to surprise 
his mother. When Mrs. Winslow returned from her 
shopping trip she found the car all clean and 
shining.” The form was divided in half, and each 
half was typed in both typographies. Each S read 
one half in the vertical typography and one half in 
the horizontal. Half the Ss read vertical first and 
half read horizontal first. 

Presentation. The Ss were given the usual instruc- 
tions and practice sessions for the Speed of Reading 
Test (Tinker, 1955b). Then they were given 17 
minutes to complete each half of their test, the half 
in the vertical typography and the half in horizontal. 

Measure. The measure was the total number of 
wrong words crossed out by S. 


Results 


Conventional typography was read signifi- 
cantly faster than vertical. For conventional, 
the mean number of wrong words crossed out
was 144.73; for the vertical it was 129.65.
This difference is significant beyond the .001 
level by a binomial test; 40 Ss favored 
conventional typography and 11 favored 
vertical. 
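
The binomial (sign) test reported above can be illustrated with a short sketch. It assumes a two-tailed test against a chance split of .5 and assumes that the one S not counted in either direction was a tie and was dropped; neither detail is stated in the report. The function name is hypothetical.

```python
from math import comb


def sign_test_two_tailed(n_favoring_a, n_favoring_b):
    """Two-tailed binomial sign test against p = .5, with ties excluded."""
    n = n_favoring_a + n_favoring_b
    k = max(n_favoring_a, n_favoring_b)
    # Probability of a split at least this lopsided in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double it for the two-tailed test, capping at 1.
    return min(1.0, 2 * upper_tail)


# Experiment II: 40 Ss favored conventional typography, 11 favored vertical.
p_exp2 = sign_test_two_tailed(40, 11)
print(f"Experiment II: p = {p_exp2:.6f}")
```

The same function applies to the 20-to-8 split reported for Experiment III, where the resulting probability falls below .05 but not below .01.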


EXPERIMENT III 


Woodworth (1938) contended that reading 
speed would be only slightly improved by 
any arrangement that improved only visual 
efficiency. He contended that reading speed 
was essentially determined by mental proc- 
esses. He contended that visual efficiency in 
reading is so much greater than mental ef- 
ficiency that the eyes are usually marking 
time waiting for the mind to catch up. Ex- 
periment III used a measure that is relatively 
free of the time required for mental process- 
ing. Sentences were projected for brief 
exposure times. Then S was allowed 1 second 
to mentally process what he had seen before 
the sentence was presented again. The 
measure was the total number of tachisto- 
scopic exposures necessary for S to repeat 
the sentence perfectly. The measure there- 
fore reflected visual processing time more 
than it reflected mental processing time. 
Three different exposure times were used: 
.02 second, .10 second, and .50 second. 

It should also be noted that Experiment 
III used a tachistoscopic presentation quite
similar to the presentation Coleman and 
Kim (1961) used when they found vertical 
typography to be superior to conventional. 


Method 


Experimental Design. Greco-Latin squares were 
used to give a design quite similar to those of 
Experiment I and Experiment II. The essential point 
is that each S read half his sentences in vertical 
typography and half in conventional typography. 
Each sentence was presented in vertical typography 
and also in conventional. Each S read one third of 
his sentences at a .02-second exposure, one third at 
.10-second exposure, and one third at a .50-second 
exposure. Order of presentation was counterbalanced 
for typography and for exposure time. Each sen- 
tence was presented in all three exposure times. 

Subjects. The Ss were 28 students from New 
Mexico State University who had been given 2 hours 
practice in reading the vertical style in the same 
fashion as those in Experiment II.

Reading Materials. The materials were 48 ten-word
sentences. Two slides were prepared for each
sentence, one in vertical typography and one in
conventional. On the vertical slide, the sentence was
broken into two vertical columns, and on the
horizontal it was broken into two lines.

FIG. 1. Mean number of exposures needed to
memorize a sentence plotted as a function of exposure
time. (Style of typography is the parameter.)
[Axes: mean number of exposures against exposure time in seconds.]

Presentation. The 48 slides for an S were presented 
with a Lafayette tachistoscope. Each S saw 24 sen- 
tences in one of the typographies, 8 presented in an 
exposure time of .02 second, 8 at a time of .10 
second, and 8 at a time of .50 second. Then he saw 
24 sentences in the other typography at the same 
exposure times. A sentence was projected at a par- 
ticular exposure time, S was allowed 1 second to 
mentally process what he had seen, and then the 
sentence was presented again at that exposure time. 
It was reexposed until S could repeat it perfectly. 
Then another sentence was exposed. 

Measure. The measure was the total number of 
exposures required to repeat the sentence perfectly. 


Results 


The mean number of exposures S needed 
to repeat the sentence at each time interval 
is shown in Figure 1. The difference between 
horizontal and vertical is significant beyond 
the .05 level by a binomial test; 20 Ss 
favored horizontal and 8 Ss favored vertical.


DISCUSSION


Coleman and Kim’s (1961) finding that 
vertical typography might be superior to con- 
ventional was not replicated in three studies 
that used several samples of Ss, several 
samples of reading material, several methods 
of presentation, and several measures of 
reading efficiency. The conventional hori- 
zontal typography was significantly superior 
to vertical in all three experiments, including 
one that used a tachistoscopic presentation 
similar to the one used by Coleman and Kim 
when they found vertical to be superior. Two 
of the experiments used Ss who had been 
given a slight amount (2 hours) of practice 
in reading a vertical typography. The present 
negative results do not prove that vertical 
typography would be inferior if readers were 
given much practice in that typography; 
however, the present experiments, particu- 
larly the tachistoscopic experiment that used 
a measure that mainly indexed visual proc- 
essing time, do suggest that such practice 
would probably be long and costly. It would 
require the preparation of much material in 
a vertical typography.


IMPORTANCE OF WORK VERSUS NONWORK AMONG


SOCIALLY AND OCCUPATIONALLY 
STRATIFIED GROUPS 


FRANK FRIEDLANDER 


Division of Organizational Sciences, Case Institute of Technology 


The importance of work-related versus nonwork-related factors as opportunities 
for satisfaction was compared among low-, medium-, and high-status groups, 
and between white-collar and blue-collar occupational groups by analysis of 
questionnaire responses from 1,468 Civil Service resident employees of a Govern- 
ment community. The value hierarchy, in terms of increasing importance, was 
recreation, education, church, work-context, and work-content factors. Sig- 
nificant differences were found between the value systems of white-collar and 
blue-collar groups. However, no significant differences were found between low-, 
medium-, and high-status groups unless the occupational group of the employee 
was simultaneously considered. Then, differences between white-collar and blue- 
collar values were marked in the high-status level. Results are discussed in 
terms of the opportunities that various environmental stimuli present to contrasting
occupational and status groups for effective and competent interaction
with their environment.


Numerous studies have dealt with the po- 
tential stimulants in the individual’s work 
environment, and have frequently compared 
types of job characteristics (i.e., intrinsic
versus extrinsic) in terms of their relative im- 
portance to the employee (Dunnette, 1965; 
Friedlander, 1964; Herzberg, Mausner, & 
Snyderman, 1959; Porter, 1963). Few studies, 
however, have considered the relative impor- 
tance of the elements in the work situation as 
compared with those in the nonwork situa- 
tion as sources of satisfaction. If the world of 
work plays a comparatively unimportant part 
in contributing to the worker’s satisfaction 
when compared with the nonwork world, then 
the myriad of studies that have been made 
which compare various job characteristics 
with each other are creating too much fuss 
and bother about a small segment of the po- 
tentially satisfying interactions available to 
the individual in his total environment. Another
limitation to most of the previous research
on this subject, with the exception of
Porter’s study, is the lack of differentiation
among occupationally and/or socially stratified
groups. For example, workers from different
groupings probably would place different
values on various facets of both their
work and nonwork environments. 

The purpose of this study, then, was (a) 
to compare work-related and nonwork-related 
factors as opportunities for satisfying interaction
in the employee’s surroundings, and (b)
to account for any differences in the im- 
portance of environmental stimuli through a 
knowledge of the socially and the occupa- 
tionally stratified groups to which the worker 
belongs. 

The perspectives within which comparative 
value systems are analyzed in this study are 
(a) high-, medium-, and low-status groups, 
and (b) white-collar and blue-collar occupational
groups. The relative values that these
memberships hold toward each of the fol- 
lowing five environmental factors were ex- 
amined: education, church, recreation, work 
content (intrinsic and intrapersonal aspects), 
and work context (extrinsic and interpersonal 
aspects). Education and church were included
as two institutions of implicit value in
our culture; recreation was selected as a
factor of increasing importance in our society 
and as a contrast to work; the two aspects 
of work were chosen to test the breadth of 
applicability across various classes of member- 
ship groups of current work-motivation theory. 


METHOD 
Subjects and Setting 


This study was conducted in an isolated com- 
munity of about 12,000 people, of which the 3,200 
civilian wage earners all work directly for a branch 
of the United States Government. Such a com- 
munity, in which the spheres of work and of social 
community relations are an almost indistinguishable 
blend, offers a unique opportunity to analyze the 
values of those who live and work in the com- 
munity toward a wide range of experiences that may 
be of importance to them. 

The values of each employee were tapped by 
means of an anonymous questionnaire which was 
sent to the homes of all 3,200 primary wage earners 
living in the community. Response to the survey was 
approximately 46%. Control data indicated minor 
distortions in returns in the direction of greater 
participation from those in higher-status positions. 
Since the type of analysis utilized took into account 
status differences, potential bias from this source 
was minimized. 

Each of the 1,468 respondents was categorized into 
one of two occupational groups and one of three 
status groups. Status was defined simply as the rela- 
tive level to which the respondent had advanced 
within his occupational level. In the Civil Service, 
a worker’s natural promotional progression is upward 
in the GS grade level within the white-collar oc- 
cupations, and upward from apprentice, through 
journeyman, to supervisory status in the blue-collar 
occupations. In this study, status for each occupa- 
tional level is trichotomized into high, medium, and 
low. Occupational group refers to the socioeconomic 
groupings in which the “style of life” for such a 
membership is similar, and in which mobility from 
one occupational level to another is difficult and un- 
likely. In this study, the two occupational levels are 
referred to as blue collar (apprentice, journeyman, 
supervisor), and white collar (graded GS personnel). 
For a further description of the categorization 
scheme, the sample, and the setting, the reader is 
referred to an earlier study (Friedlander, 1965). 


Environmental Factors 


Of a larger, two-part questionnaire, only the por- 
tion dealing with the relative importance of a variety 
of environmental factors was used in this study. 
Directions for answering the questions in this sec- 
tion were: 


Now that you have indicated how satisfied or dis- 
satisfied you are with various things, we would 
like some indication of how important each of 
these same things is to your feeling of satisfaction
or dissatisfaction: (1) of extreme importance to
me, (2) very important to me, (3) of moderate 
importance to me, (4) of little importance to me, 
(5) of no importance to me. 


The five environmental factors of concern were 
each composed of from two to eight items. The 
environmental factor of potential importance, its 
internal consistency (K-R 20) reliability coefficient, 
and the item content were as follows: education— 
quality of instruction, facilities, and space, and 
reputation of grade schools, high schools, and col- 
lege in the area (.91); church—adequacy of church 
facilities and availability of church of choice in the 
area (.93); recreation—adequacy of recreational fa- 
cilities, parks and playgrounds, and community meet- 
ing places (.82); work content (intrinsic)—perform- 
ing challenging assignments, using of best abilities, 
feeling of achievement, amount of responsibility, and 
amount of freedom on the job (.83); work context 
(extrinsic)—feeling of job security, working rela- 
tionship with supervisor, technical competence of 
supervisor, smooth and efficient work group (.80). 


RESULTS 


The issues posed earlier called for a 2 (oc- 
cupational levels) X 3 (status levels) X 5 (en- 
vironmental factors) repeated-measurements 
analysis-of-variance design with each subject 
nested within one of the six membership cate- 
gories. Since the size of N in each of these 
six cells differed, an unweighted-means solu- 
tion was derived. The Newman-Keuls method 
was used for all tests of simple effects. The en- 
tire analysis followed closely the model out- 
lined by Winer (1962). 

Table 1 indicates that the differences be- 
tween the value systems of white-collar versus 
blue-collar occupational levels, as well as the 
Occupational Groups X Environmental Fac- 
tors (OG X EF) interaction, are highly sig- 
nificant. In addition, the Occupational x 
Status Groups (OG X SG) interaction reached 
the .05 level of significance. Contrary to ex- 
pectations, however, differences in value sys- 
tems held by low- versus medium- versus 
high-status levels failed to reach significance, 
as did the Status Groups X Environmental 
Factors (SG X EF) interactions. These find- 
ings are discussed in more detail in the fol- 
lowing paragraphs. 
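
The F ratios behind these statements can be recomputed from the mean squares printed in Table 1. The sketch below assumes the usual error terms for a design with repeated measurements on one factor: between-subjects effects are tested against subjects within groups, and within-subjects effects against the EF X Subjects within groups term. The function names are hypothetical, and the status-group main effect is left out of the illustration.

```python
# Error mean squares as printed in Table 1.
BETWEEN_ERROR_MS = 108.6   # subjects within groups
WITHIN_ERROR_MS = 35.2     # EF x subjects within groups

# Effect mean squares as printed in Table 1.
BETWEEN_EFFECTS = {"OG": 3332.0, "OG x SG": 357.0}
WITHIN_EFFECTS = {
    "EF": 5890.8,
    "OG x EF": 357.0,
    "SG x EF": 59.5,
    "OG x SG x EF": 104.1,
}


def f_ratios(effects, error_ms):
    """Divide each effect mean square by the given error mean square."""
    return {name: ms / error_ms for name, ms in effects.items()}


def error_df(n_subjects, n_cells, n_repeated_levels):
    """Error degrees of freedom for the between and within portions."""
    between = n_subjects - n_cells
    within = between * (n_repeated_levels - 1)
    return between, within


between_f = f_ratios(BETWEEN_EFFECTS, BETWEEN_ERROR_MS)
within_f = f_ratios(WITHIN_EFFECTS, WITHIN_ERROR_MS)

for name, value in {**between_f, **within_f}.items():
    print(f"{name}: F = {value:.2f}")
print("error df:", error_df(1468, 6, 5))
```

The recomputed ratios agree with the tabled values to within the rounding of the printed mean squares, and the error degrees of freedom follow from 1,468 respondents in six membership categories measured on five factors.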

It is obvious from Table 1 and Figure 1
that blue-collar workers place a greater value
upon most of the five environmental factors
than do the white-collar workers, although the
significant OG X EF interaction indicates that
this difference in values is not consistent across
all environmental factors. White-collar and
blue-collar groups differ most in the importance
they attach to church-related matters,
somewhat in the importance of education,
work context, and recreation, and least (not
significant) in work-content values.


TABLE 1

ANALYSIS OF VARIANCE OF VALUES OF ENVIRONMENTAL
FACTORS AS A FUNCTION OF OCCUPATIONAL GROUP
AND STATUS GROUP

Source                              df       MS         F
Between subjects
  Occupational group (OG)            1    3332.0     30.68***
  Status group (SG)                  2     119.0      1.10
  OG X SG                            2     357.0      3.29*
  Subjects within groups          1462     108.6
Within subjects
  Environmental factors (EF)         4    5890.8    167.30***
  OG X EF                            4     357.0     10.14***
  SG X EF                            8      59.5      1.69
  OG X SG X EF                       8     104.1      2.95**
  EF X Subjects within groups     5848      35.2

*p < .05.
**p < .01.
***p < .001.

A more meaningful contrast may be made 
by comparing the value differences between 
blue- and white-collar workers for each en- 
vironmental factor as a function of the mean 
difference of 1.92 between these two groups. 
This comparison compensates for the higher 
norms of the blue-collar group. Thus, relative 
to the mean difference in values between these 
two groups, blue-collar personnel place far 
greater value on church-related matters than
do white-collar workers, while white-collar
personnel attach primary importance to the
personal and content value of their work.
Furthermore, an analysis of merely the two
work factors reveals a clear pattern: blue-collar
personnel value the work context more
than the work content, while white-collar
workers value the work content more than the
work context.

FIG. 1. The relative values of five environmental
factors held by white-collar and blue-collar employees.

FIG. 2. The relative value of recreation held by
white-collar and blue-collar employees at three different
status levels.

Examination of the value hierarchy within 
the blue-collar group indicates no significant 
difference in the importance of recreation and 
education, but these are valued less than 
church-related matters which, in turn, are less 
important than work-content factors; work- 
context characteristics are of paramount im- 
portance to the blue-collar group. The value 
hierarchy within the white-collar group gives 
equal importance to recreation, education, and 
church, with significantly greater value attached
to work-context characteristics, and
still greater importance to work-content factors.

FIG. 3. The relative value of education held.

That there is no significant difference in 
the value systems of low- versus medium- ver- 
sus high-status levels is clearly indicated in 
Table 1 by both the nonsignificant main ef- 
fect of SG and the nonsignificant SG x EF 
interaction. Thus, one cannot differentiate 
among these three levels on the basis of their 
value systems (of the five environmental
factors tapped in this study); what is of importance
to one of these levels is of similar
importance to the other two levels. 

The significant three-way interaction sug- 
gests that the findings cited in the above 
paragraph are not consistent across occupa- 
tional groups. The status group to which an 
employee belongs has an influence on his
value hierarchy only when his occupational
group is also taken into account. As illustrated
in Figures 2 through 6, extreme differences
between high-status white-collar and high-status
blue-collar personnel (and greater similarity
between low-status white-collar and
blue-collar workers) are apparent in values toward
the church and work-context characteristics.
These differences and similarities are
less exaggerated in the value toward educa- 
tion and recreation. Finally, Figure 6 shows 
that the value of work-content factors is al- 
most identical for high-status white-collar and 
blue-collar workers and for low-status white- 
collar and blue-collar personnel, but a marked 
difference in value toward this factor is found 
only between the medium-status white- and 
blue-collar membership. 


DISCUSSION


Strauss (1963) recently questioned two 
assumptions which underlie much of the re- 
search concerning job motivation. The first 
of these questions involves the assumption of 
the universality of the desire to achieve self- 
actualization, while the second pertains to the 
importance of the job, as opposed to the com- 
munity (nonjob), as a source of satisfaction. 
Both assumptions, according to Strauss, bear 
all of the earmarks of their academic origin. 
Data from this study suggest that the assumption
of the universality of the importance
of self-actualization is suspect. The
work-context factor, composed of items that 
satiate deficit needs (Maslow, 1955), was of 
primary importance to all status levels within 
the blue-collar group plus the low-status 
white-collar group. Only the medium- and the 
high-status white-collar workers placed primary
importance on the work-content factor,
composed of job characteristics which might 
potentially fulfill growth needs. 

The assumption concerning the importance 
of work as opposed to nonwork as a source of 
satisfaction, although questioned by Strauss, 
seems to be a plausible one. Work and the 
work environment do provide greater op- 
portunity for satisfying interactions than do 
nonwork factors. For example, the value to all 
groups of their combined work content and 
work context as an opportunity to obtain 
satisfying stimulation exceeded to a significant 
extent that of church-related and that of edu- 
cational factors. The latter are often assumed 
to be of implicit value in our culture, al- 
though one might claim similar importance 
for work and its moral justification (Weber, 
1930). Similarly, recreational activities were 
perceived as having the least value for po- 
tentially satisfying stimulation, particularly 
when compared with work-related activities. 
In a society where there is increasing pres- 
sure to remove the worker from his workplace 
through extended vacations, shorter working 
hours, and earlier retirement, one may well 
question the extent to which recreation will 
prove an even partial substitute for work as an 
opportunity for meaningful environmental in- 
teraction. 

In a study which does not seem to have 
fazed industrial sociologists or psychologists 
to any great extent, Dubin (1956) found that 
for a vast majority, work and the workplace 
are not central life interests. Similar results 
were obtained more recently by Whitehill 
(1964). Quite obviously, such findings con- 
trast with those derived from the present 
study. It is possible that the environment in 
which this study was conducted attracts in- 
dividuals who seek predominant interaction 
with their work rather than the other en- 
vironmental stimuli with which this study is 
concerned. Replication in other work-com- 
munity environments is therefore suggested. 




This unsettled question of the centrality of 
work would seem to demand greater explora- 
tion prior to, or at least concurrent with, 
further studies on the value of the more de- 
tailed characteristics of work. If work and the 
workplace, as a total, represent merely a small 
proportion of potential environmental stimuli 
for the individual it would seem less fruitful 
to explore the proportional characteristics of 
this small proportion. 

Contrary to the findings of Dubin (1956) 
and Whitehill (1964), the results of this study 
suggest that the world of work embodies no 
small proportion of life values, but instead 
represents a major environmental variable 
with which the individual can interact so as 
to obtain satisfying stimulation. Work and 
the workplace hold for the individual an op- 
portunity to promote an effective and com- 
petent interaction with his environment. 


APPLICATION OF EMPIRICAL METHODS TO 
COMPUTER-BASED SYSTEM DESIGN* 


GLORIA LAUER GRACE 


System Development Corporation, Santa Monica 


This study provides information about the clarity and usefulness of printout 
formats designed for use by military nonprogrammer personnel. 3 printout 
formats containing the same information were designed. Verbal printout format 
presented information in words; Data Block printout format, in sets of data; 
Eidoform printout format, in maplike form. 23 men stationed at Phoenix Air 
Defense Sector served as Ss. Immediately following the experimental sessions, at- 
titude information was collected in individual interviews. Printout formats and 
sets of interpretation questions were combined for analysis using a Latin-square 
design. Analysis of variance showed experimental treatment conditions, printout 
formats, and practice effect to be statistically significant. Differences due to 
sequence and test forms were not significant. Attitude results supported in- 
formation measure findings. 


User-oriented design has become an in- 
creasingly important feature for modern com- 
puter-based systems. A special application, the 
ease of use of computer outputs by military 
personnel, was selected for empirical study. 
Results from preliminary field investigations 
indicated that certain computer-printout for- 
mats were preferred over others. Some for- 
mats tended to be interpreted more accurately 
and easily than others. Why did this occur? 
Work by Klare, Mabry, and Gustafson (1954, 
1955a, 1955b) pointed up the importance of 
style difficulty, patterning, and human in- 
terest variables in studying the use of printed 
material by military subjects (Ss). In addi- 
tion, techniques for measuring readability, 
such as those developed by Flesch (1946, 
1949, 1958), Dale and Chall (1948), and 
Gunning (1952), along with the work of Pat- 
erson and Tinker (1940), Klare (1963), and 
others, suggested ways in which printout-de- 
sign effectiveness might be measured. 

Empirical methods were applied to the task 
of obtaining meaningful answers to design 
questions about printout formats. In the past, 
printout design resulted from a blend of de- 
signer experience and chance. Empirical evi- 
dence about printout-format effectiveness was 
not available. The present study attempted to 
formulate a method by which differences in 
printout-format-design effectiveness could be 
measured. 


1 A paper describing this study was presented at the 
Western Psychological Association, Honolulu, June 
1965. 


One purpose of this study was to provide 
information about the clarity and usefulness 
of different printout formats in order to de- 
termine which of three formats could be most 
effectively used by military personnel. An- 
other purpose was to develop and evaluate a 
method for the empirical assessment of sys- 
tem design features. Although applied only 
to printout formats in this study, generally 
applicable techniques for constructing stimu- 
lus prototypes and obtaining measures of de- 
sign effectiveness were developed. The feasi- 
bility of achieving sufficient sample size and 
satisfactory experimental conditions at a field 
site was determined. The statistical adequacy 
of the experimental design was assessed. 


EXPERIMENTAL DESIGN CONSIDERATIONS 


Four major experimental variables and their 
interactions had to be taken into consideration 
in designing this study. (a) Individual Dif- 
ferences. The Ss differ in experience and abil- 
ity to interpret printouts. The study was to be 
conducted in a field-site setting. Since matched 
groups were virtually impossible to obtain in 
this setting, each S must serve as his own con- 
trol. Experimental design must control for in- 
dividual differences and for practice effect. In 
this way, maximum data for the limited num- 
ber of Ss available at a field site could be col- 
lected. (b) Printout Formats. Three different 
printout formats were studied: Verbal, Data 
Block, and Eidoform. Experimental design 
must permit assessment of effectiveness of 
these three printout formats. (c) Study Ques- 
tions. An information measure of printout 
effectiveness was provided by Ss’ answers to 
three equated sets of study questions. Experi- 
mental design must permit the extraction of 
the effect of study-question sets on perform- 
ance. (d) Sequence of Presentation. The 
order of presentation of printout formats or 
sets of study questions may affect Ss’ ability 
to interpret printouts. To control experimen- 
tally for this contingency, printouts and study 
questions should be presented in all possible 
orders. 

As in any applied field of research, diffi- 
culties associated with obtaining sufficient Ss 
to arrive at statistically meaningful results 
arose. A Latin-square design was deemed most 
appropriate for obtaining meaningful results 
from the limited number of available Ss. Two 
experimental variables were involved, each 
containing three conditions. For the printout- 
format variable, conditions were Verbal, Data 
Block, and Eidoform format. For the study- 
question variable, conditions were three equiv- 
alent sets, Form A, Form B, and Form C. A 
total of 36 Ss would have been required to ac- 
count for all possible combinations of print- 
outs and sets of study questions, but sequence 
requirements for balanced design could be 
met with half of this number. Obtaining more 
than 20 Ss at a single field site was difficult, if 
not impossible. Obtaining data from more 
than one field site unduly complicated data- 
collection logistics and design. The decision 
was made to collect complete data from a 
minimum of 18 Ss at a single field site. 
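
The count of 36 can be checked by enumeration. The sketch below assumes that "all possible combinations" means every presentation order of the three formats paired with every presentation order of the three question sets; the variable names are illustrative only, and the sketch does not reproduce how the 18 schedules actually used were chosen.

```python
from itertools import permutations, product

FORMATS = ("Verbal", "Data Block", "Eidoform")
FORMS = ("Form A", "Form B", "Form C")

# 3! = 6 presentation orders for each experimental variable.
format_orders = list(permutations(FORMATS))
form_orders = list(permutations(FORMS))

# Every pairing of a format order with a question-set order.
schedules = list(product(format_orders, form_orders))

print(len(format_orders), len(form_orders), len(schedules))
# Half of the full set is the number of Ss the balanced design needed.
print(len(schedules) // 2)
```

Under this reading, six orders of formats times six orders of forms gives the 36 schedules, and half of that is the 18 Ss sought at the field site.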


SUBJECTS 


The Ss were 23 men stationed at Phoenix 
Air Defense Sector. Of this number, 22 were 
military personnel. The remaining S was a 
System Development Corporation field repre- 
sentative. An information sheet filled out at 
the beginning of the experimental session pro- 
vided the following data. Age range for the 
sample was 20-40 years with a median age 
of 28 years. The 22 military Ss had spent from 
1 to 18 years in active military duty with a 
median service length of 9 years. All Ss had 
completed high school and almost 40% had 
some college experience. With regard to com- 
puter familiarity, 22% had experience operat- 
ing a computer, 9% had some programming 
experience, and 39% had some experience 
reading computer printouts. 


DESCRIPTION OF PRINTOUTS 


To facilitate the production of printout- 
format prototypes, the decision was made first 
to select a simple printout already in existence. 
This printout would become one of the ex- 
perimental formats. Two variations containing 
identical information would then be con- 
structed, using the existing printout as a guide 
for production. A printout describing a set of 
simulated profile flight paths produced by 
the Site Production and Reduction System 
(SPARS) was selected for this purpose. 
SPARS, a computer-based system used for 
system training purposes in an air defense en- 
vironment, has been described earlier by Grace 
and Newlands (1964). 

Three forms of printouts, each describing 
the same 20 profile flights stored on the 
SPARS Master tape, were produced on the 
line printer. These printouts contained ex- 
actly the same information and differed only 
in format. These printout formats were desig- 
nated Verbal, Data Block, and Eidoform. 

In Verbal printout format, information was 
presented using words, phrases, or sentences. 
This form is presently output by SPARS to 
tell the user about the characteristics of pro- 
file flight paths. An example showing Verbal 
printout format appears in Figure 1. 

In Data Block printout format, information 
was presented as a set of data. Changes in the 
flight path were sequenced from left to right 
across the page. Position and alphabetic code 
indicated meaning. An example showing Data 
Block printout format appears in Figure 2. 

In Eidoform printout format,2 information 
was presented on computer-produced, maplike 
printouts designed to create an image of each 
profile flight path. An example showing Eido- 
form printout format appears in Figure 3. 


2 The term Eidoform is derived from the Greek 
word eidolon meaning image. The author is indebted 
to G. M. Wattenbarger for this term. 
PROFILE 2 
TIME      SPEED CHANGE AND RATE   HEADING CHANGE AND RATE   ALTITUDE CHANGE AND RATE 
07M30S                            + 40 DEGREES FAST 
INITIAL ALTITUDE FOR ANY FLIGHT USING PROFILE 2 
SHOULD BE AT LEAST 100 BUT NOT MORE THAN 99900 FEET. 
INITIAL SPEED FOR ANY FLIGHT USING PROFILE 2 
SHOULD BE AT LEAST 0 BUT NOT MORE THAN 2047 KNOTS. 


PROFILE 3 
TIME      SPEED CHANGE AND RATE   HEADING CHANGE AND RATE   ALTITUDE CHANGE AND RATE 
07M30S    + 120 KNOTS FAST 
22M30S                            + 40 DEGREES FAST 
INITIAL ALTITUDE FOR ANY FLIGHT USING PROFILE 3 
SHOULD BE AT LEAST 100 BUT NOT MORE THAN 99900 FEET. 
INITIAL SPEED FOR ANY FLIGHT USING PROFILE 3 
SHOULD BE AT LEAST 0 BUT NOT MORE THAN 1927 KNOTS. 


PROFILE 4 
TIME      SPEED CHANGE AND RATE   HEADING CHANGE AND RATE   ALTITUDE CHANGE AND RATE 
12M30S                            + 25 DEGREES FAST 
20M00S                                                      +15000 FEET FAST 
INITIAL ALTITUDE FOR ANY FLIGHT USING PROFILE 4 
SHOULD BE AT LEAST 100 BUT NOT MORE THAN 84900 FEET. 
INITIAL SPEED FOR ANY FLIGHT USING PROFILE 4 
SHOULD BE AT LEAST 0 BUT NOT MORE THAN 2047 KNOTS. 


PROFILE 5 
TIME      SPEED CHANGE AND RATE   HEADING CHANGE AND RATE   ALTITUDE CHANGE AND RATE 
10M00S    - 200 KNOTS FAST        - 90 DEGREES FAST         -20000 FEET FAST 
15M00S    +1000 KNOTS FAST        +179 DEGREES FAST 
1H00M00S                          +179 DEGREES FAST 
INITIAL ALTITUDE FOR ANY FLIGHT USING PROFILE 5 
SHOULD BE AT LEAST 20100 BUT NOT MORE THAN 99900 FEET. 
INITIAL SPEED FOR ANY FLIGHT USING PROFILE 5 
SHOULD BE AT LEAST 200 BUT NOT MORE THAN 1247 KNOTS. 


PROFILE 6 
TIME      SPEED CHANGE AND RATE   HEADING CHANGE AND RATE   ALTITUDE CHANGE AND RATE 
07M30S                                                      +10000 FEET FAST 
12M30S                            + 55 DEGREES FAST 
20M00S    + 120 KNOTS FAST 
27M30S    +  80 KNOTS FAST 
33M45S    + 100 KNOTS FAST 
INITIAL ALTITUDE FOR ANY FLIGHT USING PROFILE 6 
SHOULD BE AT LEAST 100 BUT NOT MORE THAN 89900 FEET. 
INITIAL SPEED FOR ANY FLIGHT USING PROFILE 6 
SHOULD BE AT LEAST 0 BUT NOT MORE THAN 1747 KNOTS. 


FIG. 1. Example showing Verbal printout format. 

PROFILE  INITIAL    FIRST       SECOND      THIRD       FOURTH      FIFTH       SIXTH 
NUMBER   LIMITS     CHANGE      CHANGE      CHANGE      CHANGE      CHANGE      CHANGE 

   2     A   100    T 07M30S 
           99900    S 
         S     0    H +40 F 
            2047    A 

   3     A   100    T 07M30S    T 22M30S 
           99900    S +120 F    S 
         S     0    H           H +40 F 
            1927    A           A 

   4     A   100    T 12M30S    T 20M00S 
           84900    S           S 
         S     0    H +25 F     H 
            2047    A           A +15000F 

   5     A 20100    T 10M00S    T 15M00S    T 1H00M00S 
           99900    S -200 F    S +1000 F   S 
         S   200    H -90 F     H +179 F    H +179 F 
            1247    A -20000F   A           A 

   6     A   100    T 07M30S    T 12M30S    T 20M00S    T 27M30S    T 33M45S 
           89900    S           S           S +120 F    S +80 F     S +100 F 
         S     0    H           H +55 F     H           H           H 
            1747    A +10000F   A           A           A           A 


FIG. 2. Example showing Data Block printout format. 
PROFILE PRINTOUTS 


EXPLANATION 


PROFILE FLIGHTS MOVE FROM LEFT TO RIGHT ACROSS THE PAGE. WHEN SYMBOLS 
STOP, IT DOES NOT MEAN A FLIGHT GENERATED WITH THIS PROFILE WOULD STOP 
HERE. THE FLIGHT WILL CONTINUE IN A STRAIGHT LINE UNTIL YOU STOP IT OR 
UNTIL IT LEAVES SECTOR RADAR COVERAGE. 


INITIAL ALTITUDE AND SPEED LIMITS APPEAR ABOVE EACH PROFILE. 


CHANGES TO PROFILES ARE EXPLAINED AS FOLLOWS 


CODE   MEANING 
T      TIME OF CHANGE EXPRESSED IN HOURS, MINUTES, SECONDS (HMS) 
SP     SPEED CHANGE IN KNOTS AND RATE FAST (F) OR SLOW (S) 
HD     HEADING CHANGE IN DEGREES AND RATE FAST (F) OR SLOW (S) 
AL     ALTITUDE CHANGE IN FEET AND RATE FAST (F) OR SLOW (S) 


CODE APPEARS IN PROFILE FLIGHT PATH WHEN THE CHANGE OCCURS. ASTERISKS 
(*****) MARK THE PATH THE FLIGHT FOLLOWS EXCEPT WHEN CHANGES OCCUR. 


THE TIME AND SIZE OF EACH CHANGE APPEAR AT THE POINT ON THE PROFILE 
WHERE THE CHANGE OCCURS. 


PROFILE 2 
INITIAL LIMITS 
ALTITUDE FROM 100 TO 99900 FEET 
SPEED FROM 0 TO 2047 KNOTS 
T 07M30S 
HD +40F 
[Maplike plot of the flight path: asterisks running left to right, with HD marking the point of change] 
0 10 20 30 40 50 60 
TIME IN MINUTES 


PROFILE 3 
INITIAL LIMITS 
ALTITUDE FROM 100 TO 99900 FEET 
SPEED FROM 0 TO 1927 KNOTS 
T 07M30S    T 22M30S 
SP +120F    HD +40F 
[Maplike plot of the flight path: asterisks running left to right, with SP and HD marking the points of change] 
0 10 20 30 40 50 60 
TIME IN MINUTES 


FIG. 3. Example showing Eidoform printout format. 

Length of printout required to present a 
given amount of information is interesting. 
Consider the approximate number of inches 
required to display information for the first 
nine profiles (Data Block = 5.5 inches, 
Verbal = 9.0 inches, Eidoform = 27.0 inches). 
Data Block format is almost 40% shorter than 
Verbal format, which is presently output by 
SPARS. Eidoform format is approximately 
three times as long as Verbal format. 
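
The relative lengths follow directly from the three reported measurements. The short sketch below only restates that arithmetic; the variable names are illustrative and no data beyond the three lengths given in the text are assumed.

```python
# Approximate printout length (inches) for the first nine profiles, as reported.
lengths_in = {"Data Block": 5.5, "Verbal": 9.0, "Eidoform": 27.0}

verbal = lengths_in["Verbal"]

# Length of each format expressed as a multiple of the Verbal length.
ratio_to_verbal = {name: inches / verbal for name, inches in lengths_in.items()}

# Fraction by which Data Block is shorter than Verbal.
data_block_saving = 1 - ratio_to_verbal["Data Block"]

# Eidoform length as a multiple of Data Block length.
eidoform_vs_data_block = lengths_in["Eidoform"] / lengths_in["Data Block"]

print(f"Data Block is {data_block_saving:.0%} shorter than Verbal")
print(f"Eidoform is {ratio_to_verbal['Eidoform']:.1f} times the Verbal length")
print(f"Eidoform is {eidoform_vs_data_block:.1f} times the Data Block length")
```

On these figures, Eidoform runs three times the length of Verbal and nearly five times the length of Data Block.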


MEASURES OF PRINTOUT EFFECTIVENESS 


In order to learn as much as possible from 
this field study, both information and attitude 
measures were elicited. Effectiveness measures 
were developed to determine how well Ss could 
use each printout format and what Ss thought 
about the formats. Types of measures which 
resulted were (a) sets of study questions and 
(b) structured interviews. 


Study Questions 


Twenty basic information questions were 
designed. Half required specific flight selection 
to obtain the correct answer. The other half 
required that data about a number of profiles 
be grouped to obtain the correct answer. Three 
equivalent sets of study questions (Form A, 
Form B, and Form C) were then constructed. 
Each set contained 20 multiple-choice items. 
Each item had four alternative answers. Sets 
of questions were pretest-reviewed by seven 
user-judges. Because work-limit pretest results 
did not furnish sufficient variance, a time-limit 
administration procedure was developed. Sets 
were deemed equivalent because of (a) the 
method of set construction and (b) the results 
of the pretest review. 


Structured Interviews 


In order to obtain complete data in a field 
setting, every effort was made to conduct 
interviews quickly. Previous experience in a 
similar setting led to the decision to construct 
an objective, structured-interview blank which 
would combine both rating and scaling meas- 
urement techniques. In this way attitude data 
could be collected rapidly and objectively. 
Answers to the following four questions were 
elicited. (1) Which type of printout did you 
like best? (2) Which type of printout did you 
find easiest to read? (3) Which type of print- 
out would be your choice to use for selecting 
profiles? (4) Which type of printout would 
be your choice to use for summarizing data 
about profiles? 

Special comments about each printout for- 
mat were also solicited after the scaled ratings 
for each of the above questions were obtained. 
The Ss were encouraged to make suggestions. 


PROCEDURE 


Data were collected during a half-day session at the 
Phoenix Air Defense Sector. Answers to sets of study 
questions were obtained in a group setting. Interviews 
were conducted individually immediately following 
the group session. 

The following technique was used to present in- 
structions to Ss. A written briefing guide containing 
samples of printouts and study questions was handed 
to each S. An oral briefing explaining the information 
contained in the printouts and describing the task to 
be accomplished by each S during the experiment was 
presented by the investigator. The briefing was fol- 
lowed by a short question and answer period. 

During the experimental group session, the in- 
vestigator and two colleagues3 monitored Ss while 
they answered sets of study questions. The first 
printout format and set of study questions were col- 
lected before the second pair was handed out. Simi- 
larly, the second printout format and set of study 
questions were collected before the third pair was 
handed out. 

At 5-minute intervals, Ss were requested to insert 
the stated time beside the study question on which 
they were then working. In this way, differential 
work rates could still be determined if the 20 minutes 
allowed for each set of study questions proved long 
enough for all Ss to finish a task. 

After all printouts and sets of study questions were 
completed, Ss were given a brief coffee break. Inter- 
views were scheduled and Ss requested to report at 
specified times for brief individual interviews. The 
structured format of the interview form made the 
use of multiple interviewers possible. In order to 
assure completion of all interviews in less than 2 
hours, four interviewers were used. 


RESULTS 


Measures of information and attitude were 
obtained to determine printout effectiveness. 
Information was measured by scores on sets of 
study questions (Score = Rights - 1/3 Wrongs). 
In this way scores became a function of 
speed and accuracy with a correction for 
chance behavior. Attitudes were measured by 
ranks and ratings obtained during individual 
interviews. Measures which were calculated in- 
cluded (a) frequency of choice, (b) rank- 
weighted choice, (c) rating-weighted choice, 
(d) rank-weighted and rating-weighted choice. 
Selected comments about printout formats 
were of interest and will also be reported. 


3 The author gratefully acknowledges the assistance 
of two colleagues, Elene B. Maginnis and H. M. Pool, 
who aided in making arrangements and collecting data 
in the field-site setting chosen for this study. 


Information Measures 


Results from the sets of study questions 
will be presented. Raw scores computed for 
each S for each trial ranged from 0 to 19.0 
(N = 23). 
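
The scoring rule can be sketched as follows, assuming the conventional correction for guessing on multiple-choice items: each wrong answer costs 1/(k - 1) of a point, where k is the number of alternatives (four here, so one third of a point). The function name and the example answer counts are hypothetical; items not reached within the time limit are assumed to count neither as right nor as wrong.

```python
def corrected_score(rights: int, wrongs: int, alternatives: int = 4) -> float:
    """Rights minus a guessing penalty of wrongs / (alternatives - 1)."""
    return rights - wrongs / (alternatives - 1)


# Hypothetical answer counts on one 20-item set.
print(corrected_score(rights=14, wrongs=3))   # 14 right, 3 wrong, 3 not reached
print(corrected_score(rights=20, wrongs=0))   # every item answered correctly
print(corrected_score(rights=5, wrongs=15))   # what blind guessing would average
```

With four alternatives, blind guessing on all 20 items averages 5 right and 15 wrong, which the correction brings to zero; because unreached items earn nothing, the score also reflects speed.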

Experimental design sequence requirements 
were fulfilled for the first 18 randomly se- 
lected Ss. Only data from these Ss were used 
to perform other information measure analy- 
ses. This group of 18 Ss is hereafter referred 
to as the experimental block of Ss as appears 
in Table 1. 

First consider the effect of printout format. 
The difference between mean scores for Verbal 
and Data Block printout formats was not sig- 
nificant. Mean score for Eidoform printout 
format was significantly lower than mean score 
for Data Block printout format (t = 6.96, 
p < .01) and Verbal printout format (t = 
4.13, p < .01). Next consider the effects of 
practice. Mean score was highest on the third 
trial, lowest on the first. The difference be- 
tween mean scores for the first and second 
trials was significant (t = 2.13, p < .05); for 
the first and third trials, also significant (t = 
2.87, p < .01); for second and third trials, 
not statistically significant. None of the dif- 
ferences between mean scores for Form A, 
Form B, or Form C of the study questions 
were statistically significant. 


TABLE 1 


TOTAL MEAN SCORES FOR PRINTOUTS, TRIALS, AND 
STUDY QUESTIONS FOR THE EXPERIMENTAL 
BLOCK OF SUBJECTS 


Source               M Score 

Printouts 
  Verbal              12.9 
  Data Block          13.8 
  Eidoform             6.7 
Trials 
  First                8.4 
  Second              11.8 
  Third               13.1 
Study questions 
  Form A              11.9 
  Form B              10.6 
  Form C              10.9 

Note.—N = 18. 


TABLE 2 


MEAN SCORES ON STUDY QUESTIONS FOR EACH TRIAL 
FOR THE EXPERIMENTAL BLOCK OF SUBJECTS 


                         Trials 
Printout        First    Second    Third 

Verbal            7.0     15.8     15.8 
Data Block       12.8     13.3     15.3 
Eidoform          5.7      6.2      8.2 

Note.—N = 18. 
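
As a consistency check between Tables 1 and 2, the sketch below assumes equal numbers of observations in every printout-by-trial cell, so that each printout mean in Table 1 is the simple average of its three trial means in Table 2. Under that assumption a first-trial cell can be recomputed from the row mean and the other two cells. The helper name is hypothetical, and because the published means are rounded to one decimal, recomputed values are only good to about ±0.15.

```python
def missing_cell(row_mean: float, known_cells: list[float], n_cells: int = 3) -> float:
    """Recover one cell mean from the row mean and the remaining cell means."""
    return n_cells * row_mean - sum(known_cells)


# Row means from Table 1; second- and third-trial cells from Table 2.
data_block_first = missing_cell(13.8, [13.3, 15.3])
eidoform_first = missing_cell(6.7, [6.2, 8.2])
print(round(data_block_first, 1), round(eidoform_first, 1))

# The Verbal row, for which all three trial means are given, checks out.
verbal_mean = sum([7.0, 15.8, 15.8]) / 3
print(round(verbal_mean, 1))
```

The same averaging applied down the columns reproduces the second- and third-trial means of Table 1 (11.8 and 13.1).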

Table 2 shows mean scores for Verbal, Data 
Block, and Eidoform printout formats for the 
first, second, and third trials. For the first 
trial, Data Block mean scores were signifi- 
cantly higher than both Verbal mean scores 
(t = 2.29, p < .05) and Eidoform mean scores 
(t = 4.35, p < .01). Verbal and Eidoform 
mean scores were not significantly different. 
For both the second and third trials, differ- 
ences between Verbal and Data Block mean 
scores were not significant. For the second 
and third trials, Eidoform mean score was sig- 
nificantly lower than Verbal mean score (t = 
7.06, p < .01; t = 4.25, p < .01) and Data 
Block mean score (t = 4.93, p < .01; t = 
5.50, p < .01). 

A different statistical tool, analysis of vari- 
ance, was then applied to the data. A simple 
two-part analysis of variance was applied to 
study-question set scores. Each group repre- 
sented a common treatment condition in 
which three Ss experienced the same printout 
and trial conditions. Measures for 18 different 
treatment conditions were obtained. Significant 
differences between groups were found (F = 
5.90, df = 17, p < .01). Treatment conditions 
were thus determined to be different. 
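
A between-groups analysis of this kind can be sketched as below. This is a generic one-way analysis of variance, not the author's computation: the function name is hypothetical and the scores are made up solely to show the layout of 18 treatment groups with three Ss each, which gives the 17 between-groups degrees of freedom reported above.

```python
def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F ratio, between-groups df, within-groups df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores: 18 treatment conditions, three Ss in each.
toy_groups = [[float(i), i + 1.0, i + 2.0] for i in range(18)]
f_ratio, df_between, df_within = one_way_anova(toy_groups)
print(df_between, df_within)
```

With 54 scores in 18 groups, the within-groups degrees of freedom come to 36 under this layout.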

Next the interaction problem was tackled. 
The results from an application of analysis of 
variance to the Latin-square design were ob- 
tained. The only significant source of varia- 
tion was printout format (F = 13.57, df = 2, 
p < .01). Neither sequence of presentation of 
printouts, study-question sets, nor interactions 
contributed statistically significant amounts to 
the total variance obtained. Sequence of print- 
out presentation and study-question forms 
were controlled experimentally. Using the 
Latin-square design, these two factors were 
shown not to contribute significantly to total 
variance. 

An additional application of analysis of 
variance to the data was made to determine 
directly the relationship between practice and 
printout format. Printout formats (F = 23.96, 
df = 2, p < .01), trials (F = 9.23, df = 2, 
p < .01), and the interaction between print- 
out format and trials (F = 3.28, df = 4, p < 
.05) all accounted significantly for parts of 
the total variance obtained. 


Structured Interviews 


Interview data were analyzed in a number 
of different ways. Attitudes toward Eidoform 
printouts were generally less favorable than 
were attitudes toward Verbal or Data Block 
printouts. 

Substantive interview results are sum- 
marized in Table 3. Rank choices for printouts 
were weighted (three for first choice, two for 
second choice, and one for third choice) and 
summed. Chi-square tests were performed 
which revealed that values for Question 2 and 
Total differed significantly. 
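
The weighting scheme and one plausible form of the test can be sketched as follows. The weights (3, 2, 1) are those stated above. The source does not say which chi-square test was used, so the goodness-of-fit test against equal totals shown here is an assumption; weighted totals are not independent counts, so the sketch is illustrative only. The function names and the two example rankings are hypothetical. The Question 3 row and the Total row are taken from Table 3.

```python
WEIGHTS = {1: 3, 2: 2, 3: 1}  # first, second, third choice


def weighted_totals(rankings: list[dict[str, int]]) -> dict[str, int]:
    """Sum rank weights per format over all Ss (each S ranks every format)."""
    totals: dict[str, int] = {}
    for ranking in rankings:
        for fmt, rank in ranking.items():
            totals[fmt] = totals.get(fmt, 0) + WEIGHTS[rank]
    return totals


def chi_square_equal(observed: list[float]) -> float:
    """Goodness-of-fit statistic against equal expected totals."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Two hypothetical Ss; each contributes 3 + 2 + 1 = 6 points per question.
example = weighted_totals([
    {"Verbal": 1, "Data Block": 2, "Eidoform": 3},
    {"Verbal": 2, "Data Block": 1, "Eidoform": 3},
])
print(example)

# Rows from Table 3 (Verbal, Data Block, Eidoform).
chi_total = chi_square_equal([196, 208, 148])
chi_q3 = chi_square_equal([46, 47, 45])
print(round(chi_total, 2), round(chi_q3, 2))
```

With 2 degrees of freedom the critical values are 5.99 (p = .05) and 9.21 (p = .01). Under this assumed test the Total row exceeds the .01 value and the Question 3 row falls far short of significance, in line with the pattern reported.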

TABLE 3 


WEIGHTED RANK TOTALS FOR COMBINED CHOICES 
FOR EACH PRINTOUT FORMAT BY 
INTERVIEW QUESTION 


                                  Printout Format 
Question                    Verbal   Data Block   Eidoform 

1. Like best                  51        53          34 
2. Easiest to read            53        53          32* 
3. Best for selecting         46        47          45 
4. Best for summarizing       46        55          37 
Total                        196       208         148** 

Note.—N = 23. Italics within the table indicate compari- 
sons where significant differences were obtained. 
*p < .05. 
**p < .01. 


During individual interviews Ss made in- 
teresting comments about printout formats. 
(a) Verbal printouts elicited the following 
comments: "For extracting information, it 
appears quite cluttered." "Easiest to read of 
the three." "Too much information in too 
small a place." "Good for recording from it." 
"Hardest to understand." "Hard to visualize 
a course from it. Otherwise very easy to use 
and understand." "Printing was congested." 
(b) Data Block printouts elicited the follow- 
ing comments: "To pull out a specific piece 
of information, quite easy to use." "Because 
of the space it takes up, more information on 
less paper—makes comparison better." "Good 
for recording from it." "Gave best overall 
picture." "Worked a lot better—go left to 
right in a fast glance." "Well written and ar- 
ranged." "Too much scanning across full width 
of paper." (c) Eidoform printouts elicited the 
following comments: "For a problem designer, 
that form is best since he can partially visual- 
ize the effect he wants." "When more than one 
change, got cluttered up." "Takes a lot of 
time. Good for debriefing." "A mess. Could be 
O.K. if color coded." "As for headings, it was 
easier to read, but the heading, speed and 
altitude changes as far as rating is concerned 
take longer." "Good visual aid. A little bit 
cluttered." "Changes so close together make 
(it) hard to read." 

To summarize, structured-interview results 
indicate a general preference for Data Block 
printout format. Eidoform printout format 
tends to be least preferred. Comments about 
the various types of printout formats also 
seem to support these findings. 


DISCUSSION 


Results disclose a number of interesting 
facts, some of which are significant not only 
statistically, but also in terms of implications 
for effective printout design. This study has 
clearly demonstrated that printout format sig- 
nificantly affects printout effectiveness as 
measured by this study. Neither sequence of 
presentation, nor sets of study questions, nor 
any of their interactions could account sig- 
nificantly for differential performance on the 
part of Ss. The structure of the stimulus 
(printout format) which was presented to a 
viewer was shown to affect his response sig- 
nificantly. 

What about the effects of practice? Evi- 
dence relative to treatment conditions indi- 
cated that the combination of printouts, trials, 
and study questions produced significant dif- 
ferences in performance. Perhaps some of the 
striking results attributed to printout format 
were due to practice. Not so. When the effects 
of printout format were extracted from prac- 
tice effect, printout format still contributed 
overwhelmingly to total variance. The effects 
of printout format (F = 23.96, p < .01) and 
trials (F = 3.28, p < .05) were both signifi- 
cant. However, the relative size of these two 
F ratios precluded attributing results largely 
to practice. 

Both information and attitude measures 
point toward the superiority of the Data 
Block printout format. Mean performance 
scores were highest for the Data Block print- 
outs, where information was structured in a 
concise, stylized manner. Data Block findings 
were even more significant since the Verbal 
printout format, for which scores were second 
highest, was already available at field sites. At 
least three Ss had some slight previous ac- 
quaintance with Verbal printout format in an 
operational setting. Even so, Data Block 
printout format resulted in higher mean scores. 
This effect was particularly pronounced on 
the first trial where Data Block mean scores 
were significantly higher than both Verbal 
and Eidoform mean scores. 

Similar results were generally obtained for 
Verbal and Data Block printout formats. Al- 
though most information in the Verbal print- 
out was presented in phrases and sentences, 
time and speed, heading and altitude changes 
were presented in tabular format. Had all 
information been presented in nontabular for- 
mat, perhaps additional significant differences 
between Verbal and Data Block printouts 
might have been obtained. 

The visually oriented Eidoform printout 
format, while preferred by certain Ss, re- 
sulted in significantly lower mean scores than 
Data Block for all trials, and than Verbal on 
all trials except the first. Slight evidence was 
obtained to support the preference for Eido- 
form printout format (a) if they were to be 
used for briefing or selection purposes, or (b) 
if information about heading was desired. 
Eidoform was three times as long as Verbal 
and about five times as long as Data Block. 
Meaningful brevity seemed to play an im- 
portant role in printout effectiveness. 

Interview findings reflected Ss’ attitudes to- 
ward the different printout formats. The Ss 
tended to like Data Block printout format best 
and Eidoform printout format least. Data 
Block and Verbal printout formats were con- 
sidered about equally easy to read; both were 
considered easier to read than Eidoform print- 
out format, which was considered remarkably 
good for selecting profiles. Data Block print- 
out format was considered best for summariz- 
ing data. 

Considering the summed responses to all 
four questions, Data Block printout format 
received the highest score; Verbal printout 
format, a slightly lower score; and Eidoform 
printout format, much less than Data Block 
score. The same relationship among printout- 
format effectiveness was derived from inter- 
views as was derived from answers to sets of 
study questions. Attitude measures confirmed 
information measures. 

After initial familiarization with selected 
printouts, the same trends were obtained from 
structured-interview data covering attitudes 
toward design features as were obtained from 
sets of study questions which objectively 
measured printout communication effective- 
ness. Perhaps the structured interview alone 
could yield fruitful measures of design effec- 
tiveness. This finding requires further study 
before conclusions regarding the relative merit 
of information and attitude measures of de- 
sign effectiveness can be drawn. 


CONCLUSIONS 


A feasible method which can provide em- 
pirical answers to design questions was de- 
veloped as a result of this study. Sufficient 
sample size and satisfactory experimental con- 
ditions were obtained at a field site. The ex- 
perimental design appears suitable for future 
field-site design studies. Data Block printout 
format was determined to be most effective. 
The Ss scored highest in response to questions 
about Data Block printouts and tended to 
like them best, particularly when they were to 
be used for summarizing data. Verbal print- 
outs were less effective than Data Block 
printouts; however, this difference was not 
statistically significant. The Ss tended to like 
Verbal printouts slightly less than Data 
Block printouts. Eidoform printouts were 
least effective. The Ss scored lowest in re- 
sponse to questions about Eidoform printouts 
and tended to like Eidoform printouts least. 
Only if profiles were to be selected or if ques- 
tions about profile headings were to be an- 
swered did Ss begin to respond favorably to 
Eidoform printouts. Effects of sequence of 
presentation of printout format and study- 
question forms were not significant, either 
alone or in combination. Practice effect was 
obtained but did not alter the overwhelming 
significance due to printout format. Trends 
observed from information measures were 
confirmed by attitude measures. 

Results of this study yield implications for 
the design of future printouts. (a) Structure 
the printouts in meaningful tabular format. 
Both information and attitude results support 
the conclusion that tabular structure yields 
greatest printout effectiveness. (b) Present 
the information concisely. Clarity and brevity 
improve communication effectiveness. Pro- 
vided all necessary information is included, the 
briefer the printout, the greater the printout 
effectiveness. (c) Eliminate unnecessary noise. 
Redundancy, either English-language or sym- 
bolic redundancy, acts as noise which hinders 
use of printouts. (d) Consider printout use. 
The particular use to be made of a printout 
should influence design decisions. For example, 
if briefing information about profile flight 
paths were the only use to be made of the 
printouts in this study, Eidoform format 
might still be the most desirable. 
Contrasting high (N=156) and low (N=156) criterion groups of United 
States Public Health Service physicians were identified on the basis of spon- 
taneous comments about personal characteristics appearing in supervisors’ ef- 
ficiency reports. The 2 groups were compared on personality inventories and 
other measures. Significant group differences (.10 level or below) were found 
on personality inventory scales, an employment selection interview, scores 
derived from a regression equation for the California Psychological Inventory 
found to be predictive of performance in medical school, scored sections of 
supervisory efficiency reports, and in attitudes about the employment situation. 
The groups did not differ on measures of aptitude, achievement, creativity, and 
values. Descriptions of the contrasting groups were developed from the discriminating 
personality inventory scales. The type of personality criterion employed 
can be used with other occupational groups. 


What constitutes “personal effectiveness” 
among a professional group such as physi- 
cians? In the present study, a personality 
criterion of a kind of personal effectiveness has 
been developed. Essentially, the criterion in- 
volves use of volunteered comments by super- 
visors in efficiency reports on physicians who 
are Commissioned Officers in the United 
States Public Health Service (USPHS). Per- 
sonally effective physicians are those con- 
sistently and spontaneously described as hav- 
ing pleasing personal characteristics. The use 
of spontaneous comments, rather than sys- 
tematic ratings on personality scales, avoids 
the usual halo problem inherent in ratings. 
Spontaneous comments over time produce a 
record amenable to quantification, which per- 
mits a check on the consistency of personality 
functioning as observed by different judges. 
The consistency of the longitudinal record as 
well as the spontaneity of comments suggest 
that the characteristics being observed are 
highly salient aspects of personality. 

The criterion of personal effectiveness in 
medicine developed in this study was intended 
to be a personality measure independent of 
technical or professional competence. Scores 
on personality inventories and other measures 
are analyzed against the personality criterion. 

Personality studies of physicians at either 
a training level or at the level of professional 
practice have received relatively little atten- 
tion. Studies of physicians in practice have 
not focused on personality variables (Peterson, 
Andrews, Spain, & Greenberg, 1956; 
Price, Taylor, Richards, & Jacobsen, 1963). 
At the residency level, results have been 
tentative (Fox, 1962) or not very promising 
(Goldstein & Salzman, 1962; Holt & Lubor- 
sky, 1958). At the level of the internship, no 
studies have, to the author’s knowledge, in- 
volved personality measurement. 

In medical school selection, concern has 
been primarily with traditional measures of 
aptitude and intellectual achievement such as 
the Medical College Admission Test (MCAT) 
and premedical grades (Gough, Hall, & 
Harris, 1963). The relative emphasis given to 
achievement variables is evident in the Gott- 
heil and Michael (1957) review. 

More and more, however, nonintellective 
measures are receiving attention. Attitudinal 
and other variables associated with the effects 
of medical education have been studied 
(Gordon & Mensh, 1962; Hammond & Kern, 
1959; Horowitz & Williams, 1963). Other in- 
vestigations have involved biographical or in- 
terest data (Jarecky & Johnson, 1962; Kelly, 
1957; Reissman, 1960). Schumacher (1964) 
has reported that personality characteristics 
at the time of entry into medical school are 
related to later choice of medical career. 

The prediction of medical school perform- 
ance from personality measures has generally 
not been successful (Gottheil & Michael, 1957; 
Liske, Ort, & Ford, 1964). The MMPI has 
been one of the more frequently investigated 
inventories, but results have varied (Fields, 
1958; Glaser, 1951; Knehr & Kohl, 1959; 
Schofield, 1953). The California Psychologi- 
cal Inventory (CPI) in one study gave a 
predictive validity of .46 in a cross-validation 
sample (N = 63) in which a 4-year 
grade-point average was used as the criterion 
(Gough & Hall, 1964). 

The paucity of personality information on 
a professional group such as physicians and 
the generally poor predictive results in the 
studies which have been made suggest the 
need for criterion research. Clarification of 
what constitutes success in the personality 
area within the medical profession is es- 
sential to any attempt at personality screen- 
ing at the medical school level as well as at 
later training or professional levels. 

Importance of the criterion in personality 
selection research is emphasized in a recent 
review article which points out that criterion 
measures are typically inappropriate (Guion 
& Gottier, 1965). The failure of personality 
measures to have general validity for selection 
decisions and the need for validation in each 
specific situation are discussed. 

The present study, although it may have 
selection implications, was primarily criterion 
research aimed at clarification of the per- 
sonal attributes of physicians who might be 
considered personally effective. 


METHOD 


The study involved a comparison of extreme cri- 
terion groups of USPHS physicians on personality 
inventories and other measures. 

Physicians in the USPHS may be employed in 
medical or clinical care, research, public health and 
preventive medicine, or administration. The work may 
be with state and local officials, other departments of 
the federal government, or at an international level. 
Public Health Service physicians must be able to 
adjust to a wide range of assignments, deal with a 
variety of people, and at times accept assignments 
which may be considered hardship posts. The work 
may be similar to that of salaried physicians in many 
organizations. 

From among such physicians, personality criterion 
groups were identified on the basis of supervising 
physicians’ comments in efficiency reports. The high 
or personally effective group was one about which 
favorable personality observations had been made, 
while the low group was one about which comments 
had been derogatory. High and low groups were 
matched on age and other variables which might 
affect results. 


Identification of Groups 


The official efficiency report used in the USPHS 
was the primary source of information for identifica- 
tion of criterion groups. Two open-end questions in 
the report elicit comments concerning a physician’s 
performance: 


“Does this officer have any defects or handicaps 
which might limit his effectiveness?” 

“Does this officer have any characteristics that 
make him unusually effective?” 

Samples of comments volunteered in response to 
these questions are as follows: 

“He tends to irritate others by his behavior of ag- 
gressiveness and wordiness.” 

“His biggest drawback is occasional difficulty in 
handling his co-workers.” 

“His personality problems interfere with adequate 
utilization.” 

“Has been receiving psychiatric assistance.” 

Other sample comments which supervisors volunteered 
are: 

“Ability to meet people of all categories and form 
good relationships.” 

“Able to gain cooperation of others.” 

“Stable, mature, well-balanced personality.” 

“Gets along well with others.” 


While there are undoubtedly differences in the 
willingness with which raters volunteer such com- 
ments, and while such comments may sometimes 
represent a clash between two personalities or a 
meshing of two, the use of spontaneous information 
appeared to be an unexplored way of identifying 
criterion groups. 

Criterion groups were identified from (a) coded 
personality comments in the Commissioned Officers’ 
Efficiency and Progress Report (COEPR), and (b) 
a judgment of criterion placement for physicians on 
whom COEPR codes were inconsistent; this judg- 
ment was based on all information available in 
personnel files. 

The COEPR coding system involved a 5-point code 
from 1 (highly derogatory) to 5 (highly favorable) 
for the personality comments; a manual for the 
coding was developed. 

COEPRs collected annually from 1957 through 
1961 and for 1963 and certain special reports, such 
as promotion reports, were coded. Personality com- 
ment codes were listed for physicians on duty in 
January 1964. From the lists, codes were reviewed 
for consistency of the longitudinal record. Physicians 
for whom reports contained no personality codes and 
those for whom reports were primarily coded 3 
(neither extremely favorable nor derogatory) were 
dropped from the study. Review of codes allowed 
for tentative classification into high and low cri- 
terion groups. 

The personnel files of physicians in the low group 
and the files of those who had inconsistent codes 
were reviewed. Physicians on whom there was evi- 
dence of genuinely contradictory information such as 
disagreements by different supervisors were eliminated 
from the study. 

The reliability of high-low group placement was 
checked by having independent judges review personnel 
files, placing physicians into high or low 
groups. For various checks, the percentage of agreement 
with the original judge was no lower than 
91%. 
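
The agreement check described above can be sketched as a simple percentage of identical placements. This is a minimal illustration, assuming each judge assigns every file to "high" or "low"; the function name and the ten placements are hypothetical and are not data from the study.

```python
def percent_agreement(original, independent):
    """Percentage of files given the same high/low placement by both judges."""
    if len(original) != len(independent) or not original:
        raise ValueError("both judges must rate the same non-empty set of files")
    matches = sum(a == b for a, b in zip(original, independent))
    return 100.0 * matches / len(original)


# Hypothetical placements for ten personnel files (illustration only).
original_judge = ["high", "low", "high", "high", "low",
                  "low", "high", "low", "high", "low"]
second_judge = ["high", "low", "high", "low", "low",
                "low", "high", "low", "high", "low"]

print(percent_agreement(original_judge, second_judge))
```

Simple percentage agreement does not correct for agreement expected by chance; the source reports only the percentage, so no chance-corrected index is shown here.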

Each member of the low criterion group was 
matched with a member of the high group on the 
following variables: (a) rank, (b) Corps (Regular, 
Reserve), (c) medical specialty, (d) sex, (e) age 
(within a 10-year range), (f) year in which Selec- 
tive Service obligation had been or would be ful- 
filled, (g) type of assignment (domestic versus foreign 
and, where possible, the bureau and station to which 
assigned in January 1964), and (h) type of work 
such as medical care, public health, research, or ad- 
ministration as stated in the most recent efficiency 
report. 
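
A matching procedure of this kind might be sketched as follows. The sketch is an assumption-laden illustration, not the procedure actually used: it matches exactly on a subset of the listed variables, reads "within a 10-year range" as an absolute age difference of at most 10 years, and picks the closest-aged available high-group member. The `Physician` record, its field values, and `match_pairs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Physician:
    ident: str
    rank: str
    corps: str
    specialty: str
    sex: str
    age: int
    work_type: str


def exact_key(p):
    """Variables that must agree exactly between matched physicians."""
    return (p.rank, p.corps, p.specialty, p.sex, p.work_type)


def match_pairs(lows, highs, max_age_gap=10):
    """Greedily pair each low-group member with the closest-aged eligible high."""
    available = list(highs)
    pairs = []
    for low in lows:
        candidates = [h for h in available
                      if exact_key(h) == exact_key(low)
                      and abs(h.age - low.age) <= max_age_gap]
        if not candidates:
            continue  # no acceptable match; this low member stays unpaired
        best = min(candidates, key=lambda h: abs(h.age - low.age))
        available.remove(best)  # each high-group member is used at most once
        pairs.append((low, best))
    return pairs


# Hypothetical records (illustration only).
lows = [Physician("L1", "rank-A", "Regular", "internal medicine", "M", 41, "medical care"),
        Physician("L2", "rank-B", "Reserve", "psychiatry", "M", 33, "research")]
highs = [Physician("H1", "rank-A", "Regular", "internal medicine", "M", 52, "medical care"),
         Physician("H2", "rank-A", "Regular", "internal medicine", "M", 45, "medical care"),
         Physician("H3", "rank-B", "Reserve", "psychiatry", "F", 33, "research")]

matched = match_pairs(lows, highs)
print([(low.ident, high.ident) for low, high in matched])
```

Here H1 is rejected for L1 because the age gap is 11 years, and H3 is rejected for L2 because sex differs, so only one pair is formed.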


Selection of Variables for Study 


An attempt was made to select personality inventories 
(a) used in research on medical students 
or on members of other professional groups, (b) 
representing different “types” of instruments, (c) 
feasible for self-administration, and (d) which 
would yield a large amount of information for the 
testing time (4-6 hours). 

The inventories selected were: 

1. California Psychological Inventory (CPI) 

2. Adjective Check List (ACL) 

3. Minnesota Multiphasic Personality Inventory 
(MMPI) 

4. Survey of Interpersonal Values (SIV) 

5. Medical Preference Inventory (MPI) 

6. Barron-Welsh Art Scale 

7. Study of Values 

8. Biographical-Personal Inventory (BPI) 


Most of the inventories are commercially avail- 
able although the MPI is still in a developmental 
stage (Gough, 1952). The BPI has been used ex- 
perimentally within the USPHS.1 

Information available from files and other sources 
permitted the inclusion of other variables such as 
aptitude (MCAT scores),2 achievement (professional 
examination in medicine used by the USPHS in Re- 
serve Corps selection), an interview used in Reserve 
selection, and a measure of work satisfaction (the 
Service Evaluation Questionnaire, SEQ, completed 
in 1961 by physicians on duty at that time). 

In addition to the high-low dichotomy resulting 
from the coding of narrative comments in COEPRs, 
scores from the COEPR were available for analysis.3 


1 Permission for adaptation of the Army’s Bio- 
graphical Information Blank is gratefully acknowl- 
edged. 

2 Appreciation is expressed to the Association of 
American Medical Colleges for making available the 
Medical College Admission Test scores. 

3 Consideration was given to other kinds of feasible 
criteria, particularly peer ratings. The COEPR, 
however, was developed against a work associates’ 
rating criterion (Newman, Howell, & Harris, 1957). 
In studies of medical interns, multiple correlations 
between one form of the COEPR and group ratings 
Three sections in the COEPR yield scores: a forced-choice 
section, a 10-point rating scale section for 
rating 11 traits, and an overall scale for rating work 
performance. Scores on the sections were used sepa- 
rately for the 1963 and the 1964 annual reports. An 
average for each report section and for the average 
of the three scored sections was obtained through 
time, based on all annual reports available from 1960 
through 1964. Across the same period, averages were 
also computed on the following rating scales from 
the section of 11 scales: (a) ability to meet situa- 
tions without emotional upset, and (b) ability to get 
along with others. 


Test Administration and Collection of Data 


Physicians were in numerous geographic locations 
including foreign posts so that inventories were col- 
lected by mail. Instructions for self-administration 
attempted to control some aspects of the testing situation. 
For the total sample (N = 462), the return was 
83%. The return for the low group was 81%, and 
for the high group, 85%. 


Identification of Samples for Analysis 


Matching of members of the high and low groups 
meant that when one member of a pair failed to 
return tests, that pair could no longer be retained 
in the study. Of the 462 high-low physicians asked 
to participate, 68% (312) constituted the final sam- 
ple, 156 highs versus 156 lows. The samples primarily 
represented clinical care physicians (72%), although 
14% were in public health, 12% were in research, and 
2% were in other areas. 
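
The pairwise retention rule and the reported percentage can be checked with a short sketch. The pair identifiers and the `retain_complete_pairs` helper are hypothetical; only the counts 462 and 156 come from the text.

```python
def retain_complete_pairs(pairs, returned):
    """Keep a matched pair only if both of its members returned their tests."""
    return [(low, high) for low, high in pairs
            if low in returned and high in returned]


# Hypothetical identifiers (illustration only).
pairs = [("L1", "H1"), ("L2", "H2"), ("L3", "H3")]
returned = {"L1", "H1", "L2", "H3"}
kept = retain_complete_pairs(pairs, returned)
print(kept)

# Figures reported in the text: 462 physicians asked, 156 complete pairs kept.
asked = 462
final_sample = 2 * 156
retention_pct = round(100 * final_sample / asked)
print(final_sample, retention_pct)
```

Because a pair is lost when either member fails to respond, the retained share (68%) is necessarily lower than the overall return rate (83%).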


Analysis of Data 4 


For scale scores or total scores on personality inventories 
and for COEPR scores, t tests for mean 
differences in the high and low groups were made. 
The SEQ was analyzed by chi-square tests for high-low 
group differences on each of 47 questionnaire 
items. In all mean difference tests, the standard error 
formula for independent samples was used rather 
than the formula for correlated samples. Because the 
groups were matched, the obtained t ratios probably 
represent conservative estimates of high-low group 
differences. 
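
The independent-samples computation can be sketched from summary statistics. The example below uses the Autonomy means and SDs given in Table 1 (high: 43.92, 9.51; low: 50.19, 10.76; N = 156 each). The unpooled standard-error form is an assumption, since the text does not say which form was used; with equal group sizes the pooled and unpooled forms coincide. The function name is hypothetical.

```python
import math


def independent_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t ratio for a mean difference using the independent-samples standard error."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Autonomy scale, high group versus low group (values from Table 1).
t_autonomy = independent_t(43.92, 9.51, 156, 50.19, 10.76, 156)
print(round(t_autonomy, 2))  # -5.45
```

If scores within matched pairs correlate positively, the correlated-samples standard error would be smaller than the one used here, which is why the independent-samples ratio is the more conservative of the two.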

In addition to CPI scale scores, a regression equa- 
tion based on CPI scales predictive of medical school 
performance was applied to the high-low samples 
(Gough & Hall, 1964). Mean differences in the high- 
low groups on the predicted scores provide evidence 
as to the validity of the equation against the per- 
sonal effectiveness criterion. 

On the MMPI, besides high-low group comparisons 
on scale scores, blind clinical sorts of MMPI profiles 
into high and low groups were made.5 


performed separately by intern peers and supervising 
physicians ranged from .61 to .83 (Howell, Cliff, & 
Newman, 1960). 

4 Quintin Welch and Mildred Fields performed the 
computer aspects of the analysis of data. 

5 Appreciation is expressed to George S. Welsh for 
performing these sorts. 
TABLE 1 

MEAN DIFFERENCES IN HIGH-LOW GROUPS ON THE ADJECTIVE CHECK LIST 

                                                        Means and SDs (a)
                                                  High (N = 156)            Low (N = 156)
Title                              t ratio        M            SD           M            SD
Total no. adjectives checked       -2.53**        51.47        7.02         53.67        8.28
Defensiveness                      1.39           [illegible]  6.79         56.23        7.69
No. favorable adjectives checked   1.97**         55.26        7.68         53.42        8.72
No. unfavorable adjectives checked -1.93*         45.62        7.19         47.19        [illegible]
Self-confidence                    -2.70***       49.94        9.84         53.08        10.64
Self-control                       [illegible]*** 57.27        8.50         52.74        9.60
Lability                           -2.17**        47.84        10.35        50.26        9.29
Personal adjustment                [illegible]    [illegible]  8.35         53.33        9.14
Achievement                        -2.89***       57.14        7.91         59.81        8.42
Dominance                          -3.52***       53.87        9.82         57.90        10.39
Endurance                          -1.06          59.52        8.00         60.51        8.43
Order                              -1.60          58.62        8.19         60.14        8.54
Intraception                       [illegible]    61.52        8.98         59.73        8.99
Nurturance                         4.06***        [illegible]  8.77         53.01        9.14
Affiliation                        3.44***        53.73        8.11         50.41        8.92
Heterosexuality                    1.39           48.79        9.49         47.23        10.35
Exhibition                         -2.47**        44.14        9.87         47.02        10.68
Autonomy                           -5.45***       43.92        9.51         50.19        10.76
Aggression                         -5.38***       43.63        9.14         49.79        10.99
Change                             -1.88*         42.89        9.41         44.93        9.72
Succorance                         0.17           43.75        7.64         43.61        7.38
Abasement                          [illegible]    48.97        9.65         45.13        9.52
Deference                          [illegible]    54.49        10.41        47.97        11.40
Counseling readiness               0.81           49.53        9.33         48.63        10.28

Note.—Negative ratio indicates higher mean scores for the low group. 
(a) Based on test manual norms with M = 50, SD = 10. 
* p < .10. 
** p < .05. 
*** p < .01. 


RESULTS 


Adjective Check List 


The majority of the scales in the ACL pro- 
duced significant mean differences in high-low 
groups as is evident from Table 1.6 

The results in Table 1 show that the high 
physician tends to score higher on the Num- 
ber of favorable adjectives checked, Self-con- 
trol, Personal adjustment, Intraception (at- 
tempting to understand one’s own behavior or 


6 Tables comparable to Table 1 for each of the 
instruments analyzed have been deposited with the 
American Documentation Institute. Order Document 
No. 9095 from ADI Auxiliary Publications Project, 
Photoduplication Service, Library of Congress, Wash- 
ington, D. C. 20540. Remit in advance $1.25 for 
microfilm or $1.25 for photocopies and make checks 
payable to: Chief, Photoduplication Service, Library 
of Congress. 


the behavior of others), and Nurturance (in- 
volving behaviors which extend material or 
emotional benefits to others). He also scores 
higher on the need to seek and sustain numer- 
ous personal friendships (Affiliation), the 
need to express feelings of inferiority through 
self-criticism, guilt, or social impotence 
(Abasement), and the need to seek and sustain 
subordinate roles in relationship with others 
(Deference). 

The low physicians score higher on the 
Total number of adjectives checked, the 
Number of unfavorable adjectives checked, 
Self-confidence, Lability, Achievement, Domi- 
nance, Exhibition, Autonomy, Aggression, and 
Change. 

High-low differences may be considered in 
terms of shortened forms of the scale descriptions 
given in the test manual (Gough & Heilbrun, 
1965, pp. 5-9).7 Only scales yielding t 
ratios significant at the .01 level will be considered. 
Scales on which both groups were at 
the mean of 50 or higher, but on which the 
highly effective physicians scored higher yield 
the following descriptions of the High group: 


—serious, sober, responsive to obligations, 
diligent, practical, loyal, with an element of 
over-control (Self-control). 

—helpful, nurturant, bland, self-disciplined, 
conventional, solicitous (Nurturance). 

—adaptable, anxious to please, ambitious, 
concerned with position, may tend to exploit 
others (Affiliation). 


Scales on which both groups were at the 
mean of 50 or higher, but on which the low 
group scored higher were: 


—assertive, affiliative, outgoing, persistent, 
an actionist, wants to get things done, im- 
patient, concerned about a good impression, 
forceful, self-confident, determined, ambitious, 
opportunistic (Self-confidence). 

—intelligent, hard-working, determined to 
do well, motives are internal and goal-centered 
rather than competitive, may be unduly trust- 
ing and optimistic (Achievement). 

—forceful, strong-willed, persevering, con- 
fident of his ability, direct and forthright 
(Dominance). 


Scales on which both groups were at the 
mean of 50 or lower, and on which the high 
group scored lower furnish the following descriptions 
of the high group: 


—moderate and subdued disposition, hesitates 
to take the initiative (Autonomy). 

—conformist, but not necessarily lacking 
in courage or tenacity, patiently diligent, sin- 
cere (Aggression). 


On one scale, both groups were below the 
mean, but the low group scored lower and may 
be described as: 


—optimistic, poised, productive, decisive, 
not fearing others, is alert and responsive to 


7 It should be noted that these descriptions apply 
to scores deviating above and below 50. In the 
present study, both samples may be above 50, for 
example, on Dominance, but the low officers are 
higher on Dominance. The personality sketches fur- 
nished are of the more extreme group. 
them, tempo brisk, manner confident, behavior 
effective (Abasement). 


On one scale, the high group was above the 
mean, and the low group below the mean. The 
high group on this scale is characterized as: 


—conscientious, dependable, persevering, 
self-denying out of a preference for anonymity 
and freedom from stress, attends to his affairs, 
seeking little, and yielding to any reasonable 
claim (Deference). 


The low group is characterized as: 


—energetic, spontaneous, independent, likes 
attention, likes to direct others and express 
his will, ambitious, not above taking ad- 
vantage of others (Deference). 


It may be noted that on the differentiating 
scales (.01 level), the high physicians scored 
highest (one-half sigma above the mean of 
50) on Self-control, Achievement, and Nurtur- 
ance. The low physicians scored highest on 
Achievement and Dominance. 


California Psychological Inventory 


Physicians in both the high and low groups 
averaged above the CPI mean of 50 on all 
scales. This is to be expected in a group with 
better than average education, but the highly 
effective physician was the higher scorer on 
all the scales which yielded significant differ- 
ences in the groups. 

The high physician scored significantly 
higher (.10 level or below) on Social Presence, 
Sense of Well-Being, measures of socializa- 
tion, maturity, and responsibility (the Re- 
sponsibility, Socialization, Self-Control, Toler- 
ance, and Good Impression Scales). He also 
scored significantly higher on Flexibility and 
on measures of achievement potential and 
intellectual efficiency (Achievement via Con- 
formance, Achievement via Independence, 
and Intellectual Efficiency Scales).8 He was 
not higher than his low counterpart on many 
of the measures of ascendance and self-assur- 
ance, nor on the Psychological-Mindedness 
and Femininity Scales. 


MMPI 


On the MMPI, low physicians scored sig- 
nificantly higher (.10 level or below) on 


8 See test manuals for definitions of specific scales. 
Hypochondriasis, Depression, Psychasthenia, 
Schizophrenia, and Hypomania without the 
K correction. The high physicians were sig- 
nificantly higher (.01 level) on the K score 
(control of impulse or defensiveness) and on 
the Ego Strength Scale (.10 level). 

An effort was made to determine whether 
or not placement in high and low groups 
could be predicted from a clinical reading of 
MMPI profiles. The MMPI expert judged a 
sample (N = 214) of the total number of 
profiles, placing them in highly effective or 
less effective groups. Correct placement was 
56%, a departure from chance at the .10 level. 
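
The reported departure from chance can be checked approximately. The text does not say which test was used, so the sketch below is an assumption: a two-sided normal approximation to the binomial, with chance placement taken as 50%. The function name is hypothetical; the inputs (56% correct, N = 214) are from the text.

```python
import math


def two_sided_p_vs_chance(prop_correct, n, chance=0.5):
    """Normal-approximation z and two-sided p for a proportion versus chance."""
    standard_error = math.sqrt(chance * (1 - chance) / n)
    z = (prop_correct - chance) / standard_error
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = two_sided_p_vs_chance(0.56, 214)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the p value falls between .05 and .10, which agrees with the significance level reported for the clinical sorts.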

Although both statistical comparisons on 
scale scores and clinical sorts showed some 
relation to the criterion, dissimulation may 
have affected results. The expert who per- 
formed the clinical sorts thought about 20% 
of the profiles showed evidence of faking or 
overdefensiveness. 


Values, Social Insight, and Preference Measures 


None of the scales from the Survey of In- 
terpersonal Values or from the Study of 
Values differentiated the high-low groups. 
The same is true of the Barron-Welsh Art 
Scale, the Chapin Social Insight Test, and the 
Medical Preference Inventory. 

The Biographical-Personal Inventory 
yielded an empirical scoring key based on 
item-analysis involving the two contrasting 
groups. Information concerning this instru- 
ment is available upon request from the 
author. 


Medical School and USPHS Admission Variables 


The high-low groups did not differ on cogni- 
tive measures such as the parts of the Medical 
College Admission Test or the Public Health 
Service professional examination in medicine. 

The high-low groups also did not differ in 
the representation of medical schools attended 
or in whether they had had an internship in 
the USPHS. 

The interview at the time of application to 
the Reserve Corps produced a t ratio significant 
at the .05 level and in the expected 
direction, with the high group having the 
higher score. This significant difference is 
worthy of emphasis, since interview results 
have not been promising in the medical school 
situation (Gough, Hall, & Harris, 1963; 
Kelly, 1957). The result is also worthy of 
emphasis since the interview is conducted by 
only one interviewer. An interview score based 
on the average of several interviewers’ re- 
ports could be expected to yield a more re- 
liable score and perhaps produce greater high- 
low differentiation. 

The CPI equation based on four scales 
found to be predictive of medical school per- 
formance (Gough & Hall, 1964) differentiated 
the high-low groups at the .10 level. This 
equation is described as screening out petu- 
lance, self-centeredness, intolerance, and se- 
lecting for personal maturity, concern for 
others, self-confidence, freedom from narcis- 
sistic achievement drives or compulsive striv- 
ing, and personal resourcefulness coupled with 
sensitivity to the needs and demands of others 
(Gough & Hall, 1964, p. 225). 


COEPR Variables 


Although the criterion for high-low cri- 
terion placement was spontaneous comments 
occurring in the COEPR, the high and the 
low groups might be expected to differ also 
on scored sections of the efficiency report. 
That this was the case was evidenced by significant 
t ratios (.01 level or below) in all 
high-low comparisons on the COEPR. 

The data showed that: 


1. The high physicians were significantly 
higher than the low ones on all COEPR scores 
and averages of scores. 

2. Averages based on 4 years of reports 
produced greater high-low differentiation than 
averages based on 2 years or on scores from 
single reports. 

3. The scale for rating “ability to get along 
with others” generally yielded the best high- 
low differentiation of any part of the COEPR. 
This finding suggests that the spontaneous- 
comments criterion is more similar to ratings 
of “ability to get along with others,” than to 
ratings of overall medical performance or 
“ability to meet situations without emotional 
upset.” 

4. When scores from several reports were 
averaged, a forced-choice section containing 
descriptive adjectives, many of which relate 
to personality characteristics, was as differentiating 
as a score based on “ability to get 
along with others.” 


Service Evaluation Questionnaire 


This questionnaire, used in 1961 to assess 
the attitudes of physicians toward the Serv- 
ice, yielded few significant differences, but 
rated significantly more satisfactory by high 
physicians were: 


1. Quality of program leadership furnished 
you (.05 level). 

2. Professional capabilities of work associ- 
ates (.10 level). 


Rated significantly higher by high physi- 
cians than low as having an effect on a deci- 
sion to stay in or leave USPHS were: 


1. Quality of supervision furnished you 
(.05 level). 

2. Quality of orientation to each position 
held (.10 level). 

3. Information furnished concerning bene- 
fits (.05 level). 

4. Rate at which you were promoted (.10 
level). 


Ratings of the extent (“too little,” “too 
much,” “about right”) of efforts made by 
PHS personnel to retain a physician in the 
Service differentiated significantly (.10 level) 
between high and low groups. A higher per- 
centage of high physicians rated “about right.” 

Rated significantly more satisfactory by low 
physicians were: 


1. Potential promotion rate (.05 level). 

2. Economic security provided by pay plus 
benefits and orderly promotion system (.05 
level). 


Rated significantly higher by low physicians 
as having an effect on a decision to stay or 
leave were: 


1. Adequacy of other benefits (.10 level). 

2. Opportunity to participate in community 
and civic affairs (.10 level). 

3. Degree of independence allowed in per- 
forming duties (.10 level). 


Also rated significantly more favorably 
(.10 level) by low physicians was amount of 
time spent in paper work. 

Low physicians, then, did not express 
greater dissatisfaction (rate more aspects 
lower) with the Service than did the high 
ones. Personality difficulties were not reflected 
in degree of dissatisfaction with the Service 
3 years prior to the study. 

Attitudes toward particular aspects of the 
Service did differ in that low physicians had 
more favorable attitudes toward economic 
advantages of the Service and less favorable 
attitudes toward program leadership and to- 
ward the professional capabilities of work as- 
sociates. The aspects the low physicians felt 
were important in a decision are difficult to 
interpret since the low physicians did not 
differ from the highs in an appraisal of the 
degree to which these aspects were satisfac- 
tory. 


DISCUSSION 


Personality Description 


This study has served to clarify some of 
the personality characteristics which may be 
associated with what supervisory personnel 
consider the “personal effectiveness” of physi- 
cians working in an organizational setting. 

The personally effective physician is char- 
acterized by greater social presence, sense of 
well-being, socialization, maturity and re- 
sponsibility, flexibility, achievement potential, 
and intellectual efficiency (CPI scales). More 
than his low counterpart (as indicated on 
the ACL), he tends to be serious, sober, re- 
sponsive to obligations, diligent, loyal, helpful, 
self-disciplined, conventional, solicitous, adapt- 
able, concerned with position, moderate, hesi- 
tant to take the initiative, conforming but not 
lacking in courage, sincere, conscientious, de- 
pendable, persevering, and self-denying. He is 
lower than the less effective physicians on 
MMPI measures of psychopathology, but is 
higher on Ego Strength and on defensiveness 
or control of impulse (K score). 

The less effective physician is lower on 
social presence, sense of well-being, socializa- 
tion, maturity, responsibility, flexibility, and 
achievement potential (CPI). He is more 
likely (on the ACL) to be assertive, outgoing, 
persistent, impatient, self-confident, deter- 
mined, opportunistic, hard-working, goal-cen- 
tered, forceful, strong-willed, direct, optimistic, 
poised, decisive, energetic, spontaneous, inde- 
pendent, liking of attention, ambitious, and 
not above taking advantage. He is more 
likely to score higher on pathology indicators 
(MMPI) than are members of the high group. 
Not only is the high group considered more 
personable and likeable according to com- 
ments made in supervisory reports, but super- 
visors also rate the high group higher even 
when presumably rating professional perform- 
ance. The results suggest that the qualities 
prized by supervisors are evident in a 30- 
minute employment interview and may also 
be qualities found predictive of performance 
in medical school. The data also suggest that 
physicians viewed as personally effective have 
more favorable attitudes toward supervisors 
and others in the work situation since they 
are more favorable toward program leadership 
and toward the professional capabilities of 
work associates. 


Personality Prediction 


Personality factors, evidenced by high-low 
criterion differentiation, entered into all of 
the supervisory performance evaluations 
(scored sections of the COEPR). Criterion 
differentiation increased with increases in the 
number of reports. 

These findings suggest the importance of 
personality factors in the evaluation of medi- 
cal performance. It could be argued, how- 
ever, that supervisors are unduly influenced by 
“irrelevant” personality factors in making 
their judgments of the quality of physician 
performance. Another possibility, though, is 
that personality factors are not “irrelevant,” 
but account for some of the variance in per- 
formance in a professional group which is 
both self-selected and screened at successive 
training and employment stages. A third pos- 
sibility is that the COEPR as a performance 
evaluation instrument may elicit an evaluation 
of personality characteristics more than of 
professional or technical behavior. Perhaps the 
ultimate criterion in medicine can be expected 
to be multifaceted, embodying a host of pro- 
fessional competence and personal qualifica- 
tions factors, as suggested by the Utah Study 
(Price et al., 1963). 

The fact that a CPI equation for predicting 
success in medical school significantly differ- 
entiated (at the .10 level) the criterion groups 
in this study suggests that some of the same 
personality characteristics important in train- 
ing are also relevant to the employment situa- 
tion. The findings emphasize the need for 
additional experimental studies of personality 
factors in medical training. 

One aspect of personality screening which 
may bear a second look is the selection inter- 
view which has been reported not to be useful 
in the medical school situation (Kelly, 1957). 
An interview evaluated against an appropriate 
criterion may have promise in screening. 

The type of criterion employed appears to 
have been a fruitful approach to the identifica- 
tion of groups for personality study. Although 
numbers did not permit a breakdown of the 
data, the content of supervisory comments 
suggested that subsamples identified by dif- 
ferent personality syndromes are probably 
represented in both the high and low groups. 
In some situations, analyses by syndrome 
groups might be feasible. Use of a systematic 
method (such as the ACL) for enumerating, 
but not rating, personality characteristics 
could be one way of obtaining supervisory 
judgments over a period of time which would 
allow identification of syndrome subsamples. 

The MCAT, professional achievement ex- 
aminations in medicine, the Barron-Welsh Art 
Scale, the Chapin Social Insight Test, the 
Medical Preference Inventory, the Study of 
Values, and the Survey of Interpersonal 
Values all failed to differentiate significantly 
between the high and low criterion groups. 

This does not mean that aptitude, achieve- 
ment, creativity, insight, preferences, and 
values may not be important in medical per- 
formance, but it does mean that the measures 
used were not predictive of the personality 
criterion as developed for this study. Perhaps 
with criteria more clearly involving profes- 
sional and technical behavior, the MCAT and 
other cognitive measures would have pre- 
dicted better. 

The obtained results also mean that among 
professional personnel not differentiated on 
aptitude or achievement measures, discriminable 
differences do exist in personality characteristics. 

On the ACL, 75% of the scales analyzed 
were predictive of the criterion as compared 
with 61% on the CPI, and 33% on the 
MMPI. It is possible that the inventories with 
few implications of psychopathology may have 
been more acceptable in content to the pro- 
fessional group studied. It may also be that 
discriminable personality differences in a highly 
screened group are likely to be in terms of 
readily acknowledged characteristics. 


Organizational Effect 


The highly effective physician in this study 
is distinguished from his low counterpart on 
measures of such characteristics as maturity, 
socialization, achievement potential, self-con- 
trol, nurturance, abasement, deference, and 
perhaps better psychological functioning. The 
low group gives evidence of greater assertiveness, 
emotional expression, and willfulness, 
but may be characterized by a poorer level 
of adjustment. There was no evidence to 
suggest that the low officers have qualities 
which compensate for their difficulties in the 
work situation. They were not significantly 
different from the high group on creativity, 
aptitude, or medical achievement measures. 

The personality descriptions emerging from 
this study cannot be viewed apart from their 
organizational setting. What is considered 
“personally effective” is clearly a value judg- 
ment. In this study, the value judgment is 
being made by individuals, usually physicians, 
who are in supervisory positions. The charac- 
teristics valued in another organization or im- 
portant in private medical practice may be 
quite different from those which emerged 
from this study. 


VISUAL ACUITY AS MEASURED BY DYNAMIC 
AND STATIC TESTS: 
A COMPARATIVE EVALUATION 1 


ALBERT BURG 


Institute of Transportation and Traffic Engineering, University of California, Los Angeles 


In order to provide, for the 1st time, definitive information on the relationship 
between static visual acuity and acuity for a moving target (dynamic visual 
acuity), both types of acuity were measured for 17,500 Ss, ages 16-92. The re- 
sults show: (a) acuity declines progressively with both increasing speed of 
target movement and advancing age, (b) males have consistently better acuity 
(both static and dynamic) than females, and (c) high intercorrelations exist 
between the static and dynamic tests, these correlations decreasing with in- 
creasing speed of target movement. These findings are presented primarily for 
their value in providing normative data to other researchers. Additional research 
is suggested to explain some of the relationships obtained in the study. 


In the past decade considerable interest has 
developed in a new measure of visual capa- 
bility called “Dynamic Visual Acuity,” or 
“DVA,” which refers to the ability to dis- 
criminate an object when there is relative 
movement between the observer and the ob- 
ject. This increasing interest is an outgrowth 
of the realization that for many activities, 
such as driving, flying, ball playing, and the 
like, discrimination of moving objects (or of 
stationary objects while one is moving) plays 
a key role and, therefore, that performance 
on a dynamic-acuity test may be more closely 
correlated with task performance than is the 
score obtained on a test of static (or stan- 
dard) acuity. 

Furthermore, research conducted to date 
in this area has demonstrated that it is dif- 
ficult to predict an individual’s DVA score 
from his static-acuity score. In an extensive 
survey of this research, Burg (1964) con- 
cludes that there are marked differences in 
DVA among individuals with essentially the 
same static acuity, and that there is little 
agreement on the exact relationship between 
dynamic and static acuity. Some researchers 
(e.g., Crawford, 1960; Miller & Ludvigh, 
1956; Warden, Brown, & Ross, 1945) feel 
there is little or no measurable relationship 
between the two, while other investigators 
(e.g., Burg & Hulbert, 1961; Elkin, 1961; 
Erickson, 1963; Hulbert, Burg, Knoll, & 
Mathewson, 1958) all have found low but significant 
correlations between static acuity and 
performance on a dynamic task. 


1 Data collection and analysis costs were partially 
borne by the United States Public Health Service 
(Grant AC-00015) and the State of California and 
United States Bureau of Public Roads (California 
Standard Agreement 13600). 

There are several factors that may account 
for the lack of consistency in research re- 
sults. Primary among these are small sample 
size and excessive homogeneity of the sample. 
The purpose of the present study was to com- 
pare static- and dynamic-acuity performance 
utilizing a large, heterogeneous sample, in 
order not only to assess the validity of the 
various research findings obtained heretofore, 
but also to provide performance norms of 
value to other researchers working in this 
area. 

METHOD 


Apparatus 


An earlier report (Burg, 1965) provides a de- 
tailed description of the DVA test apparatus. Briefly, 
a 35mm automatic slide projector, mounted in a 
rotatable cradle driven by a variable speed drive 
motor, projects an acuity target image on a 180-degree 
cylindrical screen 4 feet in radius. The screen, 
made of metal painted flat white, is uniformly il- 
luminated at approximately 7.8 footcandles and has 
a 50% reflectance factor. The subject (S) sits in an 
adjustable height chair directly under the projector, 
so that the pivot point of the projector cradle, the 
focal point of the projector, the center of curvature 
of the screen, and the center of S’s head are all in 
vertical alignment. Figure 1 provides a view of the 
projector framework and screen properly positioned 
relative to each other. 
apparatus. 


The target used is the Bausch & Lomb Ortho- 
Rater acuity-test checkerboard target (Figure 2). 
Fifteen 35mm slides are used in the projector to 
duplicate the sequence of target sizes (10.0-0.67 min- 
utes of arc for the checkerboard grid) found in the 
original Ortho-Rater, and the projected images sub- 
tend the same visual angles as their counterparts in 
the Ortho-Rater. The target travels a horizontal path 
across the screen from left to right, and its angular 
velocity is variable from approximately 5 to 200 
degrees per second. 

Static acuity was measured using a late-model 
Bausch & Lomb Master Ortho-Rater, which utilizes 
12 checkerboard targets ranging from 10.0 to 0.83 
minutes of arc. (The three smallest targets in the 
original Ortho-Rater sequence, Nos. 13, 14, and 15, 
had been dropped from the acuity test provided in 
the late-model Ortho-Rater used.) 


Fig. 2. Checkerboard target. 


Procedure 


The S’s (far vision) binocular static acuity (cor- 
rected, if he wore glasses) was measured on the 
Ortho-Rater. The S next was seated in the DVA ap- 
paratus and his eye level brought to a standard 
height by means of the adjustable chair. The S’s 
binocular static acuity then was measured again, this 
time using the 15 acuity slides in the DVA pro- 
jector (which was fixed in a straight-ahead posi- 
tion). As was the case with the Ortho-Rater (and the 
DVA tests which followed), in this “screen static 
acuity” test the targets were presented in sequence, 
from largest to smallest, and S’s score was the num- 
ber (and visual angle) of the last correctly dis- 
criminated target preceding two consecutive misses. 
For each target the position of the checkerboard was 
randomly different from that in the Ortho-Rater test, 
and also was different in each succeeding DVA test. 
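
The stopping rule described above can be expressed compactly. The following is a minimal sketch in Python, not the scoring method as the author implemented it; the function name `score_acuity_run`, the encoding of responses as booleans, and the convention of returning 0 when no target is called correctly are all assumptions made for illustration.

```python
from typing import Sequence


def score_acuity_run(responses: Sequence[bool]) -> int:
    """Return the 1-based number of the last correctly called target
    that precedes two consecutive misses.

    responses[i] is True if target number i + 1 was called correctly.
    Targets are assumed to be ordered from largest to smallest.
    Returns 0 (an assumed convention) if no target was called correctly.
    """
    last_correct = 0
    consecutive_misses = 0
    for target_number, correct in enumerate(responses, start=1):
        if correct:
            last_correct = target_number
            consecutive_misses = 0
        else:
            consecutive_misses += 1
            if consecutive_misses == 2:
                break  # testing stops after two successive misses
    return last_correct
```

Under this rule a single miss followed by a correct call does not end the run: `score_acuity_run([True, True, False, True, False, False])` returns 4.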

For the DVA tests that followed, S was first in- 
formed, as before, of the type of (verbal) response 
required (“top,” “bottom,” “left,” or “right,” to in- 
dicate the position of the checkerboard grid within 
the target). The movement of the target (from left 
to right in the horizontal meridian) was then pointed 
out, and S was told that he was free to move his 
head. 

The projector cradle was then set to rotating at a 
constant 60 degrees per second, and when S was 
told “Ready,” the projector lamp and blower were 
turned on, revealing a blank square of light travers- 
ing the target path. As the light square passed off 
the screen to the right, the changer mechanism was 
automatically activated so that as the projected 
image reappeared on the left side of the screen, it 
was that of the largest target (No. 1); the next time 
around Target No. 2 was shown, and so on with 
progressively smaller targets until S had incorrectly 
called the position of the checkerboard on two 
successive targets. 

At the conclusion of the first DVA test, the pro- 
jector was stopped, the slide magazine repositioned 
at the beginning of an alternate set of slides, and 
the above procedure repeated for 90 degrees per 
second, and again for 120 degrees per second. Part- 
way through the study, a DVA test at 150 degrees 
per second was added to the test battery, and ap- 
proximately 28% of the Ss were tested at this speed. 


Subjects 


The Ss were California drivers who were volun- 
tarily participating in a long-range research program 
studying the relationship between vision test scores 
and driving performance. A total of nearly 17,500 Ss 
were tested, 62.8% of whom were male and 37.2% 
female. The age range was from 16 to 92, and static 
acuity ranged from 20/13 to 20/200. The Ss were 
tested at branch offices of the California Department 
of Motor Vehicles scattered throughout the state. 
TABLE 2 

SUMMARY OF PRODUCT-MOMENT CORRELATIONS 

                 Screen     60°/sec    90°/sec    120°/sec   150°/sec 
Test             test       DVA        DVA        DVA        DVA 

Ortho-Rater      0.673      0.598      0.541      0.499      0.350 
                 (9799)     (16923)    (17254)    (17186)    (6629) 

Screen test                 0.710      0.634      0.565      0.452 
                            (9798)     (9796)     (9763)     (6195) 

60°/sec DVA                            0.788      0.695      0.591 
                                       (16912)    (16846)    (6612) 

90°/sec DVA                                       0.765      0.660 
                                                  (17193)    (6630) 

120°/sec DVA                                                 0.697 
                                                             (6630) 

Note. Sample sizes in parentheses. 


RESULTS 


Table 1 presents a summary of the mean 
static and dynamic binocular visual acuity 
scores, by age and sex, and Table 2 gives the 
product-moment correlations among the vari- 
ous static and dynamic tests. Figure 3 pre- 
sents a graphical description of visual acuity 
threshold as a function of age, sex, and target 
movement. 


Fig. 3. Binocular visual acuity as a function of age, 
sex, and speed of target movement. (Ordinate: visual acuity 
in minutes of arc; abscissa: age in years; separate curves 
for males and females.) 
From inspection of the above, the following 
is evident: 


1. Visual acuity for a moving target is 
poorer than that for a stationary target, and 
acuity becomes progressively worse with in- 
creasing angular velocity of target movement. 

2. There is a progressive decline in acuity 
with advancing age, this decline accelerating 
in the older age groups and becoming more 
pronounced with a moving target than with 
a stationary target. 

3. Males have a slight but consistent su- 
periority over females with regard to visual 
acuity threshold (whether static or dynamic). 

4. High intercorrelations exist between all 
acuity tests, with the correlations between 
static and dynamic tests decreasing (as ex- 
pected) with increasing target velocity. Also 
as expected, the static-screen acuity test cor- 
relates more highly with the dynamic tests 
than does the Ortho-Rater. 


DISCUSSION 


None of the results can be considered sur- 
prising, with the possible exception of the 
consistent male superiority in visual acuity 
(the slight reversals in the higher age groups 
are, in all probability, an artifact of the 
small sample sizes). This finding is consistent 
with the results of other studies (e.g., Burg & 
Hulbert, 1961; United States National Center 
for Health Statistics, 1964), and while several 
theories have been proposed, such as differ- 
ential motivation and physiological and/or 
physiognomic differences, no proof of any of 
them has been brought forth, and no attempt 
to explain these differences will be made here. 
This obviously is an area for additional research. 

The progressive decline in acuity with in- 
creasing speed of target movement is in agree- 
ment with other research findings, and is to 
be expected in view of the complex nature of 
the accommodation-pursuit tracking task in- 
volved, one in which both good resolving 
ability and good coordination of eye and 
neck muscle movements are necessary to per- 
mit a high level of performance. 

The most interesting, and in some ways 
the most revealing, finding is the relatively 
high degree of correlation between static- and 
dynamic-acuity performance. Prior to this research, 
the highest correlation ever obtained 
between a standardized static-acuity test and 
a dynamic test was 0.306 (Burg & Hulbert, 
1961), between Ortho-Rater performance and 
a 20-degree-per-second DVA test on apparatus 
similar to the present one. At 60 degrees per 
second, Burg and Hulbert found the correla- 
tion dropped to 0.280, as compared to 0.598 
obtained in the present study. The increased 
correlations found at all target speeds in the 
present study reflect the fact that the present 
study represents the first time that an ex- 
tremely large, heterogeneous group of Ss was 
used. Despite the size of the presently ob- 
tained correlations, however, it is evident that 
nonacuity factors play an important role in 
determining DVA, and that these factors are 
increasingly important in determining DVA 
as speed of target movement increases. The 
exact nature of these factors is still unclear, 
but research is currently underway to in- 
vestigate this area. 
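
One way to make this point concrete is to square each Ortho-Rater correlation from Table 2, which gives the proportion of variance in the DVA score that is linearly associated with static acuity. The sketch below is illustrative only: the squared-correlation reading is a standard statistical convention, not an analysis reported in the study, and the names used are assumptions.

```python
# Ortho-Rater vs. DVA correlations, copied from Table 2.
ORTHO_RATER_VS_DVA = {60: 0.598, 90: 0.541, 120: 0.499, 150: 0.350}


def shared_variance(r: float) -> float:
    """Proportion of variance shared by two measures correlated at r."""
    return r * r


for speed, r in ORTHO_RATER_VS_DVA.items():
    shared = shared_variance(r)
    print(f"{speed:>3} deg/sec: r = {r:.3f}, shared = {shared:.3f}, "
          f"unshared = {1 - shared:.3f}")
```

On this reading, static acuity is associated with roughly 36% of the DVA variance at 60 degrees per second but only about 12% at 150 degrees per second, which agrees with the statement that nonacuity factors grow in importance as target speed increases.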

The fact that performance on the static- 
screen acuity test correlates more highly with 
DVA than does Ortho-Rater acuity is, of 
course, a function of its greater degree of 
similarity to the DVA test. For this reason, 
Figure 3 includes static-screen acuity values 
rather than those for the Ortho-Rater, to 
present a more valid comparison of static- 
and dynamic-acuity performance. 

Finally, the progressive decline in both 
static and dynamic acuity as a function of 
age is in general agreement with the results 
of most studies on the effects of aging on 
physical or physiological functions (e.g., 
Chown & Heron, 1965; Weale, 1963; West- 
heimer, 1965). A plot of the log visual acuity 
scores against age produced nearly linear 
functions, and a log-log plot further reduced 
the remaining curvilinearity. However, doubt 
still remains as to whether the relationship 
between acuity and age can be represented 
by a simple power function, or whether some 
more complex relationship is involved. As 
Weale (1963) points out, previous research 
efforts in this area have not produced con- 
sistent results, and all of the aspects of this 
relationship bear careful reexamination. 
Further analysis of the data obtained in the 
present study is underway, the results of 
which will appear in a future publication. 
The curves in Figure 3 reinforce the suspi- 
cion that there is a considerable divergence 
between chronological age and physiological 
age. Furthermore, there is also a difference 
between visual “efficiency” and visual acuity 
(quite obviously, a person with 20/100 acuity 
does not have only “half” the vision of a 
person with 20/50 acuity). That is, if it were 
possible to establish a scale of visual “effi- 
ciency” (which may, in fact, be a logarithmic 
function, as Snell & Sterling, 1925, suggested), 
and then plot performance on this scale 
against physiological age, the resulting curve 
might very well be a straight line with a 
slope of 1! At the present time, however, one 
can only speculate as to the true relationship 
between vision and age, and this is certainly 
an area where further research is essential. 
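
The remark about 20/100 and 20/50 acuity can be illustrated numerically. The sketch below assumes the usual convention that a Snellen fraction of 20/x corresponds to a minimum resolvable angle of x/20 minutes of arc, and it uses the base-10 logarithm of that angle as one possible "efficiency-like" scale. The function names are hypothetical, and the logarithmic scale is only the kind of function the text speculates about, not a result of the study.

```python
import math


def snellen_to_arcmin(denominator: float, numerator: float = 20.0) -> float:
    """Minimum resolvable angle (minutes of arc) for a Snellen fraction."""
    return denominator / numerator


def log_acuity(denominator: float) -> float:
    """Base-10 log of the minimum resolvable angle (assumed scale)."""
    return math.log10(snellen_to_arcmin(denominator))


for denom in (20, 50, 100, 200):
    print(f"20/{denom}: {snellen_to_arcmin(denom):.2f} min of arc, "
          f"log = {log_acuity(denom):.3f}")
```

On the logarithmic scale the step from 20/50 to 20/100 is the same size (log 2, about 0.30) as the step from 20/100 to 20/200, whereas the resolvable angle itself doubles each time.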


INFLUENCE OF A CHANGE IN SYSTEM CRITERIA 
ON TEAM PERFORMANCE 1 


GEORGE E. BRIGGS AND WILLIAM A. JOHNSTON 


Ohio State University 


In a simulated ground-controlled aerial intercept task, 2-man teams of radar 
controllers transferred to either simple or complex criterion conditions following 
training under simple criteria. Upon transfer to simple criterion conditions, 
teams adapted performance rapidly to the new criterion; however, upon transfer 
to complex criteria, teams continued to emphasize that aspect of performance 
appropriate during the previous simple criterion conditions. 


It is not uncommon for personnel to experi- 
ence a change in the criteria, used by a system 
manager to evaluate individual and/or system 
performance, when transferring from a train- 
ing to an operational context or when chang- 
ing assignments within the latter context. 
Further, whereas relatively straightforward 
criteria may be employed during training, 
the greater complexity of an operational situa- 
tion encourages the use of more complex (and 
even incompatible) criteria to evaluate indi- 
vidual and system performance. The purpose 
of this experiment, then, was twofold: (a) to 
determine the influence of transfer from one 
to another criterion condition, and (b) to determine 
if transfer to another simple criterion 
produces effects different from those found 
upon transfer to a complex criterion. 

During transfer, the task confronting two- 
man teams was identical to that used by Johns- 
ton (1966) for his transfer task: The team 
members were to coordinate interceptions of 
incoming aircraft for as many pairs of air- 
craft as possible. In this situation, two major 
criteria seemed particularly appropriate on 
which to base experimenter (E)-provided feedback 
to the teams: (a) the time required to 
make an interception, and (b) the degree of 
coordination between the two team members 
(radar controllers or RCs) in making the intercepts. 
These are analogous to the speed 
and accuracy criteria which form the bases 
for most evaluations of man-machine performance. 
Further, these are incompatible criteria 
in that one cannot maximize both simultaneously 
(Howell & Kreidler, 1963). 


1 This research was carried out at the Human Performance 
Center and was supported in part by the 
United States Navy under Contract No. N61339-1327, 
sponsored by the United States Naval Training Device 
Center, Orlando, Florida. Reproduction of this 
publication in whole or in part is permitted for any 
purpose of the United States Government. 

On the basis of previous findings (Fitts, 
1966), it was expected that when teams were 
fed back performance evaluations based on 
a single criterion (time or coordination), their 
performance over sessions would change so as 
to maximize, within limits, such feedback. 
Further, it was expected that a change to 
another single criterion condition would re- 
sult in a fairly rapid adjustment in perform- 
ance to maximize feedback based on the new 
criterion. However, in the cases of transfer 
from a simple (single) to a complex (both 
time and coordination) criterion, it was ex- 
pected that teams would modify their per- 
formance so as to reach a compromise with 
the two incompatible criteria. Finally, two 
forms of feedback were provided in the case 
of transfer to a complex criterion: In one case 
teams received feedback on time and coordina- 
tion as individual items of information, while 
in the other case a single index was fed back 
which was determined by the multiplication of 
time by degree of coordination. This compari- 
son was included to determine if the expected 
compromise in performance under complex 
criterion conditions differs in the case wherein 
the teams have explicit information on the 
temporal and coordinative aspects of per- 
formance as compared with the case where an 
external agent (the E) predetermines the appropriate 
weighting of these two aspects of 
performance and provides only a single “figure 
of merit” as information to the teams. 
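
The two forms of complex-criterion feedback can be sketched as follows. This is a hypothetical illustration: the function names, the units (seconds and nautical miles), and the message wording are assumptions. The source states only that one form reported time and coordination as separate items and the other reported their product.

```python
def separate_feedback(time_sec: float, separation_nm: float) -> str:
    """Report time and coordination as two individual items."""
    return f"Time, {time_sec:g} seconds. Separation, {separation_nm:g} miles."


def combined_feedback(time_sec: float, separation_nm: float) -> str:
    """Report a single index: time multiplied by degree of coordination."""
    return f"Score, {time_sec * separation_nm:g}."
```

A team given only the product cannot tell from the number alone whether a change came from the temporal or the coordinative component, which is the contrast the comparison was designed to examine.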




METHOD 


Subjects and apparatus. The subjects (Ss) were 
128 undergraduate males who volunteered for service 
in eight 35-minute sessions. They served in two-man 
teams, and assignment to a team was determined 
solely by the time of contact with the experimenter 
(E). Assignment of teams to groups was random 
with the restriction that groups be filled equally 
across the 9 weeks of data collection. No S had 
served previously in a similar experiment and all were 
paid $10.00 for their service. 

The apparatus was the same as that utilized by 
Johnston (1966): Each team member or radar con- 
troller (RC) viewed his own cathode-ray tube (CRT) 
display on which appeared “radar returns” from 
both interceptor aircraft and target aircraft. The 
RCs vectored the interceptors to the targets via 
verbal commands over simulated radio channels to 
“pilots” (E assistants) in an adjacent room. The 
pilots entered heading and airspeed adjustments as 
directed on signal generator consoles, and these in 
turn effected movement of the interceptor radar 
returns on the CRT displays. The E observed this 
activity from an elevated room and he provided 
verbal feedback to the RC teams from this room 
within 15 seconds following each attempted intercep- 
tion. 

Pairs of targets entered the airspace on prepro- 
grammed, nonevasive straight-line courses at either 
300, 500, or 600 knots. During training an 
RC handled two interceptors and attempted to place 
them on their respective targets simultaneously; thus, 
coordination was within RCs. During transfer each 
RC was assigned one of a pair of targets; thus, co- 
ordination was between RCs. The range of interceptor 
speeds was 200-1,200 knots. 

Design and procedure. A summary of the experi- 
mental design is provided in Table 1. In the training 
sessions a team in Group 4, say, was informed that 
both speed (time) and degree of coordination were 
important, but that they should emphasize coordina- 
tion at the expense of speed, if necessary. Upon 
transfer, the teams to experience a change were in- 
formed of this and either they were told to em- 
phasize the new criterion (for Groups 1 and 4), 
or they were instructed to give equal emphasis to 
speed and coordination (Groups 5-8). Groups 2 and 
3 were told to continue to emphasize the same aspect 
of the task as during the training sessions. 

The verbal feedback from E to the RCs, after each 
intercept attempt, was based on the criterion condi- 
tion appropriate to each team, of course. Thus, dur- 
ing training, teams in Groups 1, 2, 5, and 6 re- 
ceived information on elapsed time from the be- 
ginning of penetration by two targets to the moment 
of interception, such as: “Time, 150 seconds.” Co- 
ordination information was based on the average 
separation of the two targets from their interceptors 
at the time one target was successfully intercepted or 
passed out of the zone of control set up for the RCs. 
A successful intercept during training was achieved 
when an interceptor came within 2 nautical miles of 
its target, while during transfer a 1-nautical-mile 
separation was required. Therefore, during transfer 
perfect coordination required that both RCs have 
their interceptors on target simultaneously, and the 
degree of coordination (average separation) under 
this condition was 1 nautical mile; however, if RC1 
had his interceptor on target and RC2 was still 5 
miles, say, from his target, then the degree of coordination 
was 3 nautical miles. 


TABLE 1 
EXPERIMENTAL TREATMENTS 

              Criterion 
Group    Training        Transfer 

1        Time            Coordination (C) 
2        Time            Time (T) 
3        Coordination    Coordination (C) 
4        Coordination    Time (T) 
5        Time            C + T 
6        Time            C × T 
7        Coordination    C + T 
8        Coordination    C × T 

An experimental assistant measured both elapsed 
time and average separation by “freezing” both targets 
and interceptors at the appropriate moment, measur- 
ing and recording these data and immediately re- 
porting the information to E who in turn fed back 
the appropriate score(s) to the RC team. The ex- 
perimental assistant also “restarted” the interceptors 
and reset the targets for another preprogrammed 
penetration by issuing instructions to the pilots over 
a voice channel other than those used by the RCs. 

During the first two training sessions the RCs were 
given task familiarization and limited practice at the 
displays. Training Sessions 3 and 4 involved full 35- 
minute periods at the task, as did all four transfer 
sessions. Thus, data are reported below for six 35- 
minute sessions. Complete data were obtained for 
eight two-man teams in each group. 

Performance measurement. As indicated above, de- 
gree of coordination was recorded in terms of average 
separation of targets and interceptors. The results 
were analyzed in terms of these separation scores. 
Also recorded was elapsed time for each intercept at- 
tempt. For purposes of analysis these data were con- 
verted to speed scores (intercepts per minute) to 
avoid the skewness often associated with latency 
data. 
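
The two performance measures can be sketched in a few lines. The average-separation function follows the worked example given in the procedure (separations of 1 and 5 nautical miles give a degree of coordination of 3). The speed conversion is an assumption: the source says only that elapsed times were converted to intercepts per minute, and the reciprocal form used here (60 divided by elapsed seconds) is one plausible reading. Function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def degree_of_coordination(separations_nm: Sequence[float]) -> float:
    """Average target-interceptor separation in nautical miles."""
    return mean(separations_nm)


def speed_score(elapsed_sec: float) -> float:
    """Intercepts per minute, assumed here to be 60 / elapsed seconds."""
    if elapsed_sec <= 0:
        raise ValueError("elapsed time must be positive")
    return 60.0 / elapsed_sec
```

For example, `degree_of_coordination([1, 5])` gives 3, and under the assumed conversion a feedback report of "Time, 150 seconds" corresponds to `speed_score(150)`, or 0.4 intercepts per minute.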


RESULTS 


Training. Figures 1 and 2 provide a summary 
of performance during training in terms 
of degree of coordination (separation) and 
time (speed or intercepts per minute), respectively. 
The data plotted are overall averages 
for Groups 1, 2, 5, and 6 and for Groups 
3, 4, 7, and 8 since these groupings experienced 
comparable treatments during training 
(see Table 1). Analyses of variance revealed 
significant differences between the functions 
in Figure 1, F (1, 62) = 99.60, p < .01, and 
in Figure 2, F (1, 62) = 37.47, p < .01. 
Further, there were significant practice effects 
in both sets of data (p < .01). 


Fig. 1. Coordination performance during training. 

It follows, therefore, that performance during 
training was consistent with expectations: 
If instructed to emphasize coordination, the 
RCs generated better coordinated intercepts 
(smaller separation errors) than did RCs 
working under a time criterion (see Figure 1). 
As Figure 2 shows, however, this coordination 
superiority was gained at the expense of 
temporal performance; fewer intercepts per 
minute were recorded for RCs who worked 
under the coordination criterion than for those 
teams which worked under a time criterion 
condition. A comparable set of observations 
holds, of course, for teams instructed to emphasize 
speed in performance. 

Fig. 2. Speed performance during training. 

Fig. 3. Coordination performance during transfer 
for Groups 1-4. 


Transfer. Figures 3 and 4 provide sum- 
maries of performance by Groups 1-4 during 
transfer, while Figures 5 and 6 summarize 
transfer performance by Groups 5-8. The 
former groups provide information on trans- 
fer to simple criteria, while the latter groups 
concern transfer to complex criterion condi- 
tions; therefore, these two sets of data will 
be considered separately. 

A summary of two analyses of variance for 
Groups 1-4 appears in the top half of Table 
2. It may be noted that the transfer-task 
criterion conditions (not the training criteria) 
had a significant effect on performance, both 
in terms of speed and degree of coordination. 
Thus, referring to Figures 3 and 4, Groups 1 
and 3 generated better coordinated intercepts 
than did Groups 2 and 4, but the latter car- 
ried out their intercepts with significantly 
greater speed, on the average. 
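
The F ratios in Table 2 can be checked by dividing each effect mean square by its error mean square. The sketch below does this for four coordination entries read from the table. It assumes that between-team effects are tested against Teams within groups (T/G) and within-team effects against Ses X T/G; the table layout and the arithmetic suggest this, but the text does not state it. The function name `f_ratio` is illustrative.

```python
def f_ratio(ms_effect, ms_error):
    """F statistic as the ratio of an effect mean square to its error mean square."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error


# Coordination mean squares as read from Table 2.
# Each row: (label, MS effect, MS error, F reported in the table)
checks = [
    ("Groups 1-4, transfer criterion", 376.065, 5.804, 64.79),
    ("Groups 5-8, training criterion", 66.268, 8.222, 8.06),
    ("Groups 5-8, transfer criterion", 21.206, 8.222, 2.58),
    ("Groups 5-8, Ses X TrC", 2.566, 2.168, 1.18),
]

recomputed = [
    (label, round(f_ratio(ms, err), 2), reported)
    for label, ms, err, reported in checks
]

for label, f_value, reported in recomputed:
    print(f"{label}: recomputed F = {f_value:.2f}, reported F = {reported:.2f}")
```

The same ratio can be used to cross-check any table entry whose mean square and error term are both legible.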

As with the training data, then, the RC 
teams performed in a manner consistent with 
the instructions and the E-generated feed- 
back: when encouraged to emphasize coordina- 
tion, this was done at the expense of speed, 

Fig. 4. Speed performance during transfer for 
Groups 1-4. 

TABLE 2 
ANALYSES OF VARIANCE FOR THE SPEED AND COORDINATION DATA DURING TRANSFER 

                                       Coordination               Speed 
Source                          df     MS            F            MS       F 
Groups 1-4 
  Training criterion (TC)        1     [illegible]   --           .0360    1.76 
  Transfer criterion (TrC)       1     376.065       64.79**      .4433    [illegible] 
  TC X TrC                       1     [illegible]   --           .0209    1.02 
  Teams within groups (T/G)     28     5.804                      .0204 
  Sessions (Ses)                 3     4.449         1.33         .0562    22.48** 
  Ses X TC                       3     1.001         --           .0015    -- 
  Ses X TrC                      3     [illegible]   --           .0136    5.44** 
  Ses X TC X TrC                 3     2.786         --           .0038    1.52 
  Ses X T/G                     84     3.339                      .0025 
Groups 5-8 
  Training criterion (TC)        1     66.268        8.06**       .2314    15.64** 
  Transfer criterion (TrC)       1     21.206        2.58         .0182    [illegible] 
  TC X TrC                       1     .487          --           .0081    -- 
  Teams within groups (T/G)     28     8.222                      .0148 
  Sessions (Ses)                 3     7.205         [illegible]  .0517    15.20** 
  Ses X TC                       3     [illegible]   1.50         .0003    -- 
  Ses X TrC                      3     2.566         1.18         .0010    -- 
  Ses X TC X TrC                 3     .978          --           .0048    1.41 
  Ses X T/G                     84     2.168                      .0034 

* p < .05 
** p < .01 


and vice versa. Further, except for the speed 
performance of Group 4 (see Figure 4) the 
adjustment to new criterion conditions seems 
to have been very rapid. The significant Trans- 
fer Criterion X Sessions interaction noted in 
Table 2 for the speed data is due primarily to 
the more gradual adaptation of Group 4 to 
its new criterion condition in terms of the 
temporal aspect of performance. Table 3 pro- 
vides the results of the Duncan test which 
was applied to the speed data defining this 
interaction. Group means within brackets do 
not differ at p < .05. 


TABLE 3 
RESULTS OF A DUNCAN MULTIPLE-COMPARISON TEST APPLIED TO THE 
SPEED DATA OF GROUPS 1-4 AT TRANSFER 

      Session 1                 Session 2        Session 3              Session 4 
Group         Speed          Group   Speed    Group   Speed          Group   Speed 
[illegible]   .278           2       .373     2       .379           2       .390 
[illegible]   [illegible]    4       .274     4       [illegible]    4       .375 
[illegible]   .192           1       .191     1       .203           3       .264 
[illegible]   .184           3       .187     3       .198           1       .241 

Note.—Group means within brackets do not differ at p < .05. [Brackets not legible in source.] 


Fig. 5. Coordination performance during transfer 
for Groups 5-8. 


Turning to Groups 5-8, all of which trans- 
ferred to complex criterion conditions, Figures 
5 and 6 and the lower half of Table 2 indi- 
cate a somewhat different pattern of perform- 
ance from that noted above for groups trans- 
ferred to simple criterion conditions. Here 
the training rather than the transfer criterion 
conditions exerted a significant effect on trans- 
fer-task performance. Thus, Groups 7 and 8 
both trained on the coordination criterion 
condition and, on the average, continued to 
achieve better coordinated intercepts during 
transfer than did Groups 5 and 6 (see Figure 
5). However, the latter two groups, which 
trained under the time criterion condition, are 
superior to Groups 7 and 8 in terms of speed 
of intercepts during transfer (see Figure 6). 


Fig. 6. Speed performance during transfer 
for Groups 5-8. 


DISCUSSION 


It is apparent that if transfer was to a 
simple criterion condition, teams adapted to 
the new work situation rather quickly and 
there was little residual effect of the previ- 
ously experienced criterion condition except 
for the speed performance of Group 4. How- 
ever, upon transfer to complex criterion condi- 
tions, which included incompatible criteria, 
teams continued to emphasize that aspect of 
performance which was suitable under the 
previous, more simple criterion conditions. 
These results make sense intuitively: if one 
is clear in his instructions to staff and if 
these instructions do not include conflicting 
or incompatible requirements, then one ex- 
pects the staff to operate “as instructed” re- 
gardless of previous instructions. This cor- 
responds to the treatment of Groups 1-4. 
However, if, as was the case for Groups 5-8, 
the staff not only finds a change in the ground 
rules but also the new instructions are in- 
ternally incompatible, then the logical pro- 
cedure is to continue emphasizing that aspect 
of performance with which they have had 
more experience. Thus, the commonly ob- 
served emphasis that college students place 
on the accuracy component of a speed-ac- 
curacy criterion (Howell & Kreidler, 1963, 
1964) may be a result of accuracy “training” 
rather than of an inherent bias. 

It was expected that Groups 5-8 during 
transfer would attempt to achieve some kind 
of compromise between the two incompatible 
criteria. A comparison of Figures 5 and 6 
with Figures 3 and 4, respectively, suggests 
that such an accommodation occurred: across 
the four transfer sessions, Groups 1 and 3 
achieved a fairly large performance difference 
compared to Groups 2 and 4 on both de- 
pendent variables; however, the differences 
among Groups 5-8 were substantially less 
than those between Groups 1-4, and terminal 
performance by Groups 5-8 appears to be at 
an intermediate level compared to the more 
extreme performance levels obtained by 
Groups 1-4. 

Interestingly, it appears to make no dif- 
ference in transfer performance between teams 
which were given separate information on the 
temporal and coordinative aspects of perform- 
ance and teams which received only a com- 
bined figure-of-merit index of performance; 
that is, the average performance of Groups 5 
and 7 (separate feedback) did not differ sig- 
nificantly from the average of Groups 6 and 8 
(combined feedback) on either dependent 
variable during transfer. Since E, in effect, 
assigned equal weight to the temporal and 
coordinative aspects of team performance in 
the derivation of the feedback scores for 
Groups 6 and 8, it would appear that the 
teams in Groups 5 and 7 also weighted these 
two aspects of performance equally. 

It is problematical if this same result would 
occur in situations requiring an unequal 
weighting for incompatible criteria; however, 
in the present experiment, personnel responded 
the same under a situation wherein the in- 
dividual teams were free to balance one cri- 
terion against another (Groups 5 and 7) as 
when “‘management”’ guided the balance more 
directly (Groups 6 and 8). 


GEORGE E. BRIGGS AND WILLIAM A. JOHNSTON 


REFERENCES 


Fitts, P. M. Cognitive aspects of information process- 
ing: III. Set for speed versus accuracy. Journal of 
Experimental Psychology, 1966, 71, 849-857. 

Howell, W. C., & Kreidler, D. L. Information 
processing under contradictory instructional sets. 
Journal of Experimental Psychology, 1963, 65, 
39-46. 

Howell, W. C., & Kreidler, D. L. Instructional sets 
and subjective criterion levels in a complex in- 
formation-processing task. Journal of Experimental 
Psychology, 1964, 68, 612-614. 

Johnston, W. A. Transfer of team skills as a func- 
tion of type of training. Journal of Applied Psy- 
chology, 1966, 50, 102-108. 


(Early publication received September 2, 1966) 


Journal of Applied Psychology 
1966, Vol. 50, No. 6, 473-478 


ROLE OF VERBAL COMMUNICATION IN TEAMWORK¹ 


ROBERT C. WILLIGES, WILLIAM A. JOHNSTON, AND GEORGE E. BRIGGS 


Ohio State University 


A simulated radar-controlled aerial intercept task was used to examine verbal 
communication between teammates under verbal (communication necessary) 
and verbal-visual (communication unnecessary) conditions. Communication 
facilitated team performance only in the verbal condition. Team perform- 
ance, however, was best in the verbal-visual condition. A transfer-of-training 
paradigm was employed to determine if verbal skills developed in 1 condition 
would transfer to the other condition. Differential transfer occurred neither in 
communication behavior nor in team performance. It was concluded that verbal 
communication, when not required by the task, plays an insignificant role in 
teamwork, and that this role apparently is not enhanced by verbal training. 


Briggs and Naylor (1965) observed a dis- 
ruptive effect of verbal communications be- 
tween teammates on team performance in a 
simulated aerial intercept task, and Johnston 
(1966) localized this effect to task-irrelevant 
and tactical communications. The present 
study was designed to further assess the role 
of communication in teamwork and to explore 
the possibility that communications can en- 
hance performance after appropriate training. 

In neither of the previous studies was com- 
munication between teammates necessary to 
complete the task. In the Johnston experi- 
ment, for example, a teammate could obtain 
pertinent information concerning his partner’s 
activity from a visual channel (his radar dis- 
play) as well as from the less efficient verbal 
channel. Apparently, then, when a more ef- 
ficient channel is available, the verbal channel 
is used unnecessarily and fosters the develop- 
ment of poor communication habits. On the 
other hand, good communication habits might 
be acquired if an alternate channel is not 
available and communication is therefore es- 
sential for successful team performance. Thus, 
it was assumed that good and poor verbal 
habits are formed in verbal and verbal-visual 
conditions, respectively. The present study 
employed a transfer-of-training paradigm to 
determine if verbal skills acquired under one 
condition would transfer to the other condi- 
tion. Of specific interest was the possibility 
that communications can augment teamwork 
in the verbal-visual condition provided that 
teammates are trained in the verbal condi- 
tion. Of course, by providing ready visual 
access to critical information, the verbal- 
visual condition should be superior to the 
verbal system in terms of team performance. 
However, the development of verbal habits 
might cause team performance at transfer 
to be better after verbal training than after 
verbal-visual training. Thus, while the verbal 
condition is a poorly designed system which 
should be avoided in operational circum- 
stances, it may nevertheless serve a useful 
training function. 


¹ This research was supported by the United States 
Navy under Contract No. N61339-1327, sponsored by 
the United States Naval Training Device Center, Or- 
lando, Florida. Permission is granted for reproduc- 
tion of this publication in whole or in part for any 
purpose of the United States Government. An ex- 
panded version of this study was submitted by R. C. 
Williges as partial fulfillment of the requirements for 
a master’s degree at Ohio State University. 


METHOD 


Subjects and apparatus. The subjects (Ss) were 64 
undergraduate males who were naive to the experi- 
mental task. They served in two-man teams for eight 
35-minute sessions, and each S received $10.00 for 
his services. The first four sessions defined training 
and the last four sessions defined transfer tasks. 

A simulated aerial intercept task was used for both 
training and transfer, and the two teammates served 
as radar controllers (RCs) who were required to 
guide interceptor aircraft to their targets. The ap- 
paratus used to implement this task has been de- 
scribed in detail elsewhere (Hixson, Harter, Warren, 
& Cowan, 1954). Briefly, several 5-inch (training) 
and 14-inch (transfer) cathode-ray tube (CRT) dis- 
plays, the diameter of each representing 200 miles of 
airspace, were linked to aircraft-generator consoles 
through a special purpose analog computer. Each 
display was divided into four quadrants, each 
quadrant representing the airspace guarded by a 
single interceptor aircraft. The aircraft (radar re- 
turns) appeared in clock code (Briggs & Naylor, 
1964) at a simulated altitude of 35,000 feet; they 
moved in real time and turned at realistic rates. 

Design and procedure. The first two training ses- 
sions were devoted to general instructions concerning 
the task and operation of the equipment, and the 
final 2 training days consisted of two 35-minute 
sessions on the aerial intercept task. During transfer, 
each team completed four 35-minute sessions on the 
same task. Each RC had charge of two display 
quadrants while his partner controlled the other two 
quadrants. When a target entered a quadrant, the RC 
issued heading and speed commands to the ap- 
propriate pilot (experimenter assistants) in an effort 
to effect a hit, that is, to vector the interceptor 
within 1 (transfer) or 2 (training) miles of the 
target. A miss occurred if a target traversed the 
quadrant without being intercepted. The pilots fol- 
lowed RC instructions exactly by appropriate manip- 
ulations of the aircraft-generator consoles. The targets 
entered the airspace at various points with variable 
headings and speeds but never took evasive action, 
and it always was possible to intercept a target 
before it left the quadrant. 

A slightly adapted version of the team-coordination 
task described by Johnston (1966) was employed. 
The major feature of this task is that the two team- 
mates are required to obtain simultaneous intercepts. 
Thus, two targets enter adjacent quadrants on sym- 
metrical flight paths and traveling at identical air- 
speeds. One target is to be intercepted by RC1 and 
the other is to be intercepted by RC2. When an 
intercept occurs, the two interceptors and their 
targets are “frozen” on the scope. The degree of 
team coordination is then measured as the distance 
(in miles) separating the nonintercepted target and 
its interceptor. This index of teamwork is fed back 
immediately to the RC team. Following each run, 
the targets are taken off the scopes and replaced by 
new ones in different quadrants. 
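
The hit criterion and the team-coordination index described above can be sketched as follows. This is a minimal illustration under stated assumptions: aircraft positions are taken to be (x, y) coordinates in miles at the moment the scope is frozen, the pair with the smaller separation is treated as the one that scored the intercept, and all names (`HIT_RADIUS`, `separation`, `is_hit`, `coordination_index`) and sample positions are hypothetical.

```python
import math

# Hit radii in miles as stated in the procedure (2 during training, 1 during transfer).
HIT_RADIUS = {"training": 2.0, "transfer": 1.0}


def separation(target_xy, interceptor_xy):
    """Straight-line distance in miles between a target and its interceptor."""
    return math.hypot(target_xy[0] - interceptor_xy[0],
                      target_xy[1] - interceptor_xy[1])


def is_hit(target_xy, interceptor_xy, phase):
    """True if the interceptor is within the hit radius for the given phase."""
    return separation(target_xy, interceptor_xy) <= HIT_RADIUS[phase]


def coordination_index(pair_a, pair_b):
    """Separation of the pair that has not yet intercepted when the scope freezes.

    Each pair is (target_xy, interceptor_xy). The pair with the smaller
    separation is assumed to be the one that triggered the freeze.
    """
    return max(separation(*pair_a), separation(*pair_b))


# Hypothetical frozen positions (miles).
rc1_pair = ((10.0, 20.0), (10.6, 20.8))    # about 1 mile apart: the intercept
rc2_pair = ((-10.0, 20.0), (-13.0, 24.0))  # 5 miles apart: the lagging teammate

index = coordination_index(rc1_pair, rc2_pair)
print(f"coordination index = {index:.1f} miles")
```

Smaller values mean better teamwork: a perfectly simultaneous pair of intercepts would leave the second interceptor inside the hit radius as well.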

The experimental design is summarized in Table 1. 
In the verbal-visual condition, an RC could obtain 
information about his partner’s status from a verbal 
communication link between them or by visual in- 
spection of the radar display. Thus, verbal com- 
munication was unnecessary in this condition be- 
cause the pertinent coordination information could 
be obtained merely by viewing the visual display. In 
the verbal condition, on the other hand, verbal com- 
munication was necessary because an RC could not 
see his partner’s airspace on the display; instead, 
he was allowed visual access only to his own air- 
space. Four new teams were trained each week for 
8 successive weeks. Teams were assigned to groups 
on a random basis with the restriction that there be 
one team per group each week. 


TABLE 1 
SUMMARY OF EXPERIMENTAL DESIGN 

Group    Training condition    Transfer condition 
1        Verbal-visual         Verbal-visual 
2        Verbal                Verbal-visual 
3        Verbal-visual         Verbal 
4        Verbal                Verbal 


TABLE 2 
TEAM COMMUNICATION CATEGORIES 

Category                 Description 

Identification           Self-identification protocols required of callers 
                         and receivers at the beginning of each conversation, 
                         e.g., “RC1 to RC2.” 

Task irrelevant          Communications conveying information not pertaining 
                         to the basic task, e.g., “Do we get paid tonight?” 

Declarative statement    Communications conveying information redundant with 
                         display information and originally obtainable only 
                         by viewing the display, e.g., “Your interceptor is 
                         going out of the quadrant.” 

Tactical statement       Communications conveying task-relevant information 
                         not directly obtainable from the displays, e.g., 
                         “I just increased my interceptor speed to 1,200 
                         knots.” 

Tactical command         Tactical communications in which a request for 
                         action is issued by one RC to his partner, e.g., 
                         “Speed up Alpha 2, immediately.” 
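
The restricted random assignment described in the procedure (four new teams per week for 8 weeks, one team per group each week) amounts to a randomized-block scheme with weeks as blocks. The sketch below is one way it could be carried out; the function name `assign_teams`, the team labels, and the use of a seeded generator are assumptions for illustration and do not describe how the authors actually drew the assignment.

```python
import random

GROUPS = [1, 2, 3, 4]


def assign_teams(n_weeks=8, seed=None):
    """Assign four teams per week to Groups 1-4, one team per group each week.

    Returns a list of (week, team_within_week, group) tuples.
    """
    rng = random.Random(seed)
    schedule = []
    for week in range(1, n_weeks + 1):
        groups = GROUPS[:]      # fresh copy for this week's block
        rng.shuffle(groups)     # random order of groups within the week
        for team, group in enumerate(groups, start=1):
            schedule.append((week, team, group))
    return schedule


schedule = assign_teams(n_weeks=8, seed=1966)  # seed chosen arbitrarily
for week, team, group in schedule[:4]:
    print(f"week {week}, team {team} -> Group {group}")
```

Blocking on week gives every group eight teams and spreads any week-to-week drift in equipment or procedure evenly over the four groups.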

Communication analysis. Inter-RC communica- 
tions were tape-recorded and later analyzed by a 48- 
item classificatory scheme which is described in detail 
elsewhere (Briggs & Johnston, 1966). This scheme 
permitted unique categorization of each unit of con- 
versation. Two well-trained judges conducted the 
content analyses with intra- and interjudge agree- 
ment averaging r = .99 and r = .92, respectively. In 
addition, an Esterline-Angus pen recorder permitted 
each communication to be classified as occurring in 
the initial, middle, or final third of an intercept run. 

For analytical expediency, the original 48-item 
scheme was distilled to the five communication cate- 
gories defined in Table 2. Tactical and declarative 
communications pertained to the task and there- 
fore were of major interest. Declarative communica- 
tions usually were descriptions of some or all aspects 
of the configuration of aircraft in the total airspace 
being observed. Tactical communications, on the 
other hand, generally related to what an RC had 
done, was doing, or intended to do in response to the 
configuration of aircraft. Since each teammate could 
see only half of the total configuration of aircraft on 
his scope, the verbal condition was expected to pro- 
mote declarative communications. The relative fre- 
quencies of the five types of communication were 
computed for each team on the last two training 
sessions and all four transfer sessions so that com- 
munication profiles could be determined for each 
condition. 
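
A communication profile of the kind described here is the relative frequency of each of the five Table 2 categories within a team's coded conversation units. The following sketch shows the computation; the category identifiers, the function name `communication_profile`, and the sample counts are hypothetical and are not data from the study.

```python
from collections import Counter

# The five categories of Table 2 (identifier spellings are illustrative).
CATEGORIES = [
    "identification",
    "task_irrelevant",
    "declarative_statement",
    "tactical_statement",
    "tactical_command",
]


def communication_profile(coded_units):
    """Return the relative frequency of each category in a list of coded units."""
    counts = Counter(coded_units)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    return {category: counts[category] / total for category in CATEGORIES}


# Hypothetical coded units for one team in one session.
units = (
    ["identification"] * 4
    + ["declarative_statement"] * 10
    + ["tactical_statement"] * 5
    + ["tactical_command"] * 1
)

profile = communication_profile(units)
for category in CATEGORIES:
    print(f"{category:22s} {profile[category]:.2f}")
```

Working with relative rather than absolute frequencies lets profiles be compared across teams and conditions that differ in how much they talk overall.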


RESULTS 
Team Coordination 


An analysis of variance of the training data 
(mean team-coordination scores) revealed that, 
as expected, team coordination was better in 
the verbal-visual condition than in the verbal 
condition, F (1, 30) = 11.01, p < .01, and 
performance was better on Training Session 4 
than on Session 3, F (1, 30) = 10.05, p < 
.01. The Session X Training Condition inter- 
action was not significant, F (1, 30) < 1.00. 
An analysis of variance of the transfer data 
showed corresponding effects. That is, per- 
formance improved across transfer sessions, 
F (3, 84) = 8.88, p < .001, and it was better 
in the verbal-visual than in the verbal condi- 
tion, F (1, 28) = 24.02, p < .001. None of 
the remaining sources of variance was sig- 
nificant (p > .05). Contrary to expectations, 
therefore, verbal training was not superior to 
verbal-visual training in terms of teamwork 
during transfer, F (1, 28) < 1.00. Neverthe- 
less, it is possible that the presumed differences 
between the four experimental groups in terms 
of verbal communication did occur, but that 
these differences were not of sufficient magni- 
tude to produce differences in team coordina- 
tion. The communication records were ex- 
amined in an effort to explore this possibility. 


Team Communication 


Condition-communication relationships. A 
detailed treatment of the temporal aspects of 
communication is given elsewhere (Williges, 
1966). Suffice to say here that communication 
frequency tended to increase across run 
thirds, particularly for tactical commands in 
the verbal-visual condition. 

TABLE 3 
ANALYSIS OF VARIANCE OF COMMUNICATION FREQUENCY 
DURING THE LAST 2 TRAINING SESSIONS 

Source                            df     MS          F 
Between subjects                  31 
  Training condition (TC)          1     3887.41     1.01 
  Teams within groups (T/G)       30     3847.42 
Within subjects                  128 
  Communication category (C)       4     24579.39    53.34* 
  C X TC                           4     10397.27    22.56* 
  C X T/G                        120     460.79 

* p < .001. 


Table 3 summarizes the results of an analy- 
sis of variance of the overall frequencies of 
communication on Training Sessions 3 and 4. 
Duncan’s test (p < .05) was used to specify 
the bases of the significant effects. This test 
disclosed that task-irrelevant communications 
and tactical commands were less frequent 
than the remaining categories and that the 
verbal condition fostered declarative com- 
munications while the verbal-visual condition 
fostered tactical communications. Thus, though 
training conditions did not differ in total fre- 
quency of communication, they did differ in 
communication profile. This difference in com- 
munication profile is evident in Figure 1. 
The question now becomes: Did the com- 
munication profile established under one con- 
dition (verbal) transfer to the other condition 
(verbal-visual) and vice versa? This question 
was answered by an analysis of variance ap- 
plied to the communication frequency data 
for the first session of transfer, the session in 
which transfer effects should be most evident 
if they occur at all. However, both the train- 
ing condition main effect, F (1, 28) < 1.00, 
and the Training Condition X Transfer Con- 
dition interaction, F (1, 28) < 1.00, failed 
to attain statistical significance. Thus, the 
expected differential transfer of training did 
not occur. Instead, communication at transfer 
fell under immediate control of the transfer 
conditions, F (1, 28) = 7.93, p < .01. Fur- 
thermore, as Figure 2 shows, the verbal and 
the verbal-visual conditions at transfer fostered 
different communication profiles, F (4, 112) 
= 28.47, p < .001. These profiles bear re- 
markable resemblance to those established 
under corresponding conditions during train- 
ing. That is, declarative communications were 
emphasized in the verbal condition, and tacti- 
cal communications were promoted in the 
verbal-visual condition. Contrary to the train- 
ing data, however, the total frequency of com- 
munication on Session 1 of transfer was 
greater in the verbal condition (170 per team) 
than in the verbal-visual condition (66 per 
team), F (4, 112) = 46.01, p < .001. Never- 
theless, it is clear that while the two condi- 
tions affect communication performance, they 
do not produce the more permanent learning 
effects that would lead to differential transfer 
of training. 

Fig. 1. Relative frequency of team communication 
categories during training as a function of verbal and 
verbal-visual training. 

Communication-coordination relationships. 
Product-moment correlational analyses of the 
training data were conducted to ascertain the 
effects of communication on team coordina- 
tion in the verbal and the verbal-visual con- 
ditions. In the verbal condition, significant 
positive correlations were found between team 
performance and total frequency of communi- 
cation overall, r (14) = .54, p < .05, in the 
middle run third, r (14) = .51, p < .05, and 
in the final run third, r (14) = .61, p < .05. 
The communication category loci of these 
positive relationships are shown in Table 4. 
Each coefficient expresses the relationship be- 
tween team coordination and the relative fre- 
quency of occurrence of a particular type of 
communication, overall or in a given run 
third. It is obvious that a facilitative effect of 
communication tended to be associated pri- 
marily with tactical communications and that 
performance was reduced by identifications. 
A high proportion of identifications signifies 
short conversations, that is, minimal informa- 
tion per conversation. It appears, then, that 
the facilitative effect of communication in the 
verbal condition is augmented when team- 
mates transmit a considerable amount of 
tactical information in relatively few conver- 
sations. 

Fig. 2. Relative frequency of team communication 
categories during the first transfer session as a func- 
tion of verbal and verbal-visual training. 


TABLE 4 
RELATIONSHIPS BETWEEN TEAM COORDINATION DURING TRAINING IN THE VERBAL CONDITION 
AND RELATIVE FREQUENCY OF EACH COMMUNICATION CATEGORY 

Category                 Overall        Initial third    Middle third    Final third 
Identification           -.45           -.68**           -.59*           .39 
Task irrelevant          [illegible]    [illegible]      -.10            .30 
Declarative statement    .43            [illegible]      [illegible]     .53* 
Tactical statement       [illegible]    [illegible]      -.24            .65** 
Tactical command         [illegible]    .37              .58*            .28 

Note.—Negative coefficients indicate inhibitory effects and positive coefficients indicate facilitative effects (df = 14). 
* p < .05. 
** p < .01. 
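
The significance levels attached to the overall correlations above can be checked by converting each r to a t statistic with df = 14, using t = r * sqrt(df / (1 - r^2)). The sketch below does this for the three reported values. The critical values are the standard two-tailed tabled values for 14 degrees of freedom; that the original tests were two-tailed is an assumption, and the function name `t_from_r` is illustrative.

```python
import math


def t_from_r(r, df):
    """t statistic for testing a product-moment correlation against zero."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(df / (1.0 - r * r))


DF = 14            # degrees of freedom reported with each coefficient
T_CRIT_05 = 2.145  # two-tailed critical t, df = 14, alpha = .05
T_CRIT_01 = 2.977  # two-tailed critical t, df = 14, alpha = .01

# Correlations between team performance and total communication frequency.
reported = {"overall": 0.54, "middle third": 0.51, "final third": 0.61}

t_values = {label: t_from_r(r, DF) for label, r in reported.items()}

for label, t in t_values.items():
    verdict = "p < .01" if abs(t) > T_CRIT_01 else (
        "p < .05" if abs(t) > T_CRIT_05 else "n.s.")
    print(f"{label}: r = {reported[label]:.2f}, t({DF}) = {t:.2f}, {verdict}")
```

Under these assumptions all three coefficients exceed the .05 criterion but not the .01 criterion, which agrees with the p < .05 levels given in the text.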

Corresponding correlational analyses ap- 
plied to the verbal-visual condition revealed 
no significant correlations. Thus, communica- 
tion facilitated training performance in the 
verbal condition, as predicted, but it did not 
produce the expected inhibition of perform- 
ance in the verbal-visual condition. Correla- 
tional analyses were also performed on the 
transfer data. There were consistent facilita- 
tive effects of communication in Group 4 
(verbal to verbal), but there were no con- 
sistent effects of communication in any of the 
other groups. The facilitative effects observed 
in Group 4 were again localized to tactical 
communications. 


DISCUSSION 


As expected, teamwork was better in the 
verbal-visual condition than in the verbal con- 
dition. Since Group 1 continued to surpass 
Group 4 over the full extent of training and 
transfer, it is clear that even relatively pro- 
longed experience in the verbal condition does 
not ameliorate its inferiority to the verbal- 
visual condition. Contrary to prediction, how- 
ever, differential transfer of training did not 
occur. Specifically, performance in a given 
condition at transfer was not augmented by 
verbal training. This called for a reexamina- 
tion of the underlying assumptions from 
which the prediction was developed. 
Essentially, the prediction stemmed from 
the assumptions that good and poor communi- 
cation habits would be developed in the verbal 
and verbal-visual conditions, respectively, and 
that habits acquired in one condition would 
transfer to the other condition, thereby af- 
fecting transfer performance. As Figure 1 
shows, the two conditions in fact did promote 
different communication profiles. Thus, de- 
clarative communications were dominant in 
the verbal condition, and tactical communica- 
tions were prevalent in the verbal-visual condi- 
tion. However, Figure 2 shows that the com- 
munication profile established in one condi- 
tion did not transfer to the other condition. 
Thus, the two conditions seem to have rather 
consistent effects on performance, but no ob- 
servable effects on learning. One reason, then, 
that transfer of training did not occur in 
terms of teamwork is that it did not occur in 
terms of the presumed mediator of teamwork, 
namely, communication behavior. 

Turning now to the effect of communica- 
tion on team coordination during training, it 
is noteworthy that though communication did 
not disrupt team performance in the verbal- 
visual condition, it did facilitate team per- 
formance in the verbal condition. Paradoxi- 
cally, however, this facilitative effect was 
localized primarily to communications that 
were characteristic of the verbal-visual condi- 
tion (tactical communications), rather than 
to those characteristic of the verbal condition 
(declarative communications). Furthermore, 
while tactical communications may facilitate 
performance in the verbal condition, they 
previously have been found to retard team- 
work in the verbal-visual condition (Johnston, 
1966). Clearly, therefore, even if communica- 
tion behavior had transferred across condi- 
tions, the resulting effects on teamwork would 
not have been as originally predicted. Taking 
all of these considerations into account, it is 
understandable that teamwork in a given con- 
dition at transfer was not augmented by verbal 
training. 

What, then, is the role of communication in 
teamwork? The present data suggest that com- 
munication facilitates performance only when 
a more efficient information channel is not 
available. However, such a circumstance de- 
notes a poorly designed system which is likely 
to arise in real team tasks only by accident, 
for example, by the failure of alternate in- 
formation channels. Under the more realistic 
circumstances represented by the verbal-visual 
condition, communication has been found to 
have either no effect (Jensen, 1962; Kinkade 
& Kidd, 1959), or only a slight and usually 
disruptive effect on team performance (Johns- 
ton, 1966). This is somewhat startling in view 
of the ubiquity of, and importance often at- 
tributed to, communication in team tasks 
(Bales, 1950; Freed, 1962). Consequently, 
when it is permitted but not demanded by 
the task, interoperator verbal communication 
appears to be little more than an unnecessary 
and rather tempting luxury that has relatively 
little impact on teamwork. 


SELF-ESTEEM VARIABLE IN VOCATIONAL CHOICE¹ 


ABRAHAM K. KORMAN 
New York University 


Report of 2 studies designed to test predictions from the hypothesis that 
individuals of high self-esteem tend to implement self when making an 
occupational choice whereas individuals of low self-esteem do not. 14 specific 
predictions were made and supported from this general hypothesis. Implica- 
tions for ability, self-evaluation, and successful role performance were 
suggested. 


Current vocational choice theory has as 
its general framework the supposition that 
the choosing of an occupation should be 
viewed within context of the general per- 
sonality development of the individual as he 
comes to view himself and the world around 
him (Holland, 1963; Siegelman & Peck, 
1960; Super, 1953). More particularly, it 
postulates that the choosing of a certain set 
of social roles, such as that involved in voca- 
tional choice, and the rejecting of others is 
dependent on the characteristics which one 
attributes to oneself, on either a conscious 
or unconscious level, and the characteristics 
which are attributed to performance in the 
various social roles. The choice is then made 
on the basis of the extent to which an indi- 
vidual “sees himself in the role” or the role 
as befitting himself. 

Despite the support which has been pro- 
vided this theory in a number of studies 
(Englander, 1960; Segal, 1961; Siegelman & 
Peck, 1960), many writers have pointed out 
that occupations are chosen on other bases 
besides that of implementation of a “self- 
concept,” with these other factors quite 
often working against self-implementation. As 
Paterson (1962), among others, has said, 
both the level and direction of vocational 
aspiration may, to a great extent, be deter- 
mined by the hopes and aspirations of par- 
ents, wives, and friends, with these percep- 
tions and motives of others frequently at 
variance with those of the individual making 
the vocational choice. 


¹ The research reported in this paper was par- 
tially supported by the Office of Scientific and 
Scholarly Research of the University of Oregon. 


The purpose of this paper is to report two 
studies designed to test several predictions 
relevant to the above question. These hy- 
potheses are derived from a “balance- 
theoretical” framework, and are, in addition, 
relevant to the “moderator variable” concept 
discussed by Saunders (1956) and Ghiselli 
(1963a), among others. The theoretical ap- 
proach proposed stems from the assumption 
that: 


All other things being equal, individuals will engage 
in those behavioral roles which will maximize their 
sense of cognitive balance or consistency. 


If we then define self-esteem, following 
Gelfand (1962), as: 


A person’s characteristic evaluation of himself and 
what he thinks of himself as an individual; low 
self-esteem is characterized by a sense of personal 
inadequacy and an inability to achieve need satis- 
faction in the past; high self-esteem is defined by 
a sense of personal adequacy and a sense of having 
achieved need satisfaction in the past; 


Then the following general hypothesis seems 
to be a logical one: 


Individuals high in self-esteem are likely to choose 
those occupations which they perceive to be most 
likely to fulfill their specific needs and to be in 
keeping with their self-perceived characteristics. Such 
a choice would be in balance with their cognition of 
themselves as need-satisfying individuals, and they 
are, thus more likely to reject those influences, social 
or otherwise, which might minimize the achievement 
of such balance. 


Individuals low in self-esteem are less likely to 
choose those occupations which they perceive to be 
most likely to fulfill their specific needs and to be 
in keeping with their self-perceived characteristics. 
Such a choice of a “nonself-appropriate” role would 
be more in keeping with their cognition of them- 
selves as nonneed-satisfying individuals, and they 
would then be more likely to accept those influ- 
ences, social or otherwise, which would maximize 
the probability of their entering an occupation 
which they would perceive as “nonself-appropriate.” 

In essence, these two hypotheses propose
that an individual’s self-esteem acts as a 
“moderator variable” on the extent to which 
his self-perceived needs are predictive of his 
occupational choice. For those high on self- 
esteem, the prediction is that such self- 
perceptions are highly predictive of eventual 
occupational choice, whereas, according to the 
hypothesis, such predictions break down for 
those low on this variable. 

The research to be reported here derives 
from these considerations. 


STUDY I


There is now clear evidence that occupa- 
tions are perceived as calling for distinct 
behavioral and attitudinal patterns, and that 
such perceptions are invariant phenomena 
across different segments of some college- 
student populations (O’Dowd & Beardslee, 
1960). As a result, from these studies of 
occupational stereotypes, one is provided with 
a very clear rationale for predicting that 
people with different self-perceived charac- 
teristics would be inclined to enter dif- 
ferent occupations, as long as that occupation 
was being chosen on the basis of “self- 
implementation.” To the extent, however, 
that choices were not being made on this 
basis, then there would be no reason to pre- 
dict differences between people choosing dif- 
ferent occupations from a knowledge of 
occupational stereotypes. Accordingly, the 
following specific predictions were made: 


Hypothesis 1. Since various occupations 
are thought of and perceived as differing in 
the extent to which they require interaction 
with other people and the extent to which 
social capability is required, those individuals 
choosing occupations which require a great 
deal of this behavior who have high self- 
esteem should have a greater degree of 
interaction-orientation than those low in self- 
esteem. The reverse relationship between 
self-esteem and interaction-orientation should 
occur for those choosing occupations which 
are perceived to require a small amount of 
interaction with others. These relationships, 
then, should result in the following: 


Hypothesis 1A. For individuals high in self- 
esteem, those who have chosen a sales career 
should have a greater degree of “interaction-
orientation” than those choosing an account-
ing career; 

Hypothesis 1B. For individuals high in self- 
esteem, those who have chosen a sales career 
should have a greater degree of “interaction- 
orientation” than those choosing a career in 
production management; 

Hypothesis 1C. For individuals low in self- 
esteem, there should be no difference in 
“interaction-orientation” between those choos- 
ing sales careers and those choosing account- 
ing careers; 

Hypothesis 1D. For individuals low in self- 
esteem, there should be no differences in 
“interaction-orientation” between those choos-
ing sales careers and those choosing careers
in production management. 


Hypothesis 2. It is quite clear that the 
salesman and the accountant play different 
roles in the world of work and are perceived 
as such (O’Dowd & Beardslee, 1960), with a 
major difference being in the degree of 
“structure” required in the work role. For 
example, the salesman can be conceived of as 
being in a dynamic ever-changing interaction 
with the customer where constraints are few 
and where a great premium is placed on 
being able to strike off in new directions and 
taking the initiative in such interchange. On 
the other hand, for the accountant, “regu- 
larity” and “structure” seem to be the key- 
note. He has a well-defined job with a given 
set of duties and responsibilities which are 
relatively routinized in nature. The “flux” of 
the sales situation is lacking. 

Thus, the following predictions appear to 
be reasonable: 


Hypothesis 2A. For individuals high in 
self-esteem, those choosing sales occupations 
will tend to perceive themselves in a manner 
which is descriptive of the individual with 
a high degree of “Initiative” while those 
entering the accounting field will tend to 
perceive themselves in a manner which is 
descriptive of the individual who is low on 
“Initiative”;

Hypothesis 2B. For individuals low in 
self-esteem, there will be no differences be- 
tween those choosing sales occupations and 
those choosing the accounting field insofar 
as their self-perceptions are descriptive of
degree of “Initiative”;

Hypothesis 2C. For individuals high in 
self-esteem, those choosing sales occupations 
will describe themselves as having a greater 
need for “Job Freedom” than those choosing 
the accounting area; 

Hypothesis 2D. For individuals low in
self-esteem, there will be no differences be- 
tween those choosing sales occupations and 
those choosing accounting in terms of their 
self-perceived need for “Job Freedom.” 

No hypotheses were offered for the 
production-manager samples in relation to 
“Job Freedom” and “Initiative” since these 
variables do not seem to be dominant parts 
of the production-manager stereotype. 


METHOD


Subjects. Since the samples of each hypothesis 
varied, the subjects (Ss) will be described sepa- 
rately below. In all cases, however, the samples 
consisted of male juniors and seniors taken from two 
upper-division schools of business administration in 
two large state universities in a far western state. 
Since there was no logical reason to separate the 
two, and since preliminary tests indicated no em- 
pirical basis, the data were combined from the two 
schools. Preliminary tests also indicated that there 
were no differences between any of the groups to 
be analyzed in grades or proportion of seniors as 
opposed to juniors. 

By this type of sample limitation, several impor- 
tant variables could be controlled for and thus 
make the occupational choice measurement (to be 
described below) a highly meaningful one. The 
reason for this is that this type of sample limits 
the analysis to those individuals who have, for the 
most part, both the intellectual and financial re- 
sources to enable them to enter the occupation of 
their choice and who have also gone through the 
career and “major sampling” activities of the first 
2 years of college life. On the other hand, the influ- 
ence of occupational role performance is, of course, 
usually not present since they are still only college 
students. Finally, there is good evidence that 
choices at this stage are highly predictive of later
occupational membership (Schletzer, 1963). 

The samples for the separate hypotheses are given 
in Table 1. In general, the procedure used for each 
hypothesis was to split each “occupational” group 
sample so that the “high self-esteem” group was 
defined by approximately the top one third of 
scores on “self-esteem” while the bottom two thirds 
defined the “low self-esteem group.” In one sample, 
the pattern of scores seemed to justify a split 
somewhat closer to the median. 
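
The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_by_self_esteem`, the rounding rule, and the handling of tied scores are hypothetical, since the paper gives only the approximate proportions (top one third versus bottom two thirds).

```python
def split_by_self_esteem(scores, top_fraction=1 / 3):
    """Split subjects into high and low self-esteem groups.

    scores: list of self-esteem scores, one per subject.
    Returns (high, low) as lists of subject indices.
    Assumption: ties at the cutoff are broken by original order.
    """
    if not scores:
        return [], []
    # Number of subjects placed in the high group (at least one).
    n_high = max(1, round(len(scores) * top_fraction))
    # Rank subject indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    high = sorted(ranked[:n_high])
    low = sorted(ranked[n_high:])
    return high, low


# Hypothetical scores for six subjects in one occupational group.
example_scores = [12, 25, 18, 30, 9, 21]
high_group, low_group = split_by_self_esteem(example_scores)
print(high_group, low_group)
```

Passing a `top_fraction` nearer one half would correspond to the one sample in which the split was made closer to the median.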

All measuring instruments were administered
during the course of normal class-sessions, and there
is no reason to think that full cooperation was not
obtained.


TABLE 1

SAMPLES FOR THE SEPARATE HYPOTHESES

Hypothesis | Size | Type             | Occupational group
1A         | 10   | High Self-Esteem | Sales
1A         | 15   | High Self-Esteem | Accounting
1B         | 10   | High Self-Esteem | Sales
1B         | 13   | High Self-Esteem | Production managers
1C         | 23   | Low Self-Esteem  | Sales
1C         | 28   | Low Self-Esteem  | Accounting
1D         | 23   | Low Self-Esteem  | Sales
1D         | 15   | Low Self-Esteem  | Production managers
2A         | 20   | High Self-Esteem | Sales
2A         | 19   | High Self-Esteem | Accounting
2B         | 31   | Low Self-Esteem  | Sales
2B         | 31   | Low Self-Esteem  | Accounting
2C         | 9    | High Self-Esteem | Sales
2C         | 13   | High Self-Esteem | Accounting
2D         | 21   | Low Self-Esteem  | Sales
2D         | 20   | Low Self-Esteem  | Accounting

Measuring Instruments. 1. The occupational choice 
of the student was determined by means of a 3-part 
questionnaire. In the first two parts of the form, 
the student was asked first if he was interested in 
the business world as a career, and, second, if the 
answer was “Yes,” whether he had also decided on 
a business specialty. The third part of the question- 
naire asked if the individual had received any coun- 
seling in the vocational area from a university or 
college counseling center. This last part was used as 
a “control,” and all individuals that had received 
such assistance were eliminated from the analysis. 

The concurrent validity of the questionnaire was 
checked in two separate ways. Of 29 Ss indicating 
a specific business specialty on their registration 
forms (they do not have to indicate any), 28 indi- 
cated the same area on the questionnaire. Second, 
a comparison between questionnaire response and 
stated departmental major indicated a similar degree 
of correspondence (44 of 45). 

The test-retest reliability of the classification 
system was checked by administering the question- 
naire to students twice over a 6-week period. Of 
the 47 respondents, 96% were classified in the same 
manner for the two administrations. 

2. “Self-Esteem” was measured by the self-
assurance scale of the Ghiselli Self-Description
Inventory.2 This 31-item forced-choice adjective-pair
scale is described by Ghiselli as measuring


the extent to which the individual perceives him-
self as being effective in dealing with the problems
that confront him. There are those persons who
see themselves being sound in judgment and able
to cope with almost any situation, whereas others
think of themselves as being slow to grasp things,
making many mistakes, and being generally inept
[p. 9].


2The author wishes to thank Edwin E. Ghiselli
for granting permission to use the Self-Description
Inventory, undated.


“Initiative” was measured by the scale of the 
same name in the Ghiselli Self-Description Inven- 
tory. This scale consists of 17 forced-choice adjec- 
tive pairs, with a high score indicating a person 
who is an inaugurator or originator who opens 
new fields and conceives of novel ways of doing 
things. 

Evidence for the construct validity of these scales 
is available in Ghiselli (see Footnote 2; 1963b).3

3. “Interaction-Orientation” was measured by the 
scale of the same name in the Bass Orientation 
Inventory (Bass, 1962). Evidence for the construct 
validity of this scale, which consists of 27 forced- 
choice triads, is provided in Bass. 

4. “Need for Job-Freedom” was measured by the 
scale of the same name in the Crites Vocation 
Reaction Survey.4 Evidence for the construct valid- 
ity of this scale, which consists of 10 Likert-type 
statements, is provided in Crites (1963). 


RESULTS 


Hypotheses 1A, 1B, 1C, 1D. The results
for these hypotheses are summarized in
Table 2. Since all research hypotheses are
specifically directional in nature, one-tail sig-
nificance tests are utilized and so reported in
both this and the following sections.


3A minor problem which develops here is that
the “Initiative” scale has an 8-item overlap with
the Self-Assurance Scale (6 items scored in the
same direction, and 2 opposite), thus tending to
produce positive correlations between these scales.
While these items could have been eliminated from
the scoring key (and thus effectively partialed out),
this would have resulted in the disadvantage of
reducing the reliability of the scales. The use of
the analysis of covariance as a method of correction
was also rejected after an examination of the regres-
sion lines indicated the hypothesis of equality could
not be accepted. Hence, the procedure that was
followed was to score the data in a raw-score
fashion, thus making it unlikely that there would
be a negative correlation between Initiative and
Self-Esteem for the Accountant group, such as was
predicted for this group between (a) Interaction-
Orientation and Self-Esteem and (b) Need for Job-
Freedom and Self-Esteem. To the extent, however,
that among the Accountant group there will be no
difference on “Initiative” between those high and
low in Self-Esteem, this will support the author’s
hypothesis since this will indicate that proportion-
ately more low “initiative” people fall into the high
Self-Esteem group than the low Self-Esteem group.

Analogous reasoning would hold in the case of
those occupations where the correlation between
initiative and self-esteem was predicted to be
positive.

4The author wishes to thank John Crites for
granting permission to use the Vocation Reaction
Survey.


TABLE 2

SELF-PERCEIVED INTERACTION-ORIENTATION AS A
FUNCTION OF OCCUPATIONAL CHOICE
AND SELF-ESTEEM

Group                      | N  | M               | SD          | t
Hypothesis 1A
  High accountants         | 15 | 19.9            | 6.6         | 1.75*
  High sales               | 10 | 24.7            | 7.2         |
Hypothesis 1B
  High production managers | 13 | 20.1            | 6.2         | 1.68*
  High sales               | 10 | 24.7            | 7.2         |
Hypothesis 1C
  Low accountants          | 28 | 21.8            | 7.6         | .51
  Low sales                | 23 | 22.[illegible]  | [illegible] |
Hypothesis 1D
  Low production managers  | 15 | 23.7            | 4.8         | .73
  Low sales                | 23 | 22.[illegible]  | [illegible] |

*p < .05.

These results indicate, in brief, that all 
hypotheses were supported in that differences 
in self-perceived personality characteristics 
occur only in the high self-esteem groups but 
do not occur for the low self-esteem groups. 

Hypotheses 2A, 2B. The results for these 
hypotheses are summarized in Table 3. They 
point to a similar conclusion to that of the 
previous, in that differences in self-perceived 
personality characteristics as a function of 
occupational choice predicted from occupa- 
tional stereotypes occur only for high self- 
esteem individuals. On the other hand, such 
a prediction is poor for the low self-esteem 
individuals. 
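
As a rough check on how such a comparison works, a t ratio can be recomputed from the summary statistics alone. The sketch below assumes a pooled-variance, independent-groups t test; the paper does not state which formula was used, and the function name `pooled_t` is hypothetical. The inputs are the Hypothesis 2B values from Table 3 (low accounting: M = 30.4, SD = 4.8, N = 31; low sales: M = 26.2, SD = 6.4, N = 31).

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-groups t ratio from means, SDs, and Ns.

    Assumes equal population variances (pooled estimate).
    Returns (t, degrees_of_freedom).
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Hypothesis 2B: low self-esteem accounting versus low self-esteem sales.
t_value, df = pooled_t(30.4, 4.8, 31, 26.2, 6.4, 31)
print(round(t_value, 2), df)
```

Under these assumptions the ratio comes out near 2.92 with 60 degrees of freedom, close to the 2.89 reported in Table 3; the small gap may reflect rounding in the published means and standard deviations or a different computing formula.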


TABLE 3

SELF-PERCEIVED INITIATIVE AS A FUNCTION OF
SELF-ESTEEM AND OCCUPATIONAL CHOICE

Group             | N  | M    | SD  | t
Hypothesis 2A
  High sales      | 20 | 35.7 | 5.4 | 2.41*
  High accounting | 19 | 31.6 | 5.1 |
Hypothesis 2B
  Low sales       | 31 | 26.2 | 6.4 | 2.89
  Low accounting  | 31 | 30.4 | 4.8 |

*p < .01.
TABLE 4

SELF-PERCEIVED NEED FOR JOB FREEDOM AS A
FUNCTION OF SELF-ESTEEM AND
OCCUPATIONAL CHOICE

Group              | N  | M           | SD          | t
Hypothesis 2C
  High sales       | 9  | 11.78       | 3.45        | 2.34*
  High accountants | 13 | 8.0         | 4.04        |
Hypothesis 2D
  Low sales        | 21 | [illegible] | [illegible] | 1.00
  Low accountants  | 20 | 9.05        | 4.32        |

*p < .01.


Hypotheses 2C, 2D. The results for these 
hypotheses are given in Table 4 and simi- 
lar conclusions are warranted. Occupational 
stereotypes and self-perceived personality
characteristics of those individuals choosing 
the occupation are highly related, but only 
for high self-esteem individuals. Such rela- 
tionships do not seem to occur for those low 
in self-esteem. 


STUDY II


Although the results from the first study
were highly consistent with the theoretical
framework proposed, the possibility remains
that the obtained results were due not to the
hypothesized process but perhaps could be
explained more parsimoniously by postulating
different occupational perceptions by the indi-
viduals involved. That is, it could be pro-
posed that low self-esteem people differ some-
what from high self-esteem people in their
perceptions of occupations and this would
explain the results obtained.

While such an explanation did not seem 
likely because of the high invariancy of 
occupational perceptions among college stu-
dents, it was felt that such a possibility should
at least be explored, and controlled more 
directly rather than through the assumption 
of the invariancy of occupational stereo- 
types; thus, a second study was undertaken 
which would allow a more direct control of 
the match between occupational and self- 
perception. Hence, if the discrepancy be- 
tween “desired” and “expected” continued 
to be greater for the low self-esteem than 
for the high self-esteem for relevant needs, 
the results of the first study will receive even 
more substantial support since the measure
of “expected need satisfactions” is more 
unique to the individual in this case, and is 
not a common stereotype to which all may 
not subscribe. 


The procedure that was followed, as a 
result, in this second study was to ask each 
individual who had made an occupational 
choice, using the same criterion as in the 
first study, to: 


1. Rate the importance of each of several 
needs to himself; 

2. Rate the probability of his chosen occu- 
pation being able to satisfy each of these 
same needs. 


It was predicted that these ratings would 
show the following characteristics: 


Hypothesis 1. For highly important needs, 
the probability that the chosen occupation 
would satisfy these needs would be greater 
for those with high self-esteem than for those 
with low self-esteem. 

Hypothesis 2. For unimportant needs, the 
probability that the chosen occupation would 
satisfy these needs would be the same for 
those with high self-esteem and those with 
low self-esteem. 


Hypothesis 1 is designed as a direct test 
of the extent to which self-expressed high 
needs are predictive of occupational choice 
for high self-esteem individuals, but not 
those of low self-esteem. Hypothesis 2 pro- 
vides a control on the results of the first 
prediction in that, if upheld, it would indicate 
that the high self-esteem person sees the
future role not just in terms of a “set” to 
see everything as more satisfying, but rather 
as more satisfying just in relation to oneself 
and one’s own self-perceived needs. 


METHOD 


Subjects. Three separate samples, independent of 
those from the first study, and independent from 
each other, were utilized in this analysis. Sample 1 
consisted of 37 students in a school of business 
administration at a far western state university. 
Sample 2 consisted of 39 lower-division students 
at a state college in a far western city.5 Sample 3
consisted of 26 upper-division students at a large
mid-Atlantic state university.6 All students in
Samples 1 and 3 were male, while 7 of the 39 in
Sample 2 were female.


5 The author wishes to thank Robert Bolin of
the Division of Business Administration at Portland
State College for assistance in obtaining this sample.

In addition to the occupational choice question- 
naire previously discussed, and Ghiselli’s Self- 
Description Inventory, each sample was measured 
as follows: 

Measuring Instruments. 1. Sample 1 was adminis- 
tered the Crites Vocation Reaction Survey twice, 
once under normal instructions of self-description 
and once when they were asked to describe the 
degree to which each characteristic was typical of 
their chosen occupation. This instrument provides a 
measure of seven vocational needs derived from 
factor-analytic research in this area (Crites, 1963),
with these being Material Security, Job Freedom, 
Structure, System, Personal Status, Behavior Con- 
trol, and Social Service. 

The order of presentation, approximately 2 weeks 
apart, was reversed for one half of the group, but 
no order effect showed up on preliminary analysis. 

2. Sample 2 was administered the Minnesota Im-
portance Questionnaire (Weiss, Dawis, England, &
Lofquist, 1964)7 twice under similar instructions and
procedure to that of Sample 1.

The Minnesota Importance Questionnaire provides 
a rating of the importance of 20 vocationally rele- 
vant needs to an individual, and utilizes a Likert- 
type format. The 20 scales are Ability Utilization, 
Achievement, Activity, Advancement, Authority, 
Company Policies and Practices, Compensation, Co- 
Workers, Creativity, Independence, Moral Values, 
Recognition, Responsibility, Security, Social Service, 
Social Status, Supervision—Technical, Variety, and 
Working Conditions. It is a 100-item questionnaire, 
with 5 items devoted to each scale. The reliabilities, 
using the Hoyt analysis-of-variance procedure, of 
each scale are quite high, with only 2 below .80. 

Evidence concerning the construct validity of this 
instrument is provided in Weiss et al. (1964). 

3. Sample 3 was administered the Crites Vocation 
Reaction Survey under the same conditions and 
procedures as the first two samples. 

Method of Analysis. The following procedures 
were used in the data analysis: 

1. For Samples 1 and 3, the two most important
and least important needs were determined for each
person on the Crites Vocation Survey. (In the few
cases of ties, the three most/least important needs
were used.) A total “high” and “low” need score
was then computed for each individual. From this,
the comparable score of satisfaction expectancy for
these needs in the chosen occupation was compared,
and discrepancy determined between “high needs
and expectancy of satisfaction” and “low needs and
expectancy of satisfaction” for each individual. This
discrepancy score was computed as follows:


(a) For high needs:
Discrepancy = Need Score - Expected Satisfaction Score
(b) For low needs:
Discrepancy = Expected Satisfaction Score - Need Score


6 The author wishes to thank Stephen J. Carroll
of the School of Business Administration at the
University of Maryland for assistance in obtaining
this sample.

7The author wishes to thank R. Dawis of the
University of Minnesota Industrial Relations Center
for granting permission to reproduce this question-
naire.

Despite “ceiling” problems for the “high needs” 
analysis, there were two cases where expected satis- 
faction was greater than needs. For the “low 
needs” analysis, there were two cases, despite the 
“basement effect,” where the expected satisfaction was 
less than the need score. These were added in alge- 
braically in the computations for the “low needs” 
group, but were treated as zero discrepancy for 
the “high needs” analysis, since any other treatment 
would lead to difficulties in interpretation. 

2. For Sample 2, a similar procedure was followed 
as described above, except that the top four and 
bottom four needs were utilized from the more 
extensive Minnesota Importance Questionnaire. The 
few cases of ties were treated in a similar fashion. 
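
The scoring procedure just described can be sketched as follows. This is a minimal illustration, not the author's actual computation: the function name `discrepancy_scores`, the dictionary layout, and the example ratings are hypothetical, ties are ignored for simplicity, and the zero floor is assumed to apply to each individual's total “high needs” discrepancy.

```python
def discrepancy_scores(need, expected, k=2):
    """Compute (high_discrepancy, low_discrepancy) for one person.

    need:     dict mapping need name -> rated importance to the person.
    expected: dict mapping need name -> expected satisfaction of that
              need in the chosen occupation.
    k:        number of most and least important needs used
              (2 for the Crites survey, 4 for the Minnesota form).
    Assumption: ties in importance are not given special treatment here.
    """
    # Order need names from most to least important.
    ranked = sorted(need, key=need.get, reverse=True)
    high_needs, low_needs = ranked[:k], ranked[-k:]

    need_high = sum(need[name] for name in high_needs)
    expected_high = sum(expected[name] for name in high_needs)
    need_low = sum(need[name] for name in low_needs)
    expected_low = sum(expected[name] for name in low_needs)

    # High needs: need minus expectancy, with negative values set to zero.
    high_discrepancy = max(0, need_high - expected_high)
    # Low needs: expectancy minus need, kept with its algebraic sign.
    low_discrepancy = expected_low - need_low
    return high_discrepancy, low_discrepancy


# Hypothetical ratings on the seven Vocation Reaction Survey needs.
need = {"Material Security": 9, "Job Freedom": 8, "Structure": 3,
        "System": 4, "Personal Status": 6, "Behavior Control": 2,
        "Social Service": 5}
expected = {"Material Security": 7, "Job Freedom": 8, "Structure": 4,
            "System": 4, "Personal Status": 5, "Behavior Control": 3,
            "Social Service": 5}
print(discrepancy_scores(need, expected))
```

Group means of these two scores, taken separately for high and low self-esteem subjects, are the quantities compared in the analysis.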


RESULTS 


The results of the investigation are sum-
marized in Table 5. They indicate that all
hypotheses are supported as predicted for all
samples. For important needs, the expectancy
of satisfaction in the chosen occupation is
significantly greater for those with high self-
esteem than for those with low self-esteem.
The difference does not appear for those needs
which are self-perceived as unimportant.


TABLE 5

DISCREPANCIES BETWEEN SELF-PERCEIVED NEEDS AND
EXPECTANCY OF NEED-SATISFACTION AS A
FUNCTION OF SELF-ESTEEM

         |      Important needs        |     Unimportant needs
Sample   | High self-  | Low self-     | High self-  | Low self-
         | esteem      | esteem        | esteem      | esteem
1
  M      | 1.27        | 2.24          | [illegible] | .92
  SD     | [illegible] | 1.6           | .62         | .94
  N      | 13          | 24            | 13          | 24
  t      |          1.82*              |           .68
2
  M      | 1.00        | 2.02          | 3.95        | 2.64
  SD     | [illegible] | [illegible]   | 3.4         | 2.4
  N      | 14          | 25            | 14          | 25
  t      |          1.96*              |          1.41
3
  M      | 2.94        | 4.24          | 1.23        | .79
  SD     | 1.9         | [illegible]   | 1.4         | 1.2
  N      | 13          | 13            | 13          | 13
  t      |          1.80*              |           .80

*p < .05, one-tailed.


DISCUSSION 


The results of this investigation support 
quite strongly the prediction that “self- 
esteem” operates as a moderator variable in 
the process of vocational choice in that those 
who are high on this variable use their self- 
perceived needs differently from those who 
think relatively poorly of themselves. That 
is, for those high in self-esteem their self- 
perceived needs are those that have been 
satisfied in the past and it is, therefore, ap- 
propriate and consistent for the individual 
to seek out those roles where they will be 
satisfied in the future. On the other hand, for 
the individual low on self-esteem, such moti- 
vation may appear not to exist. His self- 
perceived needs have not been satisfied in 
the past and he has, more likely, become both 
more familiar with nonneed-satisfying situa- 
tions and more accepting of them. To put it 
in our previous framework, such situations 
are more “consistent” for him than for the 
high self-esteem individual. This conclusion 
is solidified even further when we look at the 
significant results of Hypothesis 2B (Table 
3) and the trends of Hypotheses 1D (Table 
2) and 2D (Table 4). In these cases, the 
low self-esteem individuals are opposite to 
what would be predicted from the occupa- 
tional stereotype, a situation which certainly 
provides negative evidence for a _ simple 
“match self to occupational stereotype” proc- 
ess in vocational choice. In essence, then, 
these results seem to support in a realistic, 
highly important life-choice situation, the 
findings of a number of laboratory investiga- 
tions that individuals of low self-esteem are 
more likely to seek less reward for a similar 
task than individuals of high self-esteem 
(Pepitone, 1964, Ch. 2), and to rate informa- 
tion which confirms their low self-esteem 
more favorably than information which tells
them they are better than their low self-esteem tells
them they are (Wilson, 1965). Thus, while
the results reported here are correlational in 
nature, they are supported by a number of 
experimental laboratory investigations of
more circumscribed choice situations. 

A number of implications for further re- 
search suggest themselves also. For example, 
since self-perception of abilities can also be 
measured and conceived of as part of the 
self-percept, would this mean that individuals 
of low self-esteem are more likely to accept 
those social roles (e.g., jobs, student roles, 
etc.) where they believe they do not have 
high abilities and less likely to wind up in 
those roles where they believe they do? If
such self-perceived abilities are related to 
actual abilities, and at least a moderate rela- 
tionship does exist (Arsenian, 1942), does 
this mean that such individuals guarantee 
themselves failures by the manner of their 
choice-making? In other words, does “self- 
esteem” operate as a moderator here also in 
that persons with a high sense of personal 
adequacy search for a situation where they 
will be adequate, that is, where they believe 
they have high abilities, whereas low self- 
esteem people are more likely to be accepting 
of a situation where they believe they are 
likely to be inadequate, that is, where they 
believe they do not have high abilities? 
Research is needed of both a correlational 
and experimental nature. 

Finally, it also seems to be quite necessary 
that the generality of such choice-making 
patterns be determined. Do individuals of 
low self-esteem choose all roles in a similar 
manner, or are the concept and the postulated 
relationships too general in nature? Must we 
conceive of a “vocational self” choosing a 
“vocational role,” a “marital self” choosing a 
“marital role,” etc.? These questions are, of 
course, not new but their relevance remains. 


SOME CHARACTERISTICS OF EFFECTIVE INTERVIEWERS1


Sample addresses were selected on a probability basis from the records of
financial institutions and the holdings reported to the interviewer were com- 
pared with institution records for the day of the interview. The frequency with 
which an interviewer obtained information about the validated account(s) 
forms the basis for the criteria of interviewer effectiveness. It was found 
that the more effective interviewers scored significantly higher on the domi- 
nance and intraception tests and lower on the succorance and change tests 
of the Edwards Personal Preference Schedule (EPPS). In addition, they 
scored significantly higher in reference evaluations of self-confidence and
attention to detail.


This paper focuses primarily on interviewer 
personality traits as they relate to inter- 
viewing effectiveness. Work in this area has 
suggested a negative relationship between 
measures of interviewer effectiveness and
interviewer characteristics such as agreeable- 
ness and cooperation (Guest & Nuckols, 
1950), experience in approach and persuasion 
(Keys, 1949; Sheatsley, 1951), social orien- 
tation (Keys, 1949), and dominance and 
emotional stability (Guest, 1947). A positive 
relationship has been observed with objectiv- 
ity (Guest & Nuckols, 1950), self-sufficiency 
(Guest, 1947), and introversion (Keys, 
1949). Hyman et al. (1954) conclude from 
much of this work that 


characteristics which seem associated with social 
skills or social orientation; agreeableness, or co- 
operativeness to be somewhat negatively associated 
with performance although this relationship is not a 
strong one [pp. 301-302]. 


In contrast, Axelrod and Cannell (1959)
found that the more effective interviewers
employed by the Survey Research Center
tended to be person oriented. This apparent
inconsistency is possibly resolved by recog-
nizing that Hyman (1954, p. 294) appears,
from his references to Taft, to be thinking
in terms of social dependence while Axelrod
and Cannell (1959) are thinking in terms of
social skills. In this paper the Edwards Per-
sonal Preference Schedule (EPPS) and a
Reference Evaluation Check List (RECL)
will be employed in conjunction with a
measure of interviewer effectiveness based on
validated information to further explore these
hypotheses.


1The study reported here was undertaken at the
University of Illinois as part of the Consumer
Savings Project of the Inter-University Committee
for Research on Consumer Behavior under a grant
from the Ford Foundation. The writer is indebted
to the director of the project, Robert Ferber, for
helpful comments on this paper.


METHOD 


An initial sample of 316 addresses was selected 
on a probability basis from time deposit records 
of saving institutions located in a small metropolitan 
area in the Midwest. The addresses were assigned 
to an interviewing staff who were unaware that a 
portion of the information they obtained would be 
validated—compared with records of the savings 
institutions for the day of the interview. In the 
interviews, which lasted an average of 1½ hours, 
information was sought about the complete financial 
holdings of the savings units 2 residing at the sample 
addresses. 

This research design yielded the following criteria 
of interviewer effectiveness: 

1. Pickup (P) rate: The percentage of an inter- 
viewer’s contacts in which he picked up or ob- 
tained a response which indicated that the validated 
account(s) was owned by a member of the savings 
unit. An interviewer may have failed to pick up a 
validated account(s) because he was refused an 
interview by a responsible person at a savings unit, 
or, when an interview was obtained, because the 
respondent(s) failed to mention the validated ac- 
count. These dimensions of interviewer effectiveness 
are reflected by the next two rates. 

2. Response (R) rate: The percentage of contacts 
where an interviewer obtained an interview. 

3. Account-mention (AM) rate: The percentage 
of interviews in which the validated savings account 
was mentioned by the respondent. The pickup 
(P) rate then is the product of the response (R) 
rate and the account-mention (AM) rate. Thus: 
P = R × AM. 


2 A savings unit was defined as one or more persons 
living in the same dwelling, pooling half or more of 
their income and savings. A dwelling unit may, therefore, 
have more than one savings unit. 

The accuracy with which respondents reported 
the amount in the validated account could not be 
included in the above criteria because of an experi- 
ment conducted on the study. The experimental 
questionnaires were similar up to the point in the 
interview where the existence of the validated ac- 
count was ascertained, but then a random half of 
the sample was asked for the amount in their ac- 
count on the day of the interview while the other 
half was asked for change in the account over the 
previous 3 months. 

Since the results of the present study will be 
compared with those of the earlier studies, it is 
important to note certain differences in research 
design. First, interviewers on the earlier studies 
sought primarily attitudinal information while those 
on the present study sought factual information. 
Second, many of the criteria of interviewer effec- 
tiveness employed on the earlier studies appear not 
to be related to that used on the present study. It 
has been shown elsewhere (Steinkamp, 1964) that 
measures, similar to these earlier criteria, based on 
the number of ambiguous answers (No answers, 
Don’t knows, failures to probe), refusals of specific 
information, respondent use of records, and the like 
were not related to the P rate. This finding raises 
a question of the interpretation to be placed on the 
findings of these earlier studies. 


The Interviewers 


The selection and training procedures and results 
of administering the EPPS to the interviewers are 
discussed elsewhere (Hauck & Steinkamp, 1964), and 
will only be summarized here. Out of 141 applicants, 
50 were selected for training and 21 were finally em- 
ployed as interviewers. The following criteria were 
used in selecting applicants for training: favorable 
letters of recommendation, favorable personal inter- 
view by staff members, age between 25 and 55 years, 
college degree, availability of at least 15 hours per 
week for interviewing, use of a car, and perma- 
nency in the area. Interviewer training consisted of 
four 3-hour sessions, home study of materials, a 
comprehensive posttraining examination, round- 
robin discussion of trial interviews taken by 
trainees, and discussion of early interviews with the 
supervisor. 

Three interviewers were detected through the vali- 
dation data as falsifying some of their interviews. 
Some demographic characteristics of the remaining 
18 interviewers are presented in Table 1. Further 
insights into the effect of the selection process on 
interviewer traits were obtained by comparing the 
EPPS scores of 16 of the interviewers who took 
this test with norms from adult and college samples 
(Edwards, 1959) and with scores for a sample of 
teachers. 

TABLE 1 
NUMBER OF INTERVIEWERS WITH SELECTED 
CHARACTERISTICS 

Characteristic              No. interviewers 

Sex 
    Male                          12 
    Female                         6 
Age 
    Under 25                       1 
    25-40                         11 
    41-55                          5 
    56 and over                    1 
Education 
    College                        8 
    Postgraduate                  10 
Occupation 
    Teacher                        9 
    Other professional             3 
    Housewife                      2 
    Other                          4 


When compared with the adult sample, the interviewers 
showed significantly less succorance, nurturance, 
and abasement. The first two of these are 
associated with needs to give to and receive from 
others affection, sympathy, understanding, forgive- 
ness, and the like. Abasement is associated with 
needs to feel guilty when things go wrong, to feel 
inferior to others, and the like. The interviewers 
showed more intraception, dominance, change, and 
heterosexuality. The first three of these will be 
discussed in the next section. Heterosexuality is asso- 
ciated with needs to engage in social activities with 
the opposite sex, to become sexually excited, and 
the like. 

When compared with the college sample the inter- 
viewers displayed significantly more deference, order, 
and intraception, and less succorance. Some of the 
needs associated with deference are: to get sug- 
gestions from others, to praise others, to tell others 
they have done a good job, to conform to custom, 
and to avoid the unconventional. Associated with 
order is the need for a neat, well-organized approach 
to activities. A comparison of interviewer scores 
with those reported for a sample of grade and high 
school teachers (Jackson & Guba, 1957) did not 
yield any significant differences. 

Interviewer assignments were not randomized. 
However, an examination of the location of the 
assignments did not indicate that a relationship ex- 
isted between interviewer effectiveness and known 
socioeconomic characteristics of the assignments. In 
addition, no significant differences appeared in the 
size of the validated accounts held at sample 
addresses assigned to different interviewers. 


RESULTS 


TABLE 2 
REGRESSION COEFFICIENTS AND THEIR STANDARD 
ERRORS FOR THE 15 PERSONALITY VARIABLES 
OF THE EPPS AND THE MEASURES OF 
INTERVIEWER EFFECTIVENESS 

Variable              AM rate      P rate 

Achievement            -.227        .144 
                      (.840)      (.645) 
Deference               .785     [illegible] 
                      (.920)      (.741) 
Order                  -.619       -.330 
                      (.600)      (.491) 
Exhibition              .429        .216 
                      (.778)      (.627) 
Autonomy               -.201       -.116 
                      (.602)      (.483) 
Affiliation            -.331       -.751 
                      (.756)      (.576) 
Intraception           2.449*      1.080 
                      (.824)      (.792) 
Succorance             -.827      -1.282* 
                      (.865)      (.627) 
Dominance              1.433*       .944* 
                      (.365)      (.340) 
Abasement              -.162       -.557 
                      (.651)      (.500) 
Nurturance              .205        .360 
                      (.950)      (.757) 
Change                -1.829*      -.625 
                      (.665)      (.639) 
Endurance               .356        .473 
                      (.538)      (.418) 
Heterosexuality        -.068       -.096 
                      (.489)      (.391) 
Aggression             -.246       -.397 
                      (.732)      (.579) 

*p < .05. 


The interviewers made contact with a 
responsible person at 98% of the savings 
units owning validated accounts. The 
number of contacts per interviewer ranged 
from 11 to 24. The P rate, which is 
the proportion of contacts where an inter- 
viewer ascertained the existence of the vali- 
dated savings account, ranged from 32% to 
64% with an average of 50%. R rates of 
interviewers varied from 46% to 91% with 
an average of 73%. AM rates ranged from 
50% to 90% with an average of 69%. In 
about 90% of the interviews one or more of 
the owners of the validated savings account 
were present. 
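
The decomposition P = R × AM given in the Method section can be checked informally against the averages just reported. The sketch below is illustrative only: the function name `pickup_rate` is hypothetical, and multiplying average rates is an approximation, since the mean of the per-interviewer products need not equal the product of the means.

```python
def pickup_rate(response_rate: float, account_mention_rate: float) -> float:
    """Return the pickup (P) rate as the product R x AM (rates as proportions)."""
    for rate in (response_rate, account_mention_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be proportions between 0 and 1")
    return response_rate * account_mention_rate


# Average rates reported in the text: R = 73%, AM = 69%.
approx_p = pickup_rate(0.73, 0.69)
print(f"approximate P rate: {approx_p:.1%}")  # close to the reported 50% average
```

The product of the two averages lands within a percentage point of the reported average P rate, which is what the definition would lead one to expect.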

The validation aspect of the study offered 
a unique opportunity to evaluate the overall 
effectiveness of this type of interview survey. 
These findings represent the framework 
within which the subsequent analysis must 
be viewed and indicate, as do other similar 
studies (Broida, 1962; Horn, 1960; Lansing, 
Ginsburg, & Braaten, 1963), the need for 
further research oriented toward increasing 
interviewer effectiveness. 

It was found in the present study that the 
total amount reported by the respondents 
was a little over two fifths of the true total 
amount in the validated accounts owned by 
savings units where contact was made. Pos- 
sibly of greater importance was the finding 
that the computed sampling variances were 
biased estimates of the true variances 
(Ferber, 1966). In interpreting these find- 
ings, it must be kept in mind that the study 
sought complex information and that people 
are very sensitive to reporting information 
about time deposits. The study therefore 
presented the interviewers with a severe test 
of their abilities to motivate respondents to 
report complete, accurate information. 


Interviewer Characteristics and Effectiveness 


The analysis in this section employs regres- 
sion models to relate interviewer character- 
istics to interviewer AM and P rates. As in 
all studies mentioned earlier, with the pos- 
sible exception of Guest’s (1947) work in 
laboratories, the present analysis applies only 
to the interviewers used on the study and 
not to the applicants from which these inter- 
viewers were selected. As a result, the ab- 
sence of a relationship should not be inter- 
preted as indicating that a variable is of no 
value in selecting effective interviewers from 
a group of applicants, particularly where the 
variable or a correlate may have been em- 
ployed in selecting the interviewers. Such 
variables may have successfully rejected un- 
desirable applicants but have no value in 
discriminating among those hired. 

Regression coefficients and their standard 
errors based on 16 interviewers are presented 
in Table 2 for the 15 variables making up 
the EPPS and the P and AM rates. Two of 
these variables, dominance and succorance, 
are significantly related to the P rate. Domi- 
nance is positively related and explains 31% 
of the variance in the P rate. Succorance is 
negatively related and explains 18% of the 
variance. Some of the needs associated with 
dominance are: to persuade and influence 
others to do what one wants, to supervise 
and direct the actions of others. Those asso- 
ciated with succorance are: to have others 
provide help when in trouble, to have others 
be sympathetic and understanding about per- 
sonal problems, to receive a great deal of 
affection from others. 
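
The percentages of variance quoted above can be roughly recovered from Table 2 under two assumptions, neither of which the paper states: that each coefficient comes from a separate one-predictor regression on the 16 interviewers, and that "explains" refers to the coefficient of determination corrected for degrees of freedom. Under those assumptions t = b / SE, r² = t² / (t² + n − 2), and the corrected value is 1 − (1 − r²)(n − 1)/(n − 2). The function name is hypothetical.

```python
def corrected_r_squared(coef: float, std_err: float, n: int) -> float:
    """Corrected r^2 for a one-predictor regression, from its slope and standard error."""
    if n < 3 or std_err <= 0:
        raise ValueError("need n >= 3 and a positive standard error")
    df = n - 2                      # residual degrees of freedom
    t_sq = (coef / std_err) ** 2    # squared t ratio of the slope
    r_sq = t_sq / (t_sq + df)       # uncorrected r^2
    return 1.0 - (1.0 - r_sq) * (n - 1) / df


# Table 2, P rate column, assuming n = 16 interviewers.
print(round(corrected_r_squared(0.944, 0.340, 16), 2))   # dominance
print(round(corrected_r_squared(-1.282, 0.627, 16), 2))  # succorance
```

On this reading the dominance figure comes out near the 31% quoted in the text and the succorance figure near the quoted 18%; small discrepancies may reflect rounding in the published coefficients.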

Some insights into in-interview effectiveness 
are provided by the AM rate. Three of 
the variables, intraception, dominance, and 
change, are significantly related to the AM 
rate. The first two show a positive relationship 
and explain 34 and 44% of the variance, 
respectively. The third, change, is negatively 
related to the AM rate and explains 30% 
of the variance in it. The manifest needs 
associated with intraception are: to observe 
others, to understand how others feel about 
problems, to analyze the motives of others. 
Those associated with change are: to do new 
and different things, to experience novelty and 
change in daily routine, and to participate 
in new fads and fashions. 


TABLE 3 
REGRESSION COEFFICIENTS AND THEIR STANDARD 
ERRORS FOR SELECTED EXPLANATORY VARIABLES 
AND THE MEASURES OF INTERVIEWER 
EFFECTIVENESS 

Explanatory variables                   AM rate       P rate 

RECL a 
    Ease at making friends               2.523         2.296 
                                        (3.667)       (2.921) 
    Honesty                             -1.840      [illegible] 
                                        (3.574)       (2.885) 
    Self-confidence                   [illegible]   [illegible] 
                                        (2.701)       (1.747) 
    Attention to detail                  4.847*        2.711 
                                        (2.182)       (1.898) 
    Procrastination                      1.782         1.804 
                                        (2.966)       (2.357) 
    Think on feet                        2.208         2.294 
                                        (2.779)       (2.191) 
    Appearance                        [illegible]      2.929 
                                        (3.112)       (2.368) 
    Initiative                           3.004         2.962 
                                        (2.120)       (1.633) 
Age                                      1.179      [illegible] 
                                        (3.290)       (2.572) 
Education                                -.674         1.622 
                                        (2.470)       (1.934) 
Sex (female high)                        3.500         7.567 
                                        (6.316)       (4.695) 
Number of children                       -.988         -.934 
                                        (1.000)        (.789) 
Hours available for interviewing          .119          .182 
                                         (.184)        (.141) 

a References rated interviewers directly on the following 
scale: superior (4), above average (3), average (2), below average (1), 
inadequate basis for judgment. An interviewer's score 
is the average for all his references. 

*p < .05. 

Regression coefficients based on all 18 
interviewers were prepared for the scales 
making up the RECL and for selected other 
variables (Table 3). Reference evaluation of 
interviewer self-confidence was the only vari- 
able significantly related to the P rate. It 
explains 54% of the variance in the P rate. 
Two of the variables obtained from refer- 
ence ratings were significantly related to the 
AM rate. These are self-confidence which ex- 
plains 30% of the variance in the AM rate 
and attention to detail which explains 21%. 


Patterns 


The findings as well as intuition suggest 
that interviewer effectiveness depends not on 
a single characteristic, but on a pattern of 
characteristics. This section is devoted to 
reporting the results of a search for more 
complex regression models. 

A search of all combinations of two and 
three explanatory variables based on 16 inter- 
viewers yielded Equation 1 for the AM rate. 
The inclusion of other variables did not yield 
a significant net regression coefficient or ex- 
plain more than 5% of the unexplained 
variance. 


\[
\text{AM rate} = 26.9 + \underset{(1.580)}{3.533}\,z_4 + \underset{(.329)}{1.275}\,z_2 \qquad [1]
\]

\[
R^2 = .60, \qquad S_y = \text{[illegible]}
\]

where $z_2$ = score on the dominance test in 
the EPPS, $z_4$ = reference evaluation of applicants' 
attention to detail, and 26.9 is the constant 
term. The number in parentheses below 
each net regression coefficient is the standard 
error of the coefficient. $R^2$ is the corrected 
coefficient of multiple determination and $S_y$ 
is the corrected standard error of estimate. 
The respective beta coefficients are: 
$\beta_2 = .644$; $\beta_4 = .372$. 

These two variables explain about 60% of 
the variance in the AM rate of interviewers. 
If the interviewers are divided into thirds on 
the basis of their computed AM rate, the 
actual AM rate of the groups indicates something 
of the effectiveness of this equation 
in discriminating among interviewers. These 
groups are shown in Table 4. 


TABLE 4 
ACTUAL PERFORMANCE RATES FOR INTERVIEWER 
GROUPS BASED UPON COMPUTED AM RATES 

Computed                Actual rates (%) 
AM rate              AM        R         P 

Highest third        79        71        56 
Middle third         67        77        52 
Lower third          57        74        42 

Equation 2 presents the results of a search 
for explanatory variables for the P rate. This 
equation contains a variable which on the 
basis of a two-variable regression model was 
not significantly related to the P rate, the 
estimate of hours available for interviewing. 


\[
\text{P rate} = 2.89 + \underset{(1.727)}{4.217}\,z_1 + \underset{(.281)}{.853}\,z_2 + \underset{(.091)}{.224}\,z_3 \qquad [2]
\]

\[
R^2 = \text{[illegible]}, \qquad S_y = 4.58
\]

where $z_1$ = reference evaluation of self-confidence 
of applicant, $z_2$ = score on 
dominance test on EPPS, and $z_3$ = estimate 
of hours available for interviewing. The 
respective beta coefficients are: $\beta_1 = .422$, 
$\beta_2 = .538$, $\beta_3 = .400$. 

When the interviewers were divided into 
thirds on the basis of their computed P rates, 
the actual P rates shown in Table 5 were 
obtained. 
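
The grouping procedure can be sketched as follows. The coefficients are those of Equation 2 (constant 2.89; 4.217 for the self-confidence rating, .853 for the EPPS dominance score, .224 for hours available). Everything else is an assumption made for illustration: the function names, the six interviewers and their scores are hypothetical, and the paper does not say how unequal thirds or ties were handled.

```python
from typing import Dict, List


def predicted_p_rate(self_confidence: float, dominance: float, hours: float) -> float:
    """Computed P rate (in percent) from the coefficients of Equation 2."""
    return 2.89 + 4.217 * self_confidence + 0.853 * dominance + 0.224 * hours


def split_into_thirds(scores: Dict[str, float]) -> List[List[str]]:
    """Rank interviewers by computed rate (highest first) and cut into three groups."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    size, extra = divmod(len(ranked), 3)
    groups, start = [], 0
    for i in range(3):
        # Any remainder is given to the earlier (higher-ranked) groups.
        end = start + size + (1 if i < extra else 0)
        groups.append(ranked[start:end])
        start = end
    return groups


# Hypothetical interviewers: (self-confidence rating, dominance score, hours available).
interviewers = {
    "A": (3.5, 20, 30), "B": (2.0, 10, 15), "C": (3.0, 15, 20),
    "D": (2.5, 12, 25), "E": (4.0, 18, 20), "F": (2.0, 14, 15),
}
computed = {name: predicted_p_rate(*vals) for name, vals in interviewers.items()}
highest, middle, lowest = split_into_thirds(computed)
print(highest, middle, lowest)
```

Comparing the mean actual rate within each group, as Table 5 does, then shows how well the computed rate separates more effective from less effective interviewers.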


TABLE 5 
ACTUAL PERFORMANCE RATES FOR INTERVIEWER 
GROUPS BASED UPON COMPUTED P RATES 

Computed                Actual rates (%) 
P rate               AM        R         P 

Highest third        79        73        58 
Middle third         66        74        49 
Lower third          58        71        41 


CONCLUSIONS 


The pattern or profile of interviewer traits 
which emerge as being related to the overall 
measure of interviewer effectiveness, the P 
rate, appears not to be inconsistent with what 
Hyman et al. (1954) are interpreted as sug- 
gesting. The negative relation of the succor- 
ance measure with the P rate suggests that 
the more effective interviewers display less 
social dependence, particularly that associ- 
ated with receiving sympathy and affection 
from others, than do the less effective inter- 
viewers. This finding was based on a group 
of interviewers who show significantly less 
need for giving and receiving affection, sym- 
pathy, and understanding when compared 
with adult and college populations. 

The positive relationship observed be- 
tween interviewer effectiveness and the self- 
confidence ratings further suggests the ab- 
sence of dependence on others. While direct 
comparisons are not possible, it is interesting 
to note that Guest (1947) obtained a posi- 
tive relationship between interviewer effec- 
tiveness and the self-sufficiency scale on the 
Bernreuter Personality Inventory and a nega- 
tive relationship with the emotional stability 
scale. 

The higher dominance scores of the interviewers 
compared with the adult sample and 
of the more effective interviewers add another 
dimension to the pattern: ability to 
control the interview situation. In contrast, 
Guest (1947), employing the Bernreuter with 
10 interviewers, obtained a negative though 
not significant relationship (r = -.17) between 
dominance scores and a measure of 
effectiveness based on attitudinal data. These 
results suggest that dominance may or may 
not be a desirable interviewer characteristic, 
depending on whether the study is basically 
factual or attitudinal. 

The analysis of in-interview effectiveness 
as measured by the AM rate yielded a rela- 
tionship which tends to support the Axelrod 
and Cannell (1959) hypothesis that more ef- 
fective interviewers are more socially skillful. 
The variable is intraception which is associ- 
ated with needs to analyze and view other 
people objectively. Guest and Nuckols (1950) 
also observed a relationship between interviewer 
effectiveness and their measure of objectivity. 
In addition, the negative relationship 
with the change variable on the present 
study suggests social stability on the part of 
the more effective interviewers. 

The higher reference ratings of the more 
effective interviewers on the attention to detail 
scale add yet another dimension to the 
profile. The significance of this variable prob- 
ably reflects in part the complexity of the 
financial information sought in the interviews. 

In interpreting these conclusions, the use 
of linear analytic models must be taken into 
account. While the results suggest that more 
and more of the observed characteristics are 
desirable in interviewers, a priori reasoning 
suggests that characteristics may be desirable 
up to some level and become undesirable 
beyond. However, with the small number of 
interviewers available, such possibilities could 
not be rigorously investigated. 

Finally, the findings must be interpreted in 
terms of the limitations and nature of the 
study. The financial nature of the study limits 
the application of conclusions to attitudinal 
studies. The small number of interviewers and 
the inability to randomize assignments de- 
mand that the results be treated primarily as 
hypotheses for future study. 

EFFECTS OF MUSIC ON EMPLOYEE ATTITUDE AND 
PRODUCTIVITY IN A SKATEBOARD FACTORY 


RICHARD I. NEWMAN, Jr.,1 DONALD L. HUNT,2 and FEN RHODES 4 


California State College, Long Beach 


An experiment was designed to look at the effects of 4 types of music, versus 
no music, on the quantity and quality of production and the attitude of 
workers engaged in the routine task of assembling and packing skateboards. 
Ss were 26 assembly-line personnel between the ages of 18 and 23. 4 types 
of music were played: dance, show, folk, and popular. These were contrasted 
with periods during which no music was played. Music conditions were 
balanced with respect to days of the week over a period of 5 wk. Results 
showed that, while employees had a highly favorable attitude toward music 
and thought they did more work with it, there was no change in measured 
productivity. 


Studies of music in the industrial setting 
are numerous, ranging from dubious com- 
mercial investigations to objective inquiries 
into the particular effects of music on worker 
behavior and morale. Music has been vari- 
ously studied for its effects on production, 
scrappage, accident rate, turnover, and 
worker attitude (Uhrbrock, 1961). 

The results of this research have been 
quite variable. Some investigators (e.g., 
Smith, 1947) have reported significant, even 
large, increases in productivity associated 
with the playing of music. Others, such as 
McGehee and Gardner (1949), have found 
music to have no measurable effect on the 
production of workers. The results of studies 
relating music to quality of production are 
equally inconsistent—the possibility of a 
negative relationship existing between the two 
in some situations having even been noted 
(Uhrbrock, 1961). Part of the explanation for 
these conflicting results lies in the different 
jobs studied in each instance. It is generally 
acknowledged that an interaction exists be- 
tween type of job and the influence of music 
on job performance, with more routine, 
repetitive tasks presumably having a greater 
likelihood of showing improved performance 
upon the introduction of music into the work 
situation than higher skilled, more complex 
tasks. There is also the possibility that music, 
like job satisfaction, has little direct bearing 
on worker output and that positive findings 
will always be the exception regardless of the 
type of job. 


1 Now at Washington State University, Pullman, Washington. 
2 Now at The Boeing Company, Seattle, Washington. 

The effects of music on worker attitude 
are more consistent. Almost all workers, it 
appears, prefer music on the job, and only a 
small percentage actively dislike it (Kerr, 
1943, 1945; Kirkpatrick, 1943; McGehee & 
Gardner, 1949; Smith, 1947). The kinds of 
music preferred by various types of work- 
ers in different settings are not so well 
documented. 

The present investigation was designed to 
look at (a) the effects of four types of music, 
versus no music, on the quantity and quality 
of production, and (b) the attitude of workers 
engaged in the routine task of assembling 
and packing skateboards. It is believed that 
the study lies sufficiently within the frame- 
work of existing investigations to permit 
useful comparisons. 


METHOD 
Subjects 


Subjects were hourly paid, assembly-line personnel 
involved in the manufacture of skateboards. Opera- 
tions performed included drilling holes for the 
wheel assembly, screwing the assembly to the board, 
and packing the finished skateboards for shipment. 
During 3 of the 5 weeks of the study three assembly 
lines were in operation; one or two lines were in 
production for the remaining 2 weeks. The total 
number of line personnel ranged from § to 26 per 
day, with a median of 12.5. About half of the 
workers were male. Almost all workers were in their 
late teens or early 20s. 
TABLE 1 
EXPERIMENTAL DESIGN 

                                 Day 
Week   Monday     Tuesday    Wednesday   Thursday   Friday 

1      Dance      Folk       Show        Popular    No music 
2      Folk       Popular    Dance       No music   Show 
3      Popular    No music   Folk        Show       Dance 
4      Show       Dance      No music    Folk       Popular 
5      No music   Show       Popular     Dance      Folk 


Music 


Four categories of music were played, namely, 
show music (Broadway musicals, both instrumental 
and vocal), dance music (instrumental arrangements 
of current songs and “old favorites” by Les Elgart, 
Ray Anthony, etc.), folk music (vocal selections by 
Joan Baez; Peter, Paul, and Mary; etc.), and 
popular music (largely vocal selections by the 
Beatles, the Brothers Four, etc.). 

The music selected was transferred from discs 
to magnetic tape to provide 2-hour sequences of 
uninterrupted selections. Two of these sequences 
were prepared for each music category, one of which 
was played in the morning and the other in the 
afternoon on each day for which a particular type 
of music was scheduled. The sound level was adjusted 
so that, in the judgment of the experimenters 
and the workers, the music being played comfortably 
overrode the noise of the production operation. 


Experimental Design 


A Latin-square design was employed in which a 
different type of music was played during 4 days 
of each week. On the fifth day no music was played. 
The period of the study was 5 weeks, consecutive 
except for a 1-week interval between Weeks 4 and 5. 
The complete sequence of experimental treatments 
is shown in Table 1. 

On each day data were obtained regarding total 
number of units produced, number of rejects, number 
of production personnel, and assembly-line 
running time. From this information daily estimates 
of the number of units produced per worker per 
hour and the percentage of rejects per worker per 
hour were computed. Analyses of variance were 
performed on both measures to determine the effect 
of music type (as well as no music) on quantity 
and quality of production. 
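
The balancing property of the schedule in Table 1 can be verified mechanically: in a Latin square every condition occurs exactly once in each row (week) and once in each column (weekday). The sketch below transcribes the schedule from Table 1; the function name `is_latin_square` is hypothetical.

```python
from typing import List, Set

CONDITIONS: Set[str] = {"Dance", "Folk", "Show", "Popular", "No music"}

# Rows are Weeks 1-5; columns are Monday through Friday (Table 1).
SCHEDULE: List[List[str]] = [
    ["Dance", "Folk", "Show", "Popular", "No music"],
    ["Folk", "Popular", "Dance", "No music", "Show"],
    ["Popular", "No music", "Folk", "Show", "Dance"],
    ["Show", "Dance", "No music", "Folk", "Popular"],
    ["No music", "Show", "Popular", "Dance", "Folk"],
]


def is_latin_square(square: List[List[str]], symbols: Set[str]) -> bool:
    """True if every symbol appears exactly once in each row and each column."""
    n = len(symbols)
    if len(square) != n or any(len(row) != n for row in square):
        return False
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


print(is_latin_square(SCHEDULE, CONDITIONS))  # True
```

Because each condition falls once on every weekday and once in every week, day-of-week and week-to-week differences can be separated from the music effect in the analysis of variance.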

At the end of the study a questionnaire similar 
to the one used by McGehee and Gardner (1949) 
was administered to the 26 workers present on that 
day to determine their general reaction to music 
on the job and also their opinion of the different 
types of music played. 


RESULTS 


The music played was found to have no 
influence on either number of units produced 
or percentage of rejects. Neither the types of 
music nor music versus no music had any 
effect on quantity and quality of worker 
output. The results of these analyses are sum- 
marized in Tables 2 and 3. 
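
The F ratios in the analysis-of-variance tables follow from the printed mean squares: in a 5 × 5 Latin square each factor (music, days, weeks) has 4 degrees of freedom, the error term has (5 − 1)(5 − 2) = 12, and F is the factor mean square divided by the error mean square. The sketch below recomputes two of the printed ratios. The function names are hypothetical, the critical value of 3.26 is the standard tabled .05 value for 4 and 12 degrees of freedom, and pairing each block of mean squares with its table is a reading of the scanned layout rather than something the text confirms.

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F ratio of an effect mean square to the error mean square."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error


def latin_square_error_df(n: int) -> int:
    """Error degrees of freedom for an n x n Latin square."""
    return (n - 1) * (n - 2)


F_CRIT_05_4_12 = 3.26  # tabled .05 critical value of F with 4 and 12 df

weeks_f = f_ratio(144.15, 32.25)  # Weeks effect, block with error MS 32.25
music_f = f_ratio(3.44, 1.61)     # Music effect, block with error MS 1.61

print(latin_square_error_df(5))                          # 12
print(round(weeks_f, 2), weeks_f > F_CRIT_05_4_12)       # 4.47 True
print(round(music_f, 2), music_f > F_CRIT_05_4_12)       # 2.14 False
```

Only the Weeks effect exceeds the critical value, which agrees with the statement that music had no reliable influence on output.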

In contrast the attitude of workers toward 
the music was highly favorable. The questionnaire 
administered and employees' responses 
to individual items are reproduced in 
Table 4. Generally, employees liked the 
music and wanted it continued. Only 1 of the 
26 workers thought it interfered with his 
work. Everyone believed it made the time 
pass faster, and all of the workers but 1 
reported that it relieved the monotony of the 
job. There were 18 who indicated that the 
music gave them something to look forward 
to. Contrary to what was shown by production 
records, 73% of the employees indicated 
that they got more work done when music 
was played. 


TABLE 2 
ANALYSIS OF VARIANCE FOR AVERAGE NUMBER 
OF UNITS PRODUCED PER WORKER PER HOUR 

Source     df       MS        F 

Music       4     12.15       - 
Days        4      5.90       - 
Weeks       4    144.15     4.47* 
Error      12     32.25 

*p < .05. 


TABLE 3 
ANALYSIS OF VARIANCE FOR AVERAGE PERCENTAGE 
OF REJECTS PER WORKER PER HOUR 

Source     df       MS        F 

Music       4      3.44     2.14 
Days        4      2.99     1.86 
Weeks       4       .76       - 
Error      12      1.61 

Note. Arcsin transformation applied to the data. 


TABLE 4 
EMPLOYEE RESPONSES TO QUESTIONNAIRE ADMINISTERED FOLLOWING 5 WEEKS OF MUSIC ON THE JOB 

                                                               No. responses 
Item                                                       Yes    No    Can't tell 

Music makes the time pass faster                            26     0        0 
Takes your mind off other things                            11    14        1 
Gives you a lift when you're feeling tired                  23     2        1 
Makes you feel more like coming to work                     17     4        5 
If you come in feeling bad, the music helps                 20     4        2 
Music keeps work from getting on your nerves                20     4        2 
Music gives you something to look forward to                18     5        3 
You get more work done with the music                       19     2        5 
Music helps you tell how much time has passed                7    16        3 
Music helps you know if you are behind or ahead              4    12       10 
You move in time with the music                             14     9        3 
Music breaks the monotony                                   25     0        1 
You do less talking with the music                          21     4        1 
You seem to have more pep when there is music               21     2        3 
Music interferes with your work                              1    25        0 
Music makes you nervous                                      0    25        1 
Music makes your job more pleasant                          23     1        2 
Keeps you from getting so tired by the end of the day       16     6        4 
Would you like the music continued?                         23     2        1 
Which kind of music did you like best? a 
    Show                                                     0 
    Dance                                                    2 
    Folk                                                     8 
    Popular                                                 16 
    No preference                                            0 

Note. N = 26. 
a In the questionnaire descriptive adjectives were included so there could be no confusion as to the type of music referred to. 

It should be noted that the questionnaire 
results may be biased in the direction of 
spuriously favorable responses, since em- 
ployees did not know whether or not the 
music system was being installed on a per- 
manent basis and could have interpreted the 
questionnaire as a device to determine 
whether the music should be continued. The 
employees did not know, however, that pro- 
duction measures were being obtained or that 
a formal study was being conducted. 

By far the best-liked music was that in the 
popular category (Beatles, Watusi, etc.). 
This kind of music was preferred by 16 of 
the 26 employees. Folk music was second 
with 8 endorsements. Dance and show music 
were relatively unpopular. 


DISCUSSION 


The results of this study parallel closely 
the findings of McGehee and Gardner (1949) 
in their study of the effects of music on 
workers engaged in a rug-setting operation. 
In their investigation the lack of any rela- 
tionship between music and productivity was 
attributed in part to the high level of task 
complexity and the extensive experience of 
employees on the job. In the present study 
both of these conditions were reversed: The 
task involved routine operations requiring a 
relatively low degree of skill, and employees 
had little previous job experience (less than 
6 months in almost every instance). It thus 
appears that low task complexity and limited 
job experience in no way assure a positive 
effect of music on productivity. 

It was not feasible to measure worker pro- 
ductivity on an individual basis, and there 
was consequently no information available 
concerning the possible effects of music on 
individual workers or the variability of out- 
put from worker to worker (from which 
could perhaps be inferred the presence or 
absence of restrictive production practices). 
On the latter point, the experimenters did 
not observe any evidence that workers were 
setting their level of production in accordance 
with informal group norms. 

The extremely favorable attitude of work- 
ers toward music on the job is in agreement 
with the findings reported by most other in- 
vestigators, as well as McGehee and Gardner 
(1949). It is interesting to note that the type 
of music most preferred by employees (i.e., 
popular vocal selections) is the sort least 
played and even avoided in the usual indus- 
trial setting. McGehee and Gardner (1949) 
found that their group of rug setters wanted 
to hear hymns and spirituals. In neither their 
investigation nor the present one did such 
music have any adverse effect on measured 
productivity, a reason frequently cited for 
not playing vocal or offbeat selections in the 
work situation. These results indicate that on- 
the-job music preferences are quite variable 
and are specific to the population of em- 
ployees in a given situation. For these reasons 
it would seem wise in selecting music to be 
played on the job not to proceed on the basis 
of arbitrary prior assumptions regarding either 
employee preferences or adverse effects on 
job performance of various kinds of music. 

LABOR TURNOVER AS A FUNCTION OF WORKER 
DIFFERENCES, WORK ENVIRONMENT, AND 
AUTHORITARIANISM OF FOREMEN 


RONALD LEY 
State University of New York at Albany 


The labor turnover rate of male production workers of a television picture- 
tube manufacturing company was studied with respect to: biographical data, 
work environment, and authoritarianism of foremen. It was found that 
workers who terminated their employment within 1 yr. were younger, had 
more jobs in the 2 yr. preceding their employment with the company, and 
had higher hourly wages on their last job, as compared with workers who 
maintained their employment for more than 1 yr. Although the turnover 
rate was found to be significantly higher on the 2nd and 3rd shifts as 
compared with the 1st shift, no difference in rates was found among the 6 
work sections which differed considerably in terms of physical work conditions. 
The major factor found to be related to labor turnover was the degree 
of authoritarianism of the 12 foremen of the work sections, i.e., turnover 
rate correlated .76 with authoritarianism ratings of the foremen. 


The problem of labor turnover has been 
approached from several different directions. 
Super (1949) points to case studies which 
have shown that personal maladjustment 
often underlies vocational dissatisfaction and 
frequent job changes. Wickert (1951) com- 
pared a labor turnover group with an on- 
force group and found that the groups did 
not differ in terms of biographical data, em- 
ployment test scores, or attitudes frequently 
considered to be important in accounting for 
turnover. Wickert concluded that lack of ego 
involvement in one’s work was the chief con- 
tributing factor to turnover. The importance 
of the physical conditions of work environ- 
ment as a variable affecting worker morale 
has been seriously questioned since the early 
work of Roethlisberger and Dickson (1934). 
The influence of the foreman, on the other 
hand, has been pointed to as a major factor 
affecting morale and turnover (Hand, Hop- 
pock, & Zlatchin, 1948; Hoppock & Odum, 
1940). Wechsler, Kahane, and Tannenbaum 
(1952) found that a considerably higher pro- 
portion of the members of a work group 
headed by a permissive leader considered 
themselves to be satisfied with their jobs, 
rated the productivity of their division 
higher, and considered the morale of their 
group to be higher when compared with a 
group headed by an authoritarian leader. 

Following leads offered by these studies, 
the present study investigated the labor 
turnover rate of a television picture-tube 
manufacturing company with respect to 
biographical data, work environment, and 
authoritarianism of foremen. 


METHOD 


Biographical data. The Turnover Group was a 
sample of 100 male hourly production workers 
selected from among all the early post-World-War- 
II records of unskilled workers, within a given pay 
range, who terminated their employment with the 
Buffalo Tube Plant, General Electric Company, 
within the first year of employment, average length 
of employment—2.5 months. The sample was strati- 
fied on the basis of shift hours worked, that is, 
33 men from each of the first (8:00 a.m. to 4:00 p.m.) 
and second (4:00 p.m. to 12:00 midnight) shifts and 34 
from the third shift (12:00 midnight to 8:00 a.m.). The 
Steady Group, chosen for the purpose of compari- 
son, was a sample of 100 men selected in the same 
fashion as the Turnover Group except that the 
sample was limited to workers who had maintained 
employment with the company for more than 1 
year, average length of employment—33.8 months. 

The following biographical items were taken from 
the employment records: age at start of employment 
with the company, years of education, number of 
dependents, length of military service, length of 
service with last employer, number of jobs held 
in 2 years preceding employment with the company, 
average length of service on past jobs, and hourly 
wages of last employment. Examination of these 
data indicated that more than half of the Turnover 
Group terminated their service during the first 
month of employment. Since it was apparent that 
the early period of employment was crucial, the 
Turnover Group was subdivided into two groups: 
the Immediate Turnover Group, those workers 
who left within the first month of employment 
(average length of employment, 9.7 days; N = 54), 
and the Delayed Turnover Group, those who left 
after 1 month but before 1 year (average length 
of employment, 159.8 days; N = 43). These groups 
were compared on the same biographical items as 
the Turnover and Steady Groups. 

TABLE 1 

t-TEST COMPARISONS OF THE BIOGRAPHICAL ITEM 
MEANS OF TURNOVER GROUP WITH 
STEADY GROUP 

Biographical item                        Steady     Turnover      df     t 
                                         group M    group M 
Age                                      29.89      26.33         198    2.608** 
Education (in years)                     10.69      10.76         198     .23 
Number of dependents                      1.39       1.28         198     .54 
Months of military service               33.87      33.59         105     .08 
Length of service, last employment 
  (in months)                            16.59      14.76         183     .66 
Number of jobs in past 2 years            1.94      [illegible]   198    2.28* 
Average length of service on jobs 
  in past (in months)                    18.83      [illegible]   198     .56 
Hourly wages of last employment           1.09       1.47         179    2.99** 

* p < .05. 

Work environment. The effects of work environ- 
ment on turnover were measured by comparing the 
10-month turnover rates of six work sections (Bulb 
Salvage, Paint-Pack, Base-Test, Seal-Exhaust, 
Unpack-Screen, and Film-Vacuum Check), which 
differed considerably in terms of temperature, cleanli- 
ness, and safety hazards. Since each work section 
was represented on all three shifts, this gave a total 
of 18 sections. Rate of turnover for each work 
section on each shift was computed by dividing 
the total number of workers of a given work 
section, who terminated their employment during 
the 10-month study period, by the sum of the 
weekly average number of workers in the work 
section during the 10-month study period. 

Foremen. Three plant supervisors who were well 
acquainted with each of the 18 foremen were re- 
quested to rank order the foremen on authori- 
tarianism. The most permissive, least authoritarian 
foreman was ranked 1, while the least permissive, 
most authoritarian foreman was ranked 18. A com- 
posite ranking was then obtained by averaging the 
rankings of the three supervisors. 
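
The composite ranking procedure can be sketched in a few lines of Python. This is only an illustration: the foreman labels and the ranks are hypothetical, and the final step of converting the averaged ranks back into ranks (with tied averages sharing the mean of the positions they occupy) is an assumption, since the text states only that the three rankings were averaged.

```python
from statistics import mean


def composite_ranking(rankings_by_supervisor):
    """Average each foreman's ranks across supervisors, then re-rank the averages.

    rankings_by_supervisor: list of dicts mapping a foreman label to a rank
    (1 = most permissive, least authoritarian).
    Returns a dict mapping each foreman label to a composite rank. Tied
    averages share the mean of the positions they occupy (an assumption;
    the source does not say how ties were handled).
    """
    foremen = sorted(rankings_by_supervisor[0])
    averages = {f: mean(r[f] for r in rankings_by_supervisor) for f in foremen}
    ordered = sorted(foremen, key=lambda f: averages[f])

    composite = {}
    i = 0
    while i < len(ordered):
        # Extend j over every foreman whose average equals the one at position i.
        j = i
        while j + 1 < len(ordered) and averages[ordered[j + 1]] == averages[ordered[i]]:
            j += 1
        shared_rank = ((i + 1) + (j + 1)) / 2
        for f in ordered[i:j + 1]:
            composite[f] = shared_rank
        i = j + 1
    return composite


# Hypothetical ranks given to four foremen by three supervisors.
supervisor_ranks = [
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 2, "B": 1, "C": 4, "D": 3},
    {"A": 1, "B": 3, "C": 2, "D": 4},
]
print(composite_ranking(supervisor_ranks))
```

Averaging before re-ranking keeps the procedure insensitive to which supervisor is listed first, and it makes disagreement among supervisors visible as near-equal averages.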

RESULTS 


The data of Table 1 show that the Turn- 
over and Steady Groups differed on three of 
the eight biographical items. Compared with 
the Steady Group, the Turnover Group was 
younger, held more jobs during the 2 years 
previous to joining the company, and had 
higher hourly wages on their last job. 

The data of Table 2 indicate that the 
Immediate Turnover Group had more years 
of education and higher hourly wages on 
their last job than the Delayed Group. 

Work environment (work section and shift) 
was evaluated through an analysis of variance 
using the 10-month turnover rate for each work 
section of each shift as the unit of analysis. 
Table 3 shows that only the shift factor was 
significant (F = 4.1709, df = 2/10, p < .05). To 
determine which shifts differed in their mean 
monthly turnover rates, t tests were performed. 
The mean monthly turnover rate of the first 
shift, 4.85, differed significantly from the mean 
monthly rates of both the second shift, 8.91 
(t = 2.62, df = 10, p < .05), and the third shift, 
7.67 (t = 3.04, df = 10, p < .05), while the 
difference between the mean rates of the second 
and third shifts was not significant (t = .80, 
df = 10, p > .05). 

TABLE 2 

t-TEST COMPARISONS OF THE BIOGRAPHICAL ITEM 
MEANS OF IMMEDIATE TURNOVER GROUP 
WITH DELAY TURNOVER GROUP 

Biographical item                        Immediate    Delay        df    t 
                                         turnover     turnover 
                                         group M      group M 
Age                                      26.33        27.42        98     .68 
Education (in years)                     12.70        10.83        98    4.35** 
Number of dependents                      1.32         1.23        98     .31 
Months of military service               36.53        30.68        59    1.20 
Length of service, last employment 
  (in months)                            16.84        12.48        93    1.16 
Number of jobs in past 2 years            2.07         2.44        93    1.48 
Average length of service on jobs 
  in past (in months)                    19.70        14.60        93    1.68 
Hourly wages of last employment           1.58         1.33        91    [illegible] 

TABLE 3 

ANALYSIS OF VARIANCE OF MEAN MONTHLY 
TURNOVER RATE BY SHIFTS AND 
BY WORK SECTIONS 

Source of variance                 df    SS          MS         F 
Shifts                              2     51.8661    25.9330    4.1709* 
Work sections                       5     22.2665     4.4533 
Shift X Work Section (error)       10     62.1762     6.2176 
Total                              17    136.3088 

* p < .05. 
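
The arithmetic of Table 3 can be checked directly from the printed sums of squares and degrees of freedom. The sketch below is not part of the original analysis; it only recomputes the mean squares and F ratios from the tabled values, and the variable names are illustrative.

```python
# Sums of squares and degrees of freedom as printed in Table 3.
ss = {"shifts": 51.8661, "work_sections": 22.2665, "error": 62.1762}
df = {"shifts": 2, "work_sections": 5, "error": 10}

# Mean square = sum of squares divided by its degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# Each effect is tested against the Shift X Work Section (error) mean square.
f_shifts = ms["shifts"] / ms["error"]
f_sections = ms["work_sections"] / ms["error"]

total_ss = sum(ss.values())
total_df = sum(df.values())

print(f"MS shifts = {ms['shifts']:.4f}, MS sections = {ms['work_sections']:.4f}, "
      f"MS error = {ms['error']:.4f}")
print(f"F shifts = {f_shifts:.4f}, F sections = {f_sections:.4f}")
print(f"Total SS = {total_ss:.4f} on {total_df} df")
```

The F ratio for work sections comes out below 1, which is consistent with the table leaving that cell empty and with the report that only the shift factor was significant.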

Since the rates of turnover of the second 
and third shifts were significantly higher than 
the rate of the first, but not significantly dif- 
ferent one from the other, the rank-order 
correlation between the composite super- 
visors’ ranking of the foremen on the authori- 
tarianism dimension and rate of turnover by 
foreman group, was limited to the 12 fore- 
men on the second and third shifts. A rank- 
order correlation coefficient of .76, p< .01, 
was obtained between high turnover rate and 
high authoritarian ranking. 
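
For readers unfamiliar with the statistic, a rank-order (Spearman) correlation for two untied rankings can be computed as sketched below. The ranks shown are hypothetical and are not the study's data, which are not reported; the function name is illustrative, and the simple formula used here assumes no tied ranks.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank-order correlation for two untied rankings of the same items.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference
    between the two ranks of each item. Valid only when neither ranking has ties.
    """
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both rankings must cover the same number of items")
    n = len(ranks_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for 12 foremen (1 = least authoritarian / lowest turnover).
authoritarianism_rank = list(range(1, 13))
turnover_rank = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]

rho = spearman_rho(authoritarianism_rank, turnover_rank)
print(f"rho = {rho:.3f}")
```

If the composite ranks contain ties, which averaging three rankings can produce, a tie-corrected computation would be the safer choice than this shortcut formula.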


DISCUSSION 


Unlike Wickert’s (1951) findings, three of 
the biographical items of the present study 
were found to show significant differences 
between the Turnover Group and the Steady 
Group. Although none of these data allows 
for speculation concerning the presence or 
absence of ego involvement (a major factor 
pointed to by Wickert), the finding that the 
Turnover Group had had more jobs in the 
past 2 years suggests a general element of 
job dissatisfaction. The finding that the 
Turnover Group was younger than the Steady 
Group suggests that the true difference in 
rate of job changing is greater than that 
indicated by the obtained data since the 
Turnover Group had probably spent less 
time in the labor market and thus, had less 
time in which to change jobs. The extent to 
which this factor of job changing may be 
related to personal maladjustment cannot be 
determined from the present data. However, 
if the economic incentive of higher wages 
is a factor in maintaining workers on a job, 
the finding that the hourly wages of the last 
employment of the Turnover Group were 
higher than those of the Steady Group sug- 
gests that whatever motives underlie the 
Turnover Group’s job changing overrode the 
financial incentive of those jobs they left. 
Wages are, of course, a relative matter and 
it may be that the change in the Turnover 
Group’s wages between their last employment 
and their wages at General Electric was sig- 
nificantly smaller than the change in wages 
for the Steady Group. Thus, the hourly wage 
rate at General Electric might be a significant 
economic incentive for the Steady Group, but 
not for the Turnover Group. 

Further evidence for the importance of the 
economic incentive is given in Table 2 where 
the biographical items of the workers of the 
Immediate Turnover Group are compared 
with those of the Delayed Turnover Group. 
Except for the education item, which might 
be accounted for by college students who 
attempted to work full time while attending 
school, the only significant difference between 
these groups was hourly wages of last em- 
ployment, where the Immediate Turnover 
Group had higher hourly wages of previous 
employment than the Delayed Turnover 
Group. Other things equal, these data sug- 
gest that length of employment is at least 
partly a function of the difference between a 
worker’s previous wages and his current 
wages. 

The finding that the second and third shifts 
have a higher rate of turnover than the first 
was obtained in spite of an added financial 
incentive (10 cents per hour) for the workers 
of the second and third shifts. If financial 
incentive is an important factor, it would 
appear that the size of the incentive used by 
the company was not adequate to compensate 
for the lack of desirability of the second and 
third shift work-hours. It is likely, however, 
that the higher turnover rates of the second 
and third shift workers are to some extent 
due to other factors, especially “moonlighting,” 
where the worker takes an additional 
second or third shift job for a limited time. 

Apart from the significantly higher rates 
of turnover on the second and third shifts, 
it would appear that foremen have a con- 
siderable influence upon turnover, a finding 
consistent with those of Hoppock and Odum 
(1940) and Hand, Hoppock, and Zlatchin 
(1948). The high correlation of .76 between 
rate of turnover by foremen and supervisors’ 
ratings of the foremen on the authoritarian 
dimension strongly points to the fact that 
workers are more likely to terminate their 
employment if they are assigned to a more 
authoritarian-rated foreman than if they are 
assigned to a less authoritarian foreman. To 
the extent that turnover is an indicator of 
morale, these data are in keeping with those 
of Wechsler, Kahane, and Tannenbaum 
(1952). Since the authoritarian ranking pro- 
cedure allowed the ranking supervisors to use 
their own definitions of authoritarian be- 
havior, it is not possible to study those 
aspects of the foremen’s personality which 
contributed most to their ranking. 

DYNAMIC CHARACTER OF CRITERIA: 
ORGANIZATION CHANGE 


ERICH P. PRIEN 


University of Akron 


Previous thought and research on criterion development emphasize: measure- 
ment problems related to scaling and analysis, problems created by the 
sponsor, values of the researcher, aspects of deriving a composite criterion, 
and the dynamic character of job requirements related to incumbent learning. 
There is an additional variable to be considered: organization change and 
the effect of changing needs on the nature of the criteria of individual jobs. 
Job duties may remain static under these circumstances; only the relevance 
of performance changes. 


In recent years, the bulk of the attention 
in criterion development has been on em- 
pirical studies within a limited framework. 
Only secondary attention has been given to 
the basic principles and relationships which 
enter into a criterion-research problem. The 
factors which limit the usefulness or value 
of criteria in a technical sense are clearly 
stated by Brogden and Taylor (1950), and 
in the philosophical sense by Guion (1961). 
Situational characteristics interacting with 
the individual’s learning of the job are amply 
specified by Ghiselli and Haire (1960), and 
verified further by Bass (1962). Expanding 
the concept of individual performance to the 
organization, Bass (1952) and indirectly 
Seashore, Indik, and Georgopoulos (1960) 
suggest the breadth and complexity of the 
problem. Fiske (1951) and later Guion 
(1961) point up the problems and the limita- 
tions imposed by researchers’ practices in 
seeking an ultimate criterion. The concessions 
implied by Wherry (1957) of settling for 
something less than the ultimate criterion 
could well be sidestepped, were the researcher 
to put into practice the philosophy of: Otis 
(1953), that researchers should invest as 
much time and effort in criterion develop- 
ment as they do in predictor development; 
Nagel (1953) by following a standard pro- 
cedure; and Guion (1961) by abandoning 
the concept of a composite criterion, which 
as a single measure of “goodness” may do 
little more than cloud empirical results. Stark 
(1959) makes the same plea, though indirectly, 
emphasizing a functional taxonomy of 
position functions for the prediction of 
executive success. In another approach, 
Flanagan’s (1949) “Critical Incident” tech- 
nique is a potential solution although the 
flexibility of the procedure leaves something 
to be desired in research directed to the 
generation of quantitative models. The ulti- 
mate, although perhaps currently some- 
what impractical, solution has already been 
proposed by Toops (1959). 

It would seem from the foregoing that 
every aspect of criterion development, meas- 
urement, and utilization has been adequately 
discussed. There are the statistical problems 
of reliability and relevance, the approach to 
ultimate criteria, the incorporation of the 
individual’s characteristics (especially learn- 
ing), moderator variables, the technical 
niceties of the researcher’s philosophy and 
practice, and the empirical problems and 
limitations of the sponsor and situation. 

In each of these articles, the authors define 
an aspect or a principle of criterion develop- 
ment which should be acknowledged and ac- 
counted for in each criterion study, whether 
of the performance of individuals on a pro- 
duction line, executive behavior, group pro- 
ductivity, or organizational effectiveness. 
There is one aspect, however, which appears 
to have received only minimal attention, yet 
one which appears to be of considerable im- 
portance, particularly at the present time 
when more elaborate, comprehensive studies 
are being conducted. This is the transitional 
nature of criterion dimensions—simply the 
dynamic nature of organization needs and 
ultimately of individual performance requirements. 
There is considerable emphasis today on the 
subject of organization change, that is, planned, 
sponsored change. This is in addition to 
fortuitous change resulting from 
organization growth or extraorganizational 
changes which have an internal impact. 
Admittedly, many researchers appear to 
spend the major portion of their research 
time on those problems which are most visi- 
ble and the ones in which they will encounter 
the least opposition. In a selection study, 
most researchers will not consider utilizing 
a test with anything but near-perfect reliability 
and demonstrated factorial multidimensionality. 
Yet this same researcher will 
evaluate this very elegant predictor using 
the shoddiest, partial, immediate, or proxi- 
mate criterion. It is not surprising under 
these circumstances that the frequency of 
“Irish” coefficients exceeds the frequency of 
statistically significant validity coefficients. 
Guion (1961) has pointed out that research- 
ers are prone to combine individual criterion 
measures to obtain a composite criterion 
with which they delude themselves that they have 
an approximation of the ultimate criterion. 
And, in spite of Bass (1952), Ghiselli and 
Haire (1960), and Bass (1962), today’s cri- 
terion is used to validate yesterday’s test, 
and to select tomorrow’s workers. The ideal 
validation model involves the assumptions 
of invariance of established basic relation- 
ships, but must incorporate time- and organi- 
zation-related variables. However, the time- 
and organization-related variables may bear 
either a direct functional relationship to the 
dependent variable or combine as moderators. 
Unfortunately, in spite of Otis (1953), ac- 
ceptance is made of the criterion data pro- 
vided by management or what management is 
instructed to collect within the framework of 
the researcher’s own values and preferences. 
It is rare indeed today to find a high-salaried 
industrial psychologist or consultant partici- 
pating in the actual collection of data (such 
as conducting field-review interviews), 
much less involving himself in the physical 
performance setting to obtain an apprecia- 
tion, at least in the subjective sense, of the 
complexity of criterion performance. Whether 
or not the principles and relationships are 
acknowledged and accounted for in criterion 
development, they have been specified and 
exist in the published literature to be taken 
advantage of by a competent researcher. Of 
particular interest in this extension are the 
formulations of Guion (1961), Ghiselli and 
Haire (1960), and Bass (1952, 1962). 

In the above references, the attention is 
directed primarily to the performance and 
characteristics of individuals and the indus- 
trial job-learning situation. It has been 
pointed out that, as an individual learns and 
masters the complex functions of his job, new 
or theretofore unused aptitudes become im- 
portant contributors to criterion performance. 
Here reference is to the individual and his 
complex of abilities, aptitudes, and more 
personal characteristics, plus the demands of 
the static, well-defined job. However, as the 
job becomes more complex, more difficult, 
and more intangible in nature, the pattern 
of abilities also becomes different and the 
number of distinguishable phases increases 
substantially. In the extreme, the man who 
becomes the manager of manufacturing has 
already succeeded in a number of complex 
job spirals before he performs his first job-act 
in the final position of manager of manufac- 
turing. In a complex job, complete mastery 
may involve many years of learning and de- 
velopment as compared to the time required 
to reach criterion performance in an unskilled 
or semiskilled job. However, it is just these 
highly complex and intangible jobs which are 
of maximum interest to today’s industrial 
psychologist. The cry is for more studies of 
executive performance, or longitudinal studies 
of college graduates from the day of orienta- 
tion into the company management program 
through to their retirement, the creativity 
of research scientists, the broad, across- 
company studies of various kinds of executive 
personnel. 

Another factor becomes critical in the 
above extension. Obviously, there is the 
problem of temporal proximity. If a test is 
administered to a man, and his performance 
evaluated 10 years later, he is no doubt a 
different man because of what has happened 
during the intervening years. However, all 
other things being equal (which they never 
are) a validity coefficient derived on this 
basis is considered to be a rather valuable 
finding. However, it is the contention of the 
author that in addition to changes in the 
man and discernible changes in the job as a 
function of learning, in many situations there 
will be changes which are not acknowledged 
and which occur in considerably less than 
10 years. Studies by Boyles, Eddy, and Frost 
(1963) and Prien (1965) provide some clues 
regarding organization change. In the former, 
the authors factor analyzed 24 organization 
variables and obtained six factors. Each of 
the factors in part reflects change of a cyclic 
nature within the organization studied. Fur- 
ther, time directions were expressed in terms 
of months and also, factor definitions in- 
cluded personnel response elements. In the 
latter study, 38 organization variables were 
factor analyzed, but in this case the data 
were obtained from 107 different organizations. 
Nine factors were obtained, four of 
which were defined in part by change ele- 
ments. Change in this case was purposeful 
and management initiated or expressed as 
a management goal. Fortuitous changes 
brought about by conditions outside of the 
organization were not identified but no doubt 
exist. Indirect evidence regarding change 
agents within organizations is provided in a 
study by Seashore, Indik, and Georgopoulos 
(1960). A factor analysis of attitudinal data 
with organization variables yielded seven 
factors, three of which contain elements 
regarding preference for or desire to modify 
some aspect of a work situation. 

Change may be desirable for a variety 
of reasons and may be initiated by sources 
other than management. Changes of this sort 
are not fortuitous but nonetheless occur and 
occasionally the change may be gradual and 
undetected by management. 

Part of the answer to the dilemma might 
have been obtained if the lead provided by 
Otis (1953) had been taken and the researcher 
had actually involved himself personally in 
the research situation. It may well be found 
that while the job duties, the products, the 
market, and the customers all remain con- 
stant, the objectives of the company change 
in as few as 1 or 2 years. The functional 
taxonomy approach to performance predic- 
tion as suggested by Stark (1959) would 
lead, in this case, to a cul-de-sac. The criti- 
calness of functions has changed, not the 
functions. At a point in time, the objective 
of the company may be growth, and what 
is critical and important to success is the 
acquisition of new clients and accounts. At 
a later point in time with the achievement 
of the growth goals, the development of client 
accounts may be the most critical function. 
In this case, the criterion dimensions should 
be treated separately as advocated by Guion 
(1961). The example is an obvious planned 
change and obviously job-performance cri- 
teria will differ as the goals and objectives 
of the company change. There are, the author 
points out, more subtle differences which 
occur with the passage of time and which 
may cycle over a definite period as reflected 
by the Boyles et al. (1963) study. A manu- 
facturing corporation might place consider- 
able emphasis on union-management relations 
during the several months prior to contract 
negotiations, as far as first-line supervisors 
are concerned. During the balance of the 
contract life, performance on this dimension 
may not be considered critical. Thus, while 
the position functions remain constant, tests 
validated during the critical human-relations 
period might show a definite pattern of valid- 
ity coefficients, whereas when cross-validated 
during the uncritical periods, they may fall 
apart. Tests which fail to hold up under 
cross-validation may erroneously be dis- 
carded. The reverse is also true; the test 
battery validated during the human-relations 
uncritical period may result in the hiring of 
individuals whose performance as a group 
may be randomly related to human-relations 
performance during the period in which this 
type of performance is critical. 
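
The argument can be illustrated with a small numerical sketch. Everything in it is hypothetical: the test scores, the two performance dimensions, and the weights that stand in for how critical the human-relations dimension is in a given period. The point is only that the same test and the same job functions can yield very different validity coefficients when the organization shifts the weight it places on each function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def criterion(human_relations, technical, weight_hr):
    """Composite criterion; weight_hr is the weight on human-relations performance."""
    return [weight_hr * h + (1 - weight_hr) * t
            for h, t in zip(human_relations, technical)]


# Hypothetical data for six first-line supervisors.
test_scores = [1, 2, 3, 4, 5, 6]            # a human-relations test
human_relations_perf = [2, 3, 3, 5, 6, 7]   # rated human-relations performance
technical_perf = [5, 3, 6, 2, 4, 5]         # rated technical performance

# Period before contract negotiations: human relations weighted heavily.
validity_critical = pearson_r(
    test_scores, criterion(human_relations_perf, technical_perf, 0.8))

# Balance of the contract life: human relations not weighted at all.
validity_uncritical = pearson_r(
    test_scores, criterion(human_relations_perf, technical_perf, 0.0))

print(f"validity in critical period:   {validity_critical:.2f}")
print(f"validity in uncritical period: {validity_uncritical:.2f}")
```

Keeping the two performance dimensions separate, rather than fixing one composite in advance, is what allows the weighting to be changed when the organization's objectives change.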

The author suggests through these ex- 
amples that criteria of job performance are 
dynamic, not only during the learning and 
development period of a particular job, but 
also related to the dynamic nature of the 
functions of the organization. Jobs may have 
functions, the importance of which may wax 
and wane in fairly definite, predictable 
cycles. He suggests that the importance of 
this consideration increases concomitantly 
with the level of abstractedness or intangible 
nature of the function being investigated. 
In longitudinal studies and long-duration 
criterion-maturity studies, this factor must 
be accounted for by more precise measurement 
of organizational goals and objectives. 
Studies of the type described by Forehand 
and Gilmer (1964) are needed and further 
must be interpreted and extended to in- 
clude the criterion-measurement problem for 
selection or performance evaluation. 

Most comparative studies of programs with conventional media have compared 
a linear program plus lecture condition with either a lecture-alone, program- 
alone, or text-alone. This design results in noncomparable treatment groups, 
since the experimental Ss may either be given more time to use the program 
or are exposed to the same material twice. The present study, utilizing a 
branching program, controls for these possible error factors. Based upon the 
performance of 66 undergraduate Ss, an analysis of covariance suggests that 
sheer repetition of material, regardless of the medium employed, is a significant 
factor influencing the outcome of comparative studies. 


In recent years, numerous studies have 
compared programmed instruction with vari- 
ous other instructional techniques. For present 
purposes, there are two points of interest in 
these investigations: (a) the methodology and 
(b) the program employed. 

From a methodological point of view, an 
experimental group taught by a program is 
typically compared with control groups in- 
structed by conventional methods. In gen- 
eral, most of these studies report no sig- 
nificant differences on criterion test perform- 
ance (Carpenter, Greenhill, Baker, & Levin, 
1963; Goldberg, Dawson, & Barrett, 1964; 
Hughes & McNamara, 1961). Among those 
studies where the comparisons were between 
groups exposed to a program plus lecture and 
either a lecture-alone, a program-alone, or text- 
alone, however, the program plus lecture group 
scored significantly higher on a criterion test 
than any of the comparison groups (Brown, 
1962; Hatch & Flint, 1962). 

An exception to this general methodological 
approach was a recent study reported by 
Ripple (1963). He compared a program-alone 
condition, a conventional reading condition, 
and a lecture condition. The most striking 
finding here was the significantly better performance 
of the program-alone group than of 
the reading group. The major difficulty in 
evaluating these data, however, is that there 
was a 2-day time lapse between the instruc- 
tional period and criterion test. Consequently, 
the superior performance of the program-alone 
group may have been attributable to uncontrolled 
intervening factors, rather than to the 
type of instructional medium employed. 

In general, there are two major criticisms 
of most studies comparing a program plus 
lecture condition with either a lecture-alone, 
program-alone, or text-alone: (a) inadequate 
controls for time of presentation and (b) 
repetition factors. In the first instance, ex- 
perimental subjects (Ss) may be given more 
time in using the program than control Ss 
with other media, as in the Brown (1962), 
Goldberg et al. (1964), and McNeil (1964) 
investigations; in the second case, experi- 
mental Ss may be exposed to the same mate- 
rial twice, a situation which does not hold for 
control Ss. Both procedures result in non- 
comparable treatment groups which may not 
only account for the superiority of the ex- 
perimental group but, also, for the discrepant 
research findings. 

Furthermore, it is interesting to note that 
most studies in this area have incorporated a 
Skinnerian or linear program. The apparent 
lack of interest in the branching program 
seems unjustified, since the comparative studies 
have so far indicated no clear superiority for 
either format, at least in terms of criterion 
performance (Schramm, 1964). In addition, 
branching programs do have the advantage 
of making fewer assumptions about the 
learner; there is also the possibility that 
learned information is more readily general- 
ized. Such observations suggest that findings 
based on linear programs should be com- 
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parable to those based on branching programs; 
or, at least, should not differ significantly. 

The purpose of this present comparative 
study is twofold: (a) to replicate the study 
reported by Ripple using a branching pro- 
gram and an immediate criterion test, and 
(b) to determine the effects of presenting dif- 
ferent combinations of the instructional ma- 
terial—both comparisons being made with 
time and repetition factors controlled. If the 
hypothesis regarding the superiority of the 
programmed medium is valid, the group in- 
structed by this method should perform sig- 
nificantly better on the first criterion test 
than the other two groups exposed to either 
the lecture or text media. In addition, the 
groups instructed by the program plus another 
medium should perform significantly better 
on a second criterion test than groups in- 
structed by combined media not incorporating 
the program. 


METHOD 
Subjects 


There were 66 Ss, upperclass psychology majors 
enrolled in a Tests and Measurements course at a 
state university, who participated in this study. 


Materials 


A branching program on the WAIS, written by 
the senior author, was used in this investigation. 
The program consisted of 27 information frames 
with an average of three corrective branches for 
each. The program had been pretested on a com- 
parable S population during the previous semester 
and appropriately revised. Average time to complete 
the program was 25 minutes. 

The mimeographed text and the lecture material 
were drawn directly from the information frames of 
the program. In order to assure uniformity of 
presentation and also to control for the Hawthorne 
novelty effect (Schramm, 1964), at least in the com- 
parison of the lecture and program groups, the 30- 
minute lecture presentation was recorded on video 
tape. 

Three tests were administered during various 
phases of the experiment: Pretest (Pre), Criterion 
I (C-I), and Criterion II (C-II). C-I and C-II tests 
were administered after the first and second presenta- 
tions of the instructional material, respectively. All 
three tests consisted of 25 items, both multiple-choice 
and recall. These were drawn randomly from a 
total item pool composed independently by the 
authors and there was no item overlap, either among 
the three tests or between the multiple-choice and 
recall formats. 
Procedure 


Single instruction medium. Three instructional con- 
ditions were used: program (P), televised lecture (L), 
and mimeographed text (T). The Ss were randomly 
separated into three groups of 22 each. Two of the 
three conditions were presented to each group during 
a regularly scheduled 2-hour class period. 

Upon reporting to the appropriate rooms, Ss were 
given the Pretest and then informed that they would 
be presented with course material in several forms. 
They were also told that they would be given the 
information twice, each time for 35 minutes, and that 
after each presentation, a 15-minute test would be 
administered. The Ss were allowed to make notes if 
they wished during each presentation. Appropriate 
motivational set was established by informing Ss 
that scores obtained on the criterion tests would be 
added into their final grade for the course. 

Combined instructional media. Following the C-I 
test, each group was divided in half. Each half was 
then presented with the instructional medium which 
not only differed from the original medium but dif- 
fered from the other half of the group, as well. 
This procedure yielded a counterbalanced treatment 
design, with 11 Ss randomly assigned to each subgroup. 
These six treatment conditions were: L+T, T+L, P+T, 
T+P, L+P, and P+L. Immediately following the second 
presentation of material, 
the C-II test was administered. 
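
The assignment scheme just described can be sketched in a few lines. This is an illustration only, not the authors' procedure: the subject numbering, the random seed, and the helper name `assign_subjects` are assumptions, and only the three media and the group sizes (66, 22, and 11) come from the text.

```python
import random

MEDIA = ["P", "L", "T"]  # program, televised lecture, mimeographed text

def assign_subjects(n_subjects=66, seed=0):
    """Randomly assign subjects to the six ordered media pairs (hypothetical helper)."""
    rng = random.Random(seed)            # the seed is an assumption, for reproducibility
    subjects = list(range(1, n_subjects + 1))
    rng.shuffle(subjects)
    per_first = n_subjects // len(MEDIA)   # 22 Ss per first medium
    per_cell = per_first // 2              # 11 Ss per ordered pair
    design = {}
    for i, first in enumerate(MEDIA):
        group = subjects[i * per_first:(i + 1) * per_first]
        seconds = [m for m in MEDIA if m != first]
        for j, second in enumerate(seconds):
            # Each half of a group receives a different second medium.
            design[f"{first}+{second}"] = group[j * per_cell:(j + 1) * per_cell]
    return design

design = assign_subjects()
conditions = sorted(design)
print(conditions)
```

Each label gives the first medium followed by the second, so the six keys correspond to the six counterbalanced treatment conditions.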


RESULTS 


Table 1 presents the means and standard 
deviations for the three single medium groups 
and six combined media groups for the Pre, 
C-I, and C-II tests. 


TABLE 1 

MEANS AND STANDARD DEVIATIONS FOR THE PRE, C-I, 
AND C-II TESTS ACROSS THE SINGLE MEDIUM 
AND COMBINED INSTRUCTIONAL GROUPS 

Media               M         SD 
Single 
  Pretest 
    P             11.68      2.46 
    L             12.54      2.99 
    T             12.77      [illegible] 
  C-I test 
    P             15.75      3.95 
    L             18.77      [illegible] 
    T             17.86      3.91 
Combined 
  C-II test 
    P+T           19.45      [illegible] 
    P+L           18.00      3.00 
    L+T           20.00      2.32 
    L+P           19.27      2.83 
    T+L           19.10      1.81 
    T+P           19.54      2.46 

To determine the relative effectiveness of 
the three single instructional techniques, an 
analysis of covariance was computed for the 
C-I test, using the Pretest scores as the 
covariate. These results are presented in 
Table 2. Significant treatment differences were 
found (F = 4.65, df = 2, p < .05). To assess 
where the significant treatment effects oc- 
curred, Duncan’s multiple-range test (Ed- 
wards, 1960) was computed. The studentized 
ranges (sr) showed that, while the L and T 
group performance did not significantly differ 
from each other (sr = 0.91, ns), both groups 
performed significantly better than the P 
group. That is, the mean of the L group was 
much higher than that of the P group (sr = 
3.04, p < .01) and, likewise, the T group was 
superior to the P group (sr = 2.13, p < .05). 

It is clear from these data that, although 
there were significant treatment effects, such 
differences were not in favor of the P group. 
In fact, the P group performed more poorly 
than either the T or L groups; the results are, 
consequently, not in accord with the findings 
of Ripple (1963). 

In order to determine the effects of com- 
bined instructional media on the six sub- 
groups, an analysis of covariance was com- 
puted on the C-II test with the Pretest as the 
covariate. The results are reported in Table 3. 
As is clear from this table, the treatment 
mean square barely reached a value of 1.00 
and, as a consequence, analysis was terminated. 
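
The F ratios in both analyses of covariance follow directly from the mean squares printed in Tables 2 and 3. The short check below is a sketch of that arithmetic only; the function name `f_ratio` is an assumption introduced for illustration.

```python
def f_ratio(ms_treatment, ms_error):
    """F is the treatment mean square divided by the error mean square."""
    return ms_treatment / ms_error

# Mean squares as printed in Table 2 (C-I test) and Table 3 (C-II test).
f_c1 = f_ratio(41.14, 8.85)
f_c2 = f_ratio(5.00, 5.71)

print(f"C-I:  F = {f_c1:.2f}")
print(f"C-II: F = {f_c2:.2f}")
```

The first ratio reproduces the reported F of 4.65. The second falls below 1.00, which is why no further comparisons among the combined media groups were made.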

Finally, since it was expected that the 
total group performance would progressively 
improve as a function of exposure to the 
material, one-tailed t tests were calculated 
between the means of the Pre- and C-I tests, 
as well as the C-I and C-II tests. The obtained 
t's were significant in both cases (t = 
10.24, df = 130, p < .0005 and t = 3.56, df 
= 130, p < .0005, respectively). 


TABLE 2 

SUMMARY OF ANALYSIS OF COVARIANCE 
OF C-I TEST PERFORMANCE 

Source         df       MS        F 
Treatment       2      41.14     4.65* 
Error          62       8.85 
Total          64 

*p < .01. 


TABLE 3 

SUMMARY OF ANALYSIS OF COVARIANCE 
OF C-II TEST PERFORMANCE 

Source         df       MS        F 
Treatment       5       5.00     < 1.00 
Error          60       5.71 
Total          65 


DISCUSSION 


The failure to replicate Ripple’s findings 
may be due to one or more of three factors: 
(a) the use of a branching, rather than a 
linear, program, (b) the use of a poor program, 
or (c) the use of an immediate posttest, 
resulting in a better controlled study. 

The first possibility seems unlikely, since no 
clear-cut superiority has been shown in the 
few investigations reported, at least in terms 
of criterion performance (Schramm, 1964). 
The second alternative cannot be meaningfully 
assessed, although two points argue against 
its validity: (a) Ss do, in fact, learn from 
the program as indicated by the t tests between 
the pretest and the first and second 
criterion tests, and (b) furthermore, the pro- 
gram was revised several times on the basis 
of the performance of a comparable S group. 
Consequently, the extent to which these 
factors bias the data appears negligible. This 
suggests that the programmed medium possesses 
no especial superiority over other instructional 
media, contrary to the conclusions of Ripple 
and other investigators. Perhaps a more fruitful 
approach to this issue might be a concern 
with the types of instructional material most 
efficiently taught with the programmed me- 
dium. Related to this might be a concern for 
the teaching efficiency of a particular program 
format for a specific area of instruction. 

A second major question posed in this 
present investigation deals with the effects 
of repetition in instructional material, which 
is generally evident in program versus other 
media comparisons. The results from the com- 
parison of the combined media suggest that 
sheer repetition of material, regardless of the 
medium employed, is a significant factor. 
Missing from this study, however, are three 
additional groups with each medium repeated 
(e.g., program-program, lecture-lecture, and 
text-text). There were simply not enough Ss 
available to fill these cells. A future study, 
employing these groups, particularly the pro- 
gram-program group, might yield a more con- 
clusive answer to the repetition question, 
especially if repetition alone were found to 
overcome the apparent initial handicap from 
using the program. Furthermore, a similarly 
designed study incorporating both linear and 
branching formats might shed further light on 
the comparability of the two. Regardless, 
forthcoming investigations not controlling for 
the repetition of material are subject to error 
in interpretation. 
Multiple discriminant-function analysis disclosed that groups of students 
majoring in Occupational Therapy, Physical Therapy, Medical Technology, 
Nursing, and Education could be successfully distinguished from each other, 
on the basis of 29 scales of the Strong Vocational Interest Blank for Women 
(SVIB—W). Furthermore, 2 discriminant analyses using 11 scales also indi- 
cated successful discrimination. 2 discriminant functions were significant in 
each analysis and the configuration of the groups in the discriminant space 
and the efficiency of classification for all analyses were highly similar. It was 
concluded that the SVIB should be a useful instrument for discriminating 
between college majors when utilizing discriminant-function analysis. 


The Strong Vocational Interest Blank 
(SVIB) has been found to be an extremely 
useful instrument for vocational guidance 
purposes. Data on the SVIB for Women 
(—W) have been accumulated over the past 
3 or 4 years on University of Florida stu- 
dents enrolled in certain health and health 
related professions (HP), as part of a con- 
tinuing research program concerned with ex- 
amining personality, interest, and aptitude 
similarities and differences among students 
in Occupational Therapy (OT), Physical 
Therapy (PT), Medical Technology (MT), 
Nursing (N), and other students enrolled in 
curricula outside of the HP, such as Educa- 
tion (E). These continuing research activities 
conducted by the Regional Rehabilitation Re- 
search Institute at the University of Florida 
are described in a monograph by Dunteman, 
Barry, and Anderson (1966). 

1 This research was supported by research grant 
RD-1127 from the Vocational Rehabilitation Administration, 
Department of Health, Education and 
Welfare, Washington, D. C. Part of the data for this 
study was collected under National Institute of 
Mental Health Project Grant 380, the Public Mental 
Health Methods in a University. 

The purpose of the present study was to 
determine if 29 scales of the SVIB could 
differentiate among groups of OT, PT, MT, 
N, and E students. Multiple linear discriminant-function 
analysis was used to isolate the 
dimensions (discriminant functions) accounting 
for differences among the five groups. 
The number of discriminant functions 
needed to explain the difference among these 
groups and the nature of the variables de- 
fining them can lead to an increase in our 
understanding about the differences and simi- 
larities among groups such as those with 
which the present study was concerned. The 
configuration of the five groups in the 
discriminant-function space can also suggest 
the nature of the similarities and differences 
among the groups in terms of interests as 
measured by the SVIB. To infer the prac- 
tical usefulness of the discriminant functions, 
equations will be developed to predict group 
membership from the SVIB scales. A clas- 
sification matrix can then be developed from 
these equations indicating how well these 
same students can be classified into the cor- 
rect groups on the basis of their test scores. 
Tatsuoka and Tiedeman (1954) have given 
an excellent review of discriminant analysis. 
The theoretical and computational aspects of 
discriminant analysis have been treated in 
great detail by Rao (1952) and Kendall 
(1957). 

A second purpose of the present study was 
to see if the number of SVIB scales used in 
the analysis could be substantially reduced 
without lowering the classification efficiency 
and changing the theoretical interpretation of 
the discriminant functions. For example, 
Dunteman (in press) found that the Occupa- 
tional Therapist, Laboratory Technician, and 
Nurse scales did a fairly good job of discriminating 
among OT, MT, N, and a heterogeneous 
group of students (O) outside of 
the HP. Except for the O group, classifica- 
tion efficiency was about as good as Anderson 
and Barry (1965) obtained for a group of 
predominantly sophomore female students 
expressing interest in majoring in OT, PT, 
MT, or outside of the HP, using factor scores 
based upon 29 scales of the SVIB. However, 
the Dunteman study was primarily concerned 
with the validities of the Occupational Thera- 
pist, Laboratory Technician, and Nurse 
scales, while the present study was more 
interested in examining the complete SVIB 
profile. Anderson and Barry (1965) did not 
perform a discriminant-function analysis of 
their data but rather developed likelihood 
function equations in the test space for 
classification purposes. Consequently, their 
analysis gave little indication of the theo- 
retical nature of the differences among the 
groups. Also, they used intended major rather 
than actual major as their criterion of group 
membership. 


METHOD 


Standard scores for all but three scales of the 
SVIB were obtained for 41 MTs, 46 OTs, 27 PTs, 
61 Ns, and 25 Es. The Sister Teacher, Engineering, 
and Physical Therapist scales were not used because 
of a large number of missing observations. The 
MTs, OTs, PTs, and Es were either juniors, seniors, 
or graduates of their respective curricula. The Ns 
were nursing students who had successfully com- 
pleted their sophomore year. 

The means and standard deviations for the five 
groups on the 29 scales were computed. The dif- 
ferences among the five groups on each scale were 
tested for significance by an F ratio. A discriminant- 
function analysis was then conducted on all five 
groups for all 29 scales. Wilks’ lambda (Cooley & 
Lohnes, 1962) was computed and tested for signifi- 
cance by a chi-square approximation in order to 
determine whether the vectors of means for the 
29 scales across the five groups were significantly 
different from each other. Each discriminant func- 
tion was tested for significance by a chi-square test 
(Bartlett, 1947). The centroids and dispersions of 
each group on each discriminant function were then 
computed. When the overall chi-square test indi- 
cated significance, an attempt was made to predict 
these same students’ major from their discriminant- 
function scores. That is, each person, treated as a 
point in the discriminant-function space, was predicted 
to be a member of the group whose centroid 
lay closest to her point. 
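
The nearest-centroid rule in the preceding paragraph can be sketched as follows. The centroids are the Analysis 2 values printed in Table 1; the student's scores are hypothetical, and plain Euclidean distance is an assumption made for brevity rather than the distance measure the authors report.

```python
import math

def nearest_centroid(point, centroids):
    """Return the label of the group whose centroid is closest to `point`."""
    return min(centroids, key=lambda g: math.dist(point, centroids[g]))

# Analysis 2 centroids (Function I, Function II) as printed in Table 1.
centroids = {
    "MT": (-17.62, 18.33),
    "OT": (-7.01, 26.72),
    "PT": (-9.49, 17.27),
    "N": (-5.20, 15.60),
    "E": (-9.52, 23.82),
}

# Hypothetical student scores on the two functions (not from the source).
student = (-6.0, 16.0)
predicted = nearest_centroid(student, centroids)
print(predicted)
```

A student whose point lies near the Nursing centroid is assigned to the N group, regardless of her actual major.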

Since from the standpoint of practicality 29 scores 
are quite numerous, two approaches were taken for 
reducing the numbers of variables involved. The 
first approach was to use in the discriminant- 
function analysis only those scales which had pro- 
duced F ratios significant beyond the .01 level of 
confidence. The second approach was to pick from 
the 29 scale discriminant-function analysis vari- 
ables which had relatively high weights on one or 
more of the significant discriminant functions. The 
number of variables selected in this manner was 
arbitrarily made equal to the number of variables 
selected in the first approach so that both of these 
two approaches could be compared to each other 
as well as to the original approach. If this analysis 
had only involved two groups, then test-selection 
techniques commonly used in multiple-regression 
analysis could be used to reduce the 29 variables 
to a manageable few that would account for the 
majority of the between-group variance on the 
single discriminant function. However, for the case 
of a discriminant analysis involving more than two 
groups, no optimizing technique has yet been de- 
veloped to reduce the number of variables. A 
factor analysis of the 29 scores would hardly be a 
solution since to obtain the factor scores, one would 
still have to use the 29 original test scores. 
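
The two reduction approaches can be sketched as simple selection rules. The scale names, p values, and weights below are toy inputs invented for illustration, and the helper names `select_by_f` and `select_by_weight` are assumptions; only the selection logic follows the description above.

```python
def select_by_f(p_values, alpha=0.01):
    """Approach 1: keep scales whose univariate F test has p < alpha."""
    return [name for name, p in p_values.items() if p < alpha]

def select_by_weight(weights, k):
    """Approach 2: keep the k scales with the largest absolute weight
    on any of the significant discriminant functions."""
    peak = {name: max(abs(w) for w in ws) for name, ws in weights.items()}
    return sorted(peak, key=peak.get, reverse=True)[:k]

# Toy inputs; these are not the study's data.
p_values = {"A": 0.001, "B": 0.20, "C": 0.004, "D": 0.03}
weights = {"A": (0.9, -0.1), "B": (0.1, 0.8), "C": (0.2, 0.3), "D": (-0.5, 0.0)}

by_f = select_by_f(p_values)
# The second set is forced to the same size as the first, as in the text.
by_weight = select_by_weight(weights, k=len(by_f))
print(by_f, by_weight)
```

In the toy data, scale B has a nonsignificant F ratio yet a high weight, so the two rules select different subsets of equal size.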

It should be pointed out that neither one of the 
two approaches mentioned above is optimum in the 
sense that the variables selected will lead to a maxi- 
mum amount of separation among the groups for 
that number of variables. For example, variables 
that have low, insignificant F ratios frequently have 
high weights in a discriminant function, because they 
help determine the orientation of the hyperellipsoid 
in the test space due to the nature of their intercor- 
relations with other variables. The general situation 
is analogous to test selection in multiple correlation. 

The analyses for the last two approaches were 
conducted in exactly the same way as the original 
analysis except that there were fewer variables. The 
relative effectiveness of the three approaches can be 
inferred from the similarity of the configuration of 
the groups in the discriminant-function space for 
each of the three analyses. The efficiency of 
classification will also be compared across the three 
analyses. 


RESULTS 


Out of the 29 SVIB scales, 11 resulted in 
F ratios significant at less than the .01 
level.2 Wilks’ lambda, which is a function of 
the roots of W⁻¹A (W⁻¹ represents the inverse 
of the within-groups dispersion matrix and 
A, the among-groups dispersion matrix) and 
indicates the discriminating power of the test 
battery, was computed and found to be 
significant at less than the .001 level. 

2 A table of the means, standard deviations, and 
F ratios for the 29 SVIB scales has been deposited 
with the American Documentation Institute. Order 
Document No. 9012 from ADI Auxiliary Publications 
Project, Photoduplication Service, Library of 
Congress, Washington, D. C. 20540. Remit in advance 
$1.25 for microfilm or $1.25 for photocopies 
and make checks payable to: Chief, Photoduplication 
Service, Library of Congress. 

The first two roots whose size indicates the 
percentage of total discriminating power of 
the corresponding discriminant functions 
were significant beyond the .001 level while 
the last two were not significant at the .05 
level. The first two roots accounted for 
approximately 76% of the variation among 
groups. 
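
The relation between the roots, Wilks' lambda, and the share of discriminating power can be sketched as below. The roots are hypothetical, since the article does not print them; the sample size (200), number of scales (29), and number of groups (5) are taken from the text. The chi-square shown is the usual Bartlett approximation, offered here as an assumption about the form of the test rather than a reproduction of the authors' computation.

```python
import math

def wilks_lambda(roots):
    """Lambda is the product of 1 / (1 + root) over all roots."""
    return math.prod(1.0 / (1.0 + r) for r in roots)

def bartlett_chi_square(lam, n, p, g):
    """Chi-square approximation for Lambda with p variables and g groups."""
    chi2 = -(n - 1 - (p + g) / 2.0) * math.log(lam)
    return chi2, p * (g - 1)

def share_of_discrimination(roots):
    """Proportion of total discriminating power carried by each function."""
    total = sum(roots)
    return [r / total for r in roots]

roots = [1.0, 0.5, 0.3, 0.2]   # hypothetical roots, not from the source
lam = wilks_lambda(roots)
chi2, df = bartlett_chi_square(lam, n=200, p=29, g=5)
shares = share_of_discrimination(roots)
print(round(lam, 4), round(chi2, 1), df, [round(s, 2) for s in shares])
```

With five groups there are at most four roots, and the cumulative share of the first two is the kind of figure reported above as the percentage of variation among groups.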

The group centroids and dispersions for 
each of the groups on the two significant 
discriminant functions are presented in Table 
1. Table 2 shows the discriminant-function 
weights scaled by multiplying each of the 
unscaled weights by the square root of the 
appropriate variance estimate for each vari- 
able. These scaled weights then indicate the 
relative contribution of each SVIB scale in 
determining the discriminant score. 
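
The scaling step can be illustrated with a small sketch. The raw weights and variances are invented for illustration. The final rescaling so that the largest absolute weight equals 1.00 is an assumption suggested by the appearance of Table 2; the text itself describes only the multiplication by the square root of the variance estimate.

```python
import math

def scale_weights(raw, variances, normalize=True):
    """Multiply each raw weight by the square root of its variance estimate."""
    scaled = [w * math.sqrt(v) for w, v in zip(raw, variances)]
    if normalize:
        # Assumed step: express weights relative to the largest absolute weight.
        peak = max(abs(s) for s in scaled)
        scaled = [s / peak for s in scaled]
    return scaled

raw = [0.02, -0.05, 0.01]          # hypothetical unscaled weights
variances = [100.0, 64.0, 25.0]    # hypothetical variance estimates
scaled = scale_weights(raw, variances)
print([round(s, 3) for s in scaled])
```

Scaling removes differences in the spread of the scales, so the resulting weights can be compared with one another as relative contributions.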

The centroids of the five groups on the 
two largest discriminant functions are shown 
graphically in Figure 1. Since the first two 
discriminant functions accounted for a major 
proportion of the variation of the five groups, 
they should approximate fairly well the con- 
figuration of the groups in the original 29- 
variable space. 




The classification matrix is shown in 
Table 3. This matrix indicates the number of 
people who were correctly and incorrectly 
predicted to be members of the respective 
groups. The rows of the matrix represent pre- 
dicted group membership while the columns 
represent actual group membership. 
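
A classification matrix with this orientation can be tallied as in the sketch below. The seven actual and predicted labels are invented for illustration, and the helper names are assumptions; only the layout (rows for predicted group, columns for actual group) follows the text.

```python
from collections import Counter

GROUPS = ["MT", "OT", "PT", "N", "E"]

def classification_matrix(actual, predicted, groups=GROUPS):
    """Rows are predicted groups, columns are actual groups."""
    counts = Counter(zip(predicted, actual))
    return [[counts[(p, a)] for a in groups] for p in groups]

def hits(matrix):
    """Correct classifications lie on the principal diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix)))

# Hypothetical labels, not the study's data.
actual = ["MT", "MT", "OT", "N", "N", "E", "PT"]
predicted = ["MT", "OT", "OT", "N", "PT", "E", "PT"]

matrix = classification_matrix(actual, predicted)
for label, row in zip(GROUPS, matrix):
    print(label, row)
print("hits:", hits(matrix))
```

Off-diagonal cells show which groups are confused with which, for example an MT student predicted to be an OT.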

For the second analysis, the 11 variables 
with F ratios significant beyond the .01 level 
were included. Wilks’ lambda was significant 
beyond the .001 level of confidence. The first 
two discriminant functions were significant 
beyond the .001 level and together accounted 
for approximately 92% of the variation 
among groups. The group centroids and dis- 
persions are presented in Table 1 and the 
scaled discriminant weights are shown in 
Table 2. The centroids for the five groups 
on the first two functions are presented in 
Figure 1, and the classification matrix is 
shown in Table 3. 

For the third analysis, the 11 variables 
were chosen which had the highest relative 
weights on either one or both of the discriminant 
functions isolated in the 29-variable 
analysis. Again, Wilks’ lambda was significant 
beyond the .001 level of confidence 
along with the first two discriminant functions. 
The first two discriminant functions 
together accounted for approximately 89% 
of the variation among groups. The group 
centroids and dispersions are presented in 
Table 1 and the scaled discriminant weights 
are shown in Table 2. The centroids for the 
five groups are presented in Figure 1, and the 
classification matrix is shown in Table 3. 


TABLE 1 

GROUP CENTROIDS AND DISPERSIONS FOR THE THREE ANALYSES 

                                 Centroids                  Dispersions 
Group                     Function I   Function II   Function I   Function II 
Analysis 1 
  Medical technology        -28.64        32.53         23.43        33.67 
  Occupational therapy      -21.31        41.68         23.47        25.89 
  Physical therapy          -22.71        34.67         29.17        51.24 
  Nursing                   -17.98        33.18         24.77        26.26 
  Education                 -23.33        39.54         17.14        22.08 
Analysis 2 
  Medical technology        -17.62        18.33         46.57        76.33 
  Occupational therapy       -7.01        26.72         40.94        93.49 
  Physical therapy           -9.49        17.27         47.86        56.97 
  Nursing                    -5.20        15.60         29.23        51.08 
  Education                  -9.52        23.82         42.38        65.77 
Analysis 3 
  Medical technology        -23.60        44.04         54.43        64.14 
  Occupational therapy       -9.65     [illegible]      64.50        67.63 
  Physical therapy          -14.66        44.34         48.56        53.35 
  Nursing                    -9.07        41.02         39.74        35.31 
  Education                 -13.02        50.68         56.26        55.39 


TABLE 2 

SCALED DISCRIMINANT-FUNCTION WEIGHTS FOR THE 3 ANALYSES 

                                               Analysis 1                 Analysis 2                 Analysis 3 
Scale                                     Function I  Function II   Function I  Function II   Function I  Function II 
Occupational Therapist                        .66         .92      [illegible] [illegible]       .43     [illegible] 
Laboratory Technician                        -.97        -.48        -1.00        -.09        -1.00        -.09 
Housewife                                     .06         .04          --          --           --          -- 
Stenographer-Secretary                       -.72         .18          --          --          -.14         .90 
Physician                                    -.58         .32         -.16     [illegible]       --          -- 
Social Worker                                -.03        1.00         -.16     [illegible]  [illegible] [illegible] 
Artist                                    [illegible] [illegible]      --          --           .10        1.00 
Author                                       -.07         .05          --          --           --          -- 
Business Education Teacher                [illegible]    -.46          --          --           --          -- 
Buyer                                        -.01     [illegible]      --          --           --          -- 
Dentist                                      -.05         .02          .03         .07          --          -- 
Dietitian                                    -.05        -.36          --          --           --          -- 
Elementary Teacher                           -.40        -.16          --          --           --          -- 
English Teacher                               .47        -.34      [illegible]    -.25          --          -- 
Home Economics Teacher                       -.01         .66          --          --           .09         .30 
Life Insurance Saleswoman                    -.08     [illegible]      --          --           --          -- 
Lawyer                                        .02         .05          --          --           --          -- 
Librarian                                    -.26        -.07          --          --           --          -- 
Math-Science Teacher                         -.26         .28         -.33         .18          --          -- 
Music Performer                               .09        -.63      [illegible]    -.13          .06        -.57 
Music Teacher                                -.40         .86         -.25         .28         -.08         .49 
Nurse                                        1.00        -.75      [illegible]   -1.00      [illegible]    -.87 
Office Worker                                -.14         .48          --          --           --          -- 
Physical Education Teacher College        [illegible] [illegible]      --          --       [illegible]     .10 
Physical Education Teacher High School        .17        -.04          --          --           --          -- 
Psychologist                                 -.42         .69          --          --          -.08         .60 
Social Science Teacher                        .04         .08          --          --           --          -- 
YWCA Secretary                            [illegible]    -.20          --          --           --          -- 
Femininity-Masculinity                       -.15         .05          .01     [illegible]       --          -- 


TABLE 3 

CLASSIFICATION MATRIX FOR ACTUAL AND PREDICTED GROUP MEMBERSHIP FOR THE 3 ANALYSES 

[Table body not recoverable from the source scan. For each of Analyses 1, 2, and 3, the table cross-classifies actual group (Medical technology, Occupational therapy, Physical therapy, Nursing, Education) against predicted group.] 
[Figure 1: three panels (Analysis 1, Analysis 2, Analysis 3), each plotting the group centroids against Discriminant Function I and Discriminant Function II.] 


FIG. 1. Centroids of the five groups in the discriminant-function 
space for the three analyses. 


DISCUSSION 


In the 29-variable analysis, it can be seen 
from Figure 1 that the MT group scored 
lowest on both functions while the N group 
scored highest on Function 1 and the OT 
group scored highest on Function 2. The first 
function contrasts the MT with the N group, 
with the PT, E, and OT groups lying some- 
where in between. The second function re- 
sulted in the separation of two clusters; the 
OTs and Es being separated from the MTs, 
PTs, and Ns. To interpret the nature of each 
of the two functions, it is necessary to look 
at the characteristics of the variables that 
define them. From Table 2, it can be seen 
that the Nurse scale and Occupational Thera- 
pist scale have the highest positive scaled 
weights for determining the discriminant 
score on Function 1 while the Laboratory 
Technician, Stenographer-Secretary, Physical 
Education College Teacher, and Physician 
scales have the highest negative weights. This 
dimension is interpreted as representing sup- 
portive interests versus nonsupportive in- 
terests. Both the Occupational Therapist scale 
and the Nurse scale presumably indicate in- 
terest in helping other people, while the scales 
with negative weights indicate interests of a 
more introverted, task-oriented nature. 

This seems reasonable when looking at the 
means of the groups on the first discriminant 
function. The Ns who could be considered 
as most supportive have the highest mean 
while the MTs who can be considered most 
task oriented and introverted have the lowest 
mean. The remaining three groups which can 
be considered as supportive or interpersonally 
oriented fall in between these two extremes. 
Holland (1959) would classify the N, PT, 
OT, and E groups as supportive professions 
and would classify MT as an intellectual 
profession. 

The second discriminant function is defined 
by the high positive weights of the Social 
Worker, Occupational Therapist, Secretary- 
Stenographer, Music Teacher, Artist, Psy- 
chologist, and Home Economics Teacher 
scales. The highest negative weights for this 
function are for the Nurse, Music Per- 
former, Laboratory Technician, and Business 
Education Teacher scales. 

The scales with the highest positive load- 
ings seem to define professions that are rela- 
tively unstructured and abstract while the 
scales with the highest negative loadings seem 
to involve professions that are fairly highly 
structured and concrete. Many of the scales 
with the highest positive loadings involve 
professions that are concerned with art, 
music, and various forms of craftsmanship. 
The second discriminant function tends to de- 
fine interest in abstract-unstructured job situ- 
ations versus interest in concrete-structured 
job situations. 

Both Es and OTs who have the highest 
means on Discriminant Function 2 are involved 
in an unstructured teaching situation 
utilizing arts, crafts, and music as the media 
of communication. Part of the role of OTs 
is to teach patients various arts and crafts 
while the role of the Es is similar in that, in 
a large number of instances, the Es are par- 
tially involved in teaching arts and crafts to 
elementary children. It should be pointed 
out that a large percentage of the Es in the 
present study were majoring in elementary 
education. 

PTs, Ns, and especially MTs function 
under highly structured role definitions. PTs 
and Ns carry out physicians’ prescriptions 
for various treatments, while MTs cannot 
deviate very much from prescribed laboratory 
tests for specified illnesses. On the other 
hand, the OTs and Es function in a more un- 
structured and ambiguous teaching situation. 

Classification was performed using the 
total discriminant-function space. A chi- 
square was computed for each individual for 
each group and the person was predicted to 
be a member of that group which yielded 
the lowest chi-square. The chi-square was a 
measure of the closeness of a person’s test 
vector (a point in the discriminant-function 
space) to the centroid of a particular group 
in the discriminant-function space (Cooley & 
Lohnes, 1962). 
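
A dispersion-weighted version of this rule can be sketched as follows. The centroids and dispersions are the Analysis 2 values printed in Table 1. Treating each dispersion as a variance with zero covariance between the two functions is a simplifying assumption, the student's scores are hypothetical, and the function names are introduced here for illustration; the full procedure in Cooley and Lohnes is not reproduced.

```python
def chi_square_distance(point, centroid, dispersion):
    """Sum of squared deviations from the centroid, each divided by the
    group's dispersion on that function (covariances assumed zero)."""
    return sum((x - c) ** 2 / d for x, c, d in zip(point, centroid, dispersion))

def classify(point, groups):
    """Predict the group that yields the lowest chi-square."""
    return min(groups, key=lambda g: chi_square_distance(point, *groups[g]))

# (centroid, dispersion) on Functions I and II, Analysis 2, from Table 1.
groups = {
    "MT": ((-17.62, 18.33), (46.57, 76.33)),
    "OT": ((-7.01, 26.72), (40.94, 93.49)),
    "PT": ((-9.49, 17.27), (47.86, 56.97)),
    "N": ((-5.20, 15.60), (29.23, 51.08)),
    "E": ((-9.52, 23.82), (42.38, 65.77)),
}

student = (-17.0, 18.0)   # hypothetical scores, not from the source
print(classify(student, groups))
```

Weighting by dispersion means that a group with widely scattered members can claim a point that lies somewhat farther from its centroid.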

As can be seen from the principal diagonal 
of the classification matrix for the 29-variable 
problem, group membership could be fairly 
well predicted on the basis of the 29 SVIB 
scales. There were 108 correct classifications 
out of a possible 200. This is an encouraging 
finding for groups that are extrinsically so 
similar. The finding also testifies to the valid- 
ity of the SVIB. The Occupational Therapist, 
Laboratory Technician, Nurse, and various 
of the Teacher scales had the highest weights 
on the two significant discriminant functions. 
In fact, the Nurse and Laboratory Tech- 
nician scales, respectively, had the highest 
weights on the first discriminant function, 
while the Occupational Therapist scale had 
the second highest weight on the second dis- 
criminant function. This finding indicates the 
power of the scales in that with all these 
29 scales and their complex interrelations, the 
most relevant scales in terms of this particular 
situation come out with the highest 
weights. The relatively high number of cor- 
rect classifications is even more encouraging 
when considering that over half of the stu- 
dents in these groups had not even gradu- 
ated, let alone worked in the field of their 
choice. However, it should be noted that the 
classification matrices presented in Table 3 
are based upon predicting group membership 
for the same sample on which the discrimi- 
nant equations were derived. Consequently, 
there has been some capitalization on error 
and the frequencies in the diagonal cells are 
higher than would be obtained if predictions 
were made for an independent sample. 
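
The reported hit rate can be put beside two simple chance baselines. The group sizes and the count of 108 correct classifications come from the text; the two baselines (always guessing the largest group, and guessing in proportion to group size) are standard comparisons added here for illustration and are not part of the original analysis.

```python
# Group sizes from the Method section and the hit count from the text.
group_sizes = {"MT": 41, "OT": 46, "PT": 27, "N": 61, "E": 25}
correct = 108

n = sum(group_sizes.values())
hit_rate = correct / n

# Baselines below are assumptions added for comparison, not from the source.
max_chance = max(group_sizes.values()) / n
proportional_chance = sum((size / n) ** 2 for size in group_sizes.values())

print(f"hit rate            = {hit_rate:.3f}")
print(f"largest-group guess = {max_chance:.3f}")
print(f"proportional chance = {proportional_chance:.3f}")
```

Because the same sample was used to derive and to test the equations, the hit rate computed this way is an optimistic estimate, as the text notes.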

The usefulness of the SVIB is certainly 
highlighted by discriminant-function analysis. 
If univariate analyses alone were conducted 
for each scale, the investigator could not test 
the hypothesis of overall differences among 
groups. Separate tests for each scale would 
not give an overall picture of group differ- 
ences since the scales are intercorrelated and 
consequently the probability levels for each 
scale are not independent of one another. 
The fact that a number of scales are sta- 
tistically significant at a given level does not 
mean that the overall differences among the 
groups on all scales are significant or that the 
differences are practical in the sense that 
group prediction equations can be developed. 
Using the discriminant-analysis approach with 
the SVIB should be useful for developing 
group prediction equations for other types of 
college majors as well. Discriminant analysis 
should yield more information about the SVIB 
than has been yielded in the past by the more 
traditional types of analyses. 

Both of the variable reduction procedures 
result in about the same amount of efficiency 
as the 29-variable procedure. The weights of 
the variables that are common to two or more 
of the three analyses are generally of the 
same sign and magnitude. From Figure 1, 
it can be seen that the configuration of the 
five groups in the space of the two largest 
discriminant functions is virtually identical 
for all three analyses; and furthermore, the 
classification efficiency for the three analyses 
based upon group predictions is basically 
the same. 

One reason why the short-cut methods are 
just about as effective as the original method 
is that in all three analyses, the most relevant 
and highly weighted scales were retained. 
These were the Occupational Therapist, 
Laboratory Technician, and Nurse 
scales. If there were no scales corresponding 
to the groups to be discriminated, then these 
two methods of reduction might not have 
worked so well. For example, since the 
Physical Therapist scale was missing, the 29- 
variable analysis did a relatively better job 
of classification for the PT group than the 
two shorter methods. This was primarily due 
to the fact that more information was needed 
to predict PT membership than would have 
been needed if a valid Physical Therapist scale 
had been included in the two shorter analyses. 


USE OF THE SEMANTIC DIFFERENTIAL IN A TEST 
OF SUPER’S VOCATIONAL ADJUSTMENT THEORY 1 


ALLEN J. SCHUH 2 


Berkeley, California 


3 hypotheses derived from the theory that vocational adjustment is dependent 
upon implementation of the self-concept were tested: (a) the same dimensions 
of meaning are attributable to the self- and job-related concepts, (b) Ss will 
rate the concepts in the same way across the dimensions, (c) the self-concept 
is stable over time. 89 June graduating seniors from the University of Cali- 
fornia, Berkeley, were used as Ss for establishing the semantic structure of 
the Myself concept. This structure was then compared to the concepts 
Myself, My Job, and Employer, administered after Ss were employed. 40 
January graduating seniors were used for cross-validation. Hypotheses a and b 
were partially rejected. Hypothesis c was accepted at the .01 level. The theory 
that job satisfaction and life adjustment are due to a general evaluative 
personality disposition is offered as a substitute for the congruency theory. 


The overall purpose of this research effort 
is to explore the potential usefulness in voca- 
tional guidance of a semantic differential 
measurement of the self-concept. The second 
objective is to assess the stability of the self- 
concept over time. Despite the fact that sev- 
eral theories relate the self-concept to phe- 
nomena in counseling and psychotherapy, 
there is as yet little evidence concerning its 
stability. The third objective is to assess the 
extent to which semantic differential ratings 
on job-related concepts are predictable from 
knowledge of semantic differential ratings 
on the self-concept. The second and third 
objectives rest on assumptions derived 
from Super’s (1953) theory of vocational 
adjustment. 

Super offers a theory of vocational adjust- 
ment in terms of the role that the job and its 
related aspects allow the individual to play: 


This is the theory that satisfaction in one’s work 
and on one’s job depends on the extent to which 
the work, the job, and the way of life that goes 
with them, enable one to play the kind of role 
that one wants to play. It is, again, the theory 
that vocational development is the development of 
a self concept, that the process of vocational adjustment 
is the process of implementing a self concept, 
and that the degree of satisfaction attained 
is proportionate to the degree to which the self 
concept has been implemented [p. 189]. 


1 This study was carried out under the financial 
assistance of the Western College Placement Association’s 
annual Vera Christie Fellowship Award 
for 1963-1964. The Computer Center, University of 
California, Berkeley, generously supplied programming 
assistance and free computer time. 

2 Now with the Aviation Psychology Division, 
Naval Aerospace Medical Institute, Pensacola, Florida, 
32512. Opinions or conclusions contained in this report 
are those of the author. They are not to be 
construed as necessarily reflecting the views or the 
endorsement of the Navy Department. 


There are certain implicit assumptions that 
follow from Super’s (1953) theory. First, 
that the self-concept is stable. Second, that 
the connotative structure of the self-concept 
is the same as the connotative structure of 
the job-related concepts. Third, that job per- 
ception is the dependent variable to self- 
conception and job satisfaction is the de- 
pendent variable to both job perception and 
self-conception. 

A careful review of the literature indicates 
that there have been no direct attempts at 
testing Super’s (1953) theory. It is felt that 
to test adequately the theory, three separate 
deductions require combination and substan- 
tiation. First, a test-retest reliability coef- 
ficient for the measure of the self-concept is 
necessary as proof of stability. Second, a 
comparison of the multidimensional structure 
of the job and self-concepts is necessary to 
demonstrate that they share the same con- 
notative structure. Third, a comparison of 
the subjects’ (Ss’) ratings of the self- and job 
concepts on the dimensions is necessary to 
evaluate whether the dimensions of meaning 
are being used for implementation. 




The present study, with two separate 
groups of graduating college seniors, attempts 
to test and cross-validate the following 
hypotheses relevant to Super’s theory: 


1. The Ss will attribute the same dimen- 
sions of meaning to both the self- and job 
concepts. 

2. The Ss will rate both self- and job con- 
cepts in the same way on the dimensions of 
meaning. 

3. The self-concept will remain stable over 
a period of time during which Ss discontinue 
their education and take full-time positions 
in their chosen occupations. 


METHOD 


Subjects. Two groups of Ss were used. Group 
January represented 40 January 1964 graduating 
seniors of the University of California, Berkeley, 
who were randomly selected for contact from the 
Alumni Association’s Senior List. Group June repre- 
sented 89 June 1964 graduating seniors of the Uni- 
versity of California, Berkeley, who had volunteered 
for the study at the campus Placement Center. 

Concepts. Three concepts, Myself, My Employer, 
and My Job were rated by Ss. Three additional 
concepts were rated during the course of the study 
but will not be reported further here; they were: 
My Faculty Advisor, My Role as a Student, and 
The Way I Would Most Want To Be. 

Instrument. There were 15 items which were 
chosen as representative of five connotative dimen- 
sions isolated in prior factor-analytic studies of the 
semantic differential (Osgood, Suci, & Tannenbaum, 
1957, pp. 62-69). 

The test booklet was arranged in such a way that 
each concept appeared either at the top or the 
middle of the page and was followed by the 15 scales. 
The first 5 scales below each concept were the 
first items for each of the dimensions shown in 
Table 1, the next 5 were the second items, and 
the last 5 items were the third items from each 
dimension. The scales were alternated in polarity 
direction to prevent the formation of position pref- 
erences. Following Osgood et al., each pair of ad- 
jectives was separated by a line divided into seven 
segments. 
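The ordering of the 15 scales described above can be sketched in a few lines of Python. This is only an illustration: the item lists are copied from Table 1, but the function names and the rule of reversing every other scale are assumptions, since the text says only that the scales "were alternated in polarity direction."

```python
# Items grouped by dimension, in the order given in Table 1.
TABLE_1 = {
    "Evaluative": ["good-bad", "complete-incomplete", "sincere-insincere"],
    "Potency": ["hard-soft", "strong-weak", "constrained-free"],
    "Activity": ["active-passive", "hot-cold", "fast-slow"],
    "Receptivity": ["sensitive-insensitive", "interesting-boring", "refreshed-weary"],
    "Aggressiveness": ["aggressive-defensive", "rash-cautious", "fluttering-steady"],
}


def booklet_order(table):
    """First item of every dimension, then every second item, then every third."""
    dimensions = list(table)
    return [table[dim][i] for i in range(3) for dim in dimensions]


def alternate_polarity(scales):
    """Reverse the poles of every other scale (hypothetical alternation rule)."""
    result = []
    for index, scale in enumerate(scales):
        left, right = scale.split("-")
        result.append(f"{right}-{left}" if index % 2 else scale)
    return result


order = booklet_order(TABLE_1)
printed = alternate_polarity(order)
```

Under these assumptions the first five scales are good-bad, hard-soft, active-passive, sensitive-insensitive, and aggressive-defensive, which matches the row order of the factor tables.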

Instructions. Both questionnaires included a set 
of instructions fashioned after those recommended 
by Osgood et al. (1957). 

Procedure. The Ss were selected according to the 
following scheme. For Group January, every fourth 
name card was drawn from the List for a random 
sample. Of the 201 Ss selected and contacted, 71 
answered both questionnaires. Of the 71 Ss, 40 were 
employed in full-time positions at the time of the 
second contact and thus constitute Group January. 
The 31 excluded from the sample were in graduate 
school, were housewives, or were entering the Peace 
Corps. It was felt that these Ss had not yet entered 
their career occupations and that this was necessary 
for the purposes of the study. 

Of the June 1964 graduating seniors from the 
University of California, Berkeley, 229 volunteered 
for the study at the campus Placement Center. 
Of the 114 who answered both questionnaires, 89 
were full-time employees at the time of the second 
contact and, therefore, constitute Group June. The 
25 excluded from the sample were either in gradu- 
ate school, housewives, or entering the Peace Corps. 
Because of its larger N, Group June was chosen 
as the experimental group and Group January was 
held out for cross-validation. 

The first questionnaire was mailed to Ss within 10 
days of their graduation. The Ss were assigned code 
numbers and were instructed not to place their 
names or other means of identification on their 
forms. The Ss were given no indication that a 
second questionnaire followed at a later date. The 
Myself concept (first administration) and questions 
regarding age, sex, marital status, and college major 
were asked on the first questionnaire. The second 
questionnaire included a question regarding S’s pres- 
ent employment status in addition to the concepts 
Myself (second administration), My Job, and My 
Employer. Group June was sent the second questionnaire 
3 months (i.e., in August) after the first 
contact and Group January received the follow-up 
questionnaire 4 months later (i.e., in May). 


TABLE 1 


FIFTEEN ITEMS CHOSEN AS REPRESENTATIVE OF FIVE DIMENSIONS ISOLATED IN PRIOR 
STUDIES WITH THE SEMANTIC DIFFERENTIAL 


                                          Scale 
Dimension        1                        2                      3 
Evaluative       good-bad                 complete-incomplete    sincere-insincere 
Potency          hard-soft                strong-weak            constrained-free 
Activity         active-passive           hot-cold               fast-slow 
Receptivity      sensitive-insensitive    interesting-boring     refreshed-weary 
Aggressiveness   aggressive-defensive     rash-cautious          fluttering-steady 






Means, standard deviations, and Pearson product- 
moment correlation coefficients were computed be- 
tween the scales for each concept separately. A 
preliminary Cumulative Communality Key Cluster 
Analysis (Tryon, 1958) was performed on the cor- 
relation matrix of the June Group’s August-ques- 
tionnaire Myself concept. The orthogonal axes were 
rotated to oblique structure. Factor coefficients for 
all variables on all dimensions within the concept 
were computed. All analyses were conducted with 
programs from the BC TRY and STATPAK 
systems on an IBM 7094 computer. 

The oblique dimensional structure evolving from 
the preliminary analysis was then preset and used 
to define the dimensions for all eight concepts. 
To preset a dimensional structure, the experimenter 
(E) selects the defining variables for each dimension 
and the number of dimensions (in this case E used 
the results of the June-August Myself analysis) 
and punches these in the control cards of the 
cluster-analysis program. An inspection of the 
domain validities after oblique rotation gives E an 
indication of the accuracy of the factor estimates. 
The statistic Tryon (1957) calls “domain validity” 
is the same statistic others have called the “index of 
reliability” (compare Tryon, 1957, p. 237 with 
Guilford, 1954, pp. 351-352). It is the correlation 
between a sample and its perfect criterion measure 
on the property X, as operationally defined. 
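As a hedged illustration of the statistic just described: in classical test theory the index of reliability is conventionally the square root of the reliability coefficient. If domain validity is read the same way, as the comparison with Guilford suggests, the relation can be sketched as follows. The function name is hypothetical and this is not the BC TRY computation itself.

```python
import math


def index_of_reliability(reliability):
    """Correlation of an observed score with its perfect criterion,
    taken here as the square root of the reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability)


# A domain validity of .50 would correspond to a reliability of .25.
threshold = index_of_reliability(0.25)
```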

A comparative analysis of the dimensional struc- 
ture of the eight preset cluster analyses after oblique 
rotation was then obtained by a method developed 
by Tryon (1964). The pattern of factor coefficients 
over the items on the dimensions in each concept, 
is the basis of the comparison. Thus, the oblique 
factor matrices for the concepts are integrated into 
one total matrix. The similarity of each pair of 
dimensions is computed and a similarity matrix is 
developed. The similarity of the dimensions is assessed 
by the cosine of angular separation from their 
central angle. It should be noted that it is the 
absolute and not the signed value of the cosine that 
is important. The significance of the cosine’s deviation 
from zero was tested by a two-tailed z test 
for the June Group and by a two-tailed t test for 
the January Group. The .05 level was chosen as 
the criterion for significance. 
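The similarity index described above can be illustrated with a minimal Python sketch. It treats each dimension as the vector of its factor coefficients over the items and takes the cosine of the angle between two such vectors. The coefficient values below are hypothetical, and the exact computation in Tryon's (1964) method is not given in the text, so this shows only the general idea.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors of factor coefficients."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical coefficients for one dimension in two different concepts.
dimension_in_concept_a = [0.6, 0.7, -0.5, 0.1]
dimension_in_concept_b = [-0.6, -0.7, 0.5, -0.1]  # same axis, reflected

similarity = abs(cosine(dimension_in_concept_a, dimension_in_concept_b))
```

Taking the absolute value reflects the remark that only the size of the cosine matters: a dimension that is merely reflected in sign still describes the same axis of meaning.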

The raw scores for the items composing each of 
the dimensions were summed algebraically to develop 
simple-sum cluster scores. These cluster scores were 
then intercorrelated to determine the extent to 
which Ss rated the self- and job-related concepts in 
the same way on the dimensions of meaning. The 
significance of the deviation of the correlations from 
zero was assessed by a two-tailed t test for the 
January Group and a two-tailed z test for the 
June Group. The .05 level was chosen as the criterion 
for significance. 
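The scoring and testing steps above can be sketched as follows. Several points are assumptions made for illustration: "summed algebraically" is read as adding each raw rating with a +1 or -1 key, the t statistic is the usual test of a correlation against zero on n - 2 degrees of freedom, and the z statistic is a common large-sample approximation that the text does not specify. All function names are hypothetical.

```python
import math


def cluster_score(ratings, keys):
    """Algebraic sum: each item's raw rating multiplied by its +1/-1 key."""
    return sum(sign * ratings[item] for item, sign in keys.items())


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def z_for_r(r, n):
    """Large-sample normal approximation for H0: rho = 0 (assumed form)."""
    return r * math.sqrt(n - 1)
```

For example, a correlation of .38 in a group of 40 gives t of about 2.53 on 38 degrees of freedom, which exceeds the two-tailed .05 critical value of roughly 2.02.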


RESULTS 


The biographical characteristics of the 
groups are as follows: The average S in 
Group January was 23 years old, evenly 
divided on male-female ratio, two thirds were 
majoring in nontechnical fields of study and 
two thirds were not married. The average S 
in Group June was 22 years old, evenly 
divided on male-female ratio, three fourths 
were in nontechnical major fields of study, 
and three fourths were not married. 

The preliminary cluster analysis for the 
Myself concept from the June Group’s 
August questionnaire revealed that all 15 


TABLE 2 


OBLIQUE FACTOR COEFFICIENTS ON THE DIMENSIONS OF THE CONCEPTS MYSELF (FIRST AND 
SECOND ADMINISTRATION), MY EMPLOYER, AND MY JOB FOR GROUP JUNE 


                   June-August Myself    June-August Employer      June-August My Job        June-June Myself 
Scale           Eval.   Aggr.   Hast.   Eval.   Aggr.   Hast.   Eval.   Aggr.   Hast.   Eval.   Aggr.   Hast. 
good              .50    -.10    -.23     .61    -.50     .15     .85     .57     .60     .44     .03    -.05 
hard              .26     .70    -.07    -.15     .83     .20     .34     .41     .29     .19    -.72     .20 
active              ?     .00     .38     .49     .00     .52       ?     .79     .74     .65    -.07     .59 
sensitive           ?    -.57    -.02     .49    -.42       ?     .57     .69       ?    -.11     .49     .27 
aggressive          ?     .66     .20     .36       ?     .52     .64       ?     .40     .50       ?     .49 
complete            ?     .27       ?     .74    -.22     .04     .52       ?       ?     .56    -.26     .27 
strong              ?     .39     .14     .60       ?     .24     .74     .74     .59       ?    -.39     .28 
hot               .16    -.42     .00       ?    -.73     .25       ?       ?       ?     .18       ?       ? 
interesting       .66     .37    -.08     .73    -.13     .27     .81     .62     .64     .33    -.06     .37 
rash             -.29    -.02     .46       ?    -.09     .50       ?     .18     .49    -.01    -.05     .43 
sincere           .49    -.06    -.06       ?    -.63     .07       ?       ?       ?     .54     .04     .15 
constrained      -.61    -.24    -.24    -.42     .38     .49    -.42    -.33    -.72    -.56       ?    -.69 
fast                ?     .08     .49     .44       ?     .65       ?     .52       ?     .58     .00     .62 
refreshed           ?     .06    -.20     .52    -.21     .42     .79     .59     .75     .54    -.17     .36 
fluttering       -.71    -.21       ?    -.48    -.14     .23    -.26    -.04     .15    -.54     .44     .14 
h²                .61     .25     .14     .49       ?     .20     .69     .13     .18     .54     .21     .25 

Note. ? marks an entry that is illegible in the source. 


TABLE 3 


OBLIQUE FACTOR COEFFICIENTS ON THE DIMENSIONS OF THE CONCEPTS MYSELF (FIRST AND 
SECOND ADMINISTRATION), MY EMPLOYER, AND MY JOB FOR GROUP JANUARY 


                   January-May Myself    January-May Employer      January-May My Job  January-January Myself 
Scale           Eval.   Aggr.   Hast.   Eval.   Aggr.   Hast.   Eval.   Aggr.   Hast.   Eval.   Aggr.   Hast. 
good              .75       ?    -.44       ?     .00       ?     .87     .59     .54     .67    -.16    -.40 
hard              .16     .42     .04    -.04       ?     .42     .56       ?    -.02     .19     .44    -.07 
active            .48       ?       ?     .76     .63     .91     .63       ?       ?       ?     .08    -.27 
sensitive        -.08    -.50     .16     .20    -.35       ?     .65       ?     .14     .20    -.70    -.02 
aggressive          ?     .66    -.55     .50     .74     .62     .65       ?       ?       ?     .64    -.22 
complete          .67     .03    -.60     .73     .30       ?       ?     .70       ?       ?    -.30    -.14 
strong            .34     .28     .03     .66     .58     .33       ?     .80     .32     .37       ?    -.29 
hot                 ?    -.06    -.42     .59     .49       ?     .39     .55     .23     .30    -.28     .04 
interesting         ?    -.05    -.39     .69     .38     .40     .78     .44     .81       ?       ?    -.29 
rash             -.27     .09       ?    -.11       ?     .19     .09     .04     .41    -.24    -.19     .48 
sincere           .56    -.18    -.28     .14    -.33    -.12     .67     .63     .46     .42    -.28    -.52 
constrained      -.52    -.07     .50    -.64    -.24    -.75    -.69    -.50    -.97    -.30    -.21     .22 
fast                ?     .12    -.65     .51     .34     .59     .41     .14     .44     .30    -.03    -.58 
refreshed         .62       ?    -.22     .66     .34       ?     .84     .59     .39     .64     .04    -.03 
fluttering       -.45     .07     .31       ?    -.07     .05    -.26    -.27     .20    -.10     .09    -.04 
h²                .62       ?       ?     .58     .20       ?     .67     .14     .19     .50     .34     .16 

Note. ? marks an entry that is illegible in the source. 





items could be accounted for by three dimen- 
sions. The first dimension was named evalu- 
ative and was represented by the scales: 
good-bad, active-passive, complete-incom- 
plete, strong-weak, interesting-boring, sincere- 
insincere, free-constrained, refreshed-weary, 
and steady-fluttering. The second dimension 
was named aggressiveness and was repre- 
sented by the scales: hard-soft, insensitive- 
sensitive, aggressive-defensive, and cold-hot. 
The third dimension was named hastiness 
and was represented by the doublet: rash- 
cautious and fast-slow. 

The oblique factor coefficients from the 
eight preset cluster analyses are presented in 


TABLE 4 


ACCURACY OF FACTOR ESTIMATES (“DOMAIN 
VALIDITIES” OF CLUSTER SCORES) 


                                        Dimension 
Concept                    Evaluative   Aggressiveness   Hastiness 
June-August Myself            .92            .84            .63 
June-August Employer          .93            .85             ? 
June-August My Job            .95            .88            .76 
June-June Myself              .89             ?             .69 
January-May Myself            .89             ?             .74 
January-May Employer           ?             .85            .54 
January-May My Job            .98            .89            .58 
January-January Myself        .84             ?             .67 

Note. ? marks an entry that is illegible in the source. 





Tables 2 and 3, along with the percentage 
of converged communality accounted for by 
each dimension. 

Table 4 shows the accuracy of the factor 
estimates as assessed from their domain valid- 
ities. All factor estimates exceed .50 and are 
considered satisfactory. 


TABLE 5 


DEGREE OF CORRELATION BETWEEN THE DIMENSION 
STRUCTURES ACROSS CONCEPTS 


                                  June-August Myself a 
Concept                    Evaluative   Aggressiveness   Hastiness 
June-August Employer          .64**           ?              ? 
June-August My Job             ?             .14            .24* 
June-June Myself               ?              ?             .39** 

                                  January-May Myself b 
Concept                    Evaluative   Aggressiveness   Hastiness 
January-May Employer          .64**          .48**         -.33* 
January-May My Job             ?             .19           -.33* 
January-January Myself        .71**           ?              ?** 

a A two-tailed z test was used to test the significance of these 
coefficients from zero. 
b A two-tailed t test was used to test the significance of these 
coefficients from zero. 
* p < .05. 
** p < .01. 
Note. ? marks an entry that is illegible in the source. 


TABLE 6 


DEGREE OF CORRELATION BETWEEN THE RAW 
CLUSTER SCORES ACROSS CONCEPTS 


                                  June-August Myself a 
Concept                    Evaluative   Aggressiveness   Hastiness 
June-August Employer           ?            -.07             ? 
June-August My Job             ?**           .03             ? 
June-June Myself              .69**           ?              ? 

                                  January-May Myself b 
Concept                    Evaluative   Aggressiveness   Hastiness 
January-May Employer          .38*           .00           -.17 
January-May My Job            .60**           ?              ? 
January-January Myself        .59**          .68**           ? 

a A two-tailed z test was used to test the significance of these 
coefficients from zero. 
b A two-tailed t test was used to test the significance of these 
coefficients from zero. 
* p < .05. 
** p < .01. 
Note. ? marks an entry that is illegible in the source. 


The similarities among the dimension struc- 
tures as assessed by Tryon’s (1964) method 
are presented in Table 5. The ability of the 
preset June Group’s August-questionnaire 
Myself dimensional structure to compare with 
that of the job-related concepts is evident. 
Table 5 shows clearly that the same dimen- 
sions of meaning are used to describe the 
self- and job-related concepts with the excep- 
tion that the aggressiveness dimension is 
not used to describe the My Job concept. 

The correlations between the raw cluster 
scores are presented in Table 6. These cor- 
relations indicate the extent to which Ss rate 
the concepts in the same way over the dimen- 
sions of meaning. Table 6 shows that the 
concepts are rated the same way only on the 
evaluative dimension with the exception that 
both Myself concepts are rated the same way 
on all three dimensions. 


DISCUSSION 


The first hypothesis, that Ss would at- 
tribute the same dimensions of meaning to 
both the self- and job concepts, is partially 
rejected. The same dimensions of meaning 
are used to describe the self- and job-related 
concepts with the exception that the aggressiveness 
dimension is not used to describe 
the My Job concept. 

The second hypothesis, that Ss would rate 
the self- and job-related concepts in the same 
way across the dimensions of meaning, is 
partially rejected. The Ss rated both adminis- 
trations of the Myself concept the same way 
across all three dimensions. The Ss rated the 
Employer and My Job concepts the same 
way they rated the Myself concept on the 
evaluative dimension. The Ss did not rate 
the Employer and My Job concepts the same 
way they rated the Myself concept on the 
aggressiveness and hastiness dimensions. 

The third hypothesis, that the self-concept 
would remain stable over a period of time 
during which Ss discontinued their educa- 
tions and took full-time positions in their 
chosen occupations, is accepted. It is con- 
cluded that the self-concept is both stable 
in meaning and in the way Ss rated them- 
selves over the dimensions over a 4-month 
period of time. 

Failure of the aggressiveness dimension to 
appear for the My Job concept can proba- 
bly best be explained in terms of stimulus 
generalization. For the June Group, in 
Table 5, the highest absolute degree of simi- 
larity to that Group’s August-questionnaire 
Myself concept was noted to have been that 
Group’s June-questionnaire Myself concept. 
The next closest relationships were the Em- 
ployer and My Job concepts, respectively. 
This trend was also true for the cross- 
validation group. In each case the greatest 
similarity was found between the Myself 
concepts, then between the Myself and Em- 
ployer, and the least degree of similarity was 
found between the Myself and My Job con- 
cepts. It appeared as though the concept 
Employer had a greater amount of meaning 
in common with the Myself concept than did 
the My Job concept. Since both the Myself 
and Employer are animistic concepts, and the 
My Job concept is inanimate, it appears 
that there is greater similarity between two 
animistic concepts than between an animistic 
and an inanimate concept. In other words, 
some basic elements of similarity between 
the concepts probably account for the same 
dimensions of meaning being attributable to 
both the Myself and Employer concepts. 
Also, some basic differences exist between 
the concepts Myself and Employer, on the one 
hand, and the concept My Job, on the other. 
Stimulus generalization 
may be at the source of these apparent 
similarities and differences. 

It was noted that the coefficients of simi- 
larity in Table 5 were typically not as high 
with the aggressiveness and hastiness dimen- 
sions as with the evaluative dimension. There 
are two possible explanations for this. The 
first is that the evaluative dimension is a 
more pervasive dimension of meaning than 
the aggressiveness or hastiness dimensions. 
The second possible explanation is that the 
greater number of items representing the 
evaluative dimension made it more stable. 
The first explanation seems more plausible. 
While the second explanation would account 
for the greater degree of internal consistency 
and the greater degree of converged com- 
munality attributed to the evaluative dimen- 
sion, it cannot explain why it was the ag- 
gressiveness dimension on the My Job 
concept that did not relate significantly to 
the Myself concept, since the hastiness di- 
mension was composed of only two scales 
and yet it did appear. Thus, these results 
cannot be explained in terms of the number 
of scales composing the dimensions. It is 
therefore felt that proposing stimulus gen- 
eralization as an explanation for the greater 
degree of similarity attributed to the Myself 
and Employer concepts’ dimensional structures 
is in keeping with the data. 

It appears that Super’s (1953) theory 
needs revision in light of these findings. The 
theory must be revised either to account for 
the assimilation of concepts that do not have 
the same connotative structure (i.e., how can 
implementation occur between the Myself and 
My Job concepts when Ss do not use the ag- 
gressiveness dimension to describe their jobs) 
or to specify which of the several dimensions 
of meaning composing the self-concept require 
implementation and to which specific concepts 
before vocational adjustment can be achieved. 

Assuming that only implementation of the 
evaluative dimension is necessary for voca- 
tional adjustment to occur, it follows from 
this research that vocational adjustment or 
job satisfaction is predictable prior to actual 
job placement. Justification for this statement 
is based on the research design utilized 
in this study. The Ss were contacted initially 
prior to graduation. The Myself concept was 
rated the same way at this time as it was 
rated 3 months later when Ss were full-time 
employees in their chosen occupations. The 
way they rated themselves on the evaluative 
dimension of the Myself concept was then 
found to be significantly correlated with the 
way they rated the Employer and My Job 
concepts on the evaluative dimension. By 
nature of the scales composing the evaluative 
dimension vocational satisfaction can be im- 
plied. For example, Ss who rate themselves 
toward the good, active, complete, strong, 
interesting, sincere, free, refreshed, and 
steady side of the scales also rated their 
employers and their jobs the same way. The 
Ss who rated themselves toward the bad, 
passive, incomplete, weak, boring, insincere, 
constrained, weary, and fluttering side of the 
scales also rated their employers and their 
jobs the same way. Assuming that Ss who 
rate their job and employer as good, active, 
complete, etc., are vocationally satisfied, it 
follows that this satisfaction is predictable 
from the ratings on the Myself concept ad- 
ministered prior to graduation. There are a 
number of reasons why this may be true. 
It may be that Ss who have a low self-image 
will also have a low image of their employer 
and their job regardless of the pains taken 
by counselors to aid in job placement. An- 
other possible explanation is that Ss who 
have a low self-image may make more poor 
decisions and as a result of the poor decisions 
develop a low image of their employer and 
their job. The opposite would also hold for 
those who have a good self-image. It may 
be that they make more good decisions and 
are therefore more satisfied or that they 
would be satisfied regardless of any aid given 
in vocational counseling. Which, if either, of 
these possibilities is true can only be settled 
by future research. 

The overall weakness in Super’s (1953) 
theory is his central premise that congruency 
between the self- and job-related concepts 
equals satisfaction or adjustment. Congruency 
between these concepts, on the evaluative 
dimension, was shown in this study, 
but the very nature of the scales makes it 
doubtful that an S who sees himself as 
toward the low end of the evaluative dimen- 
sions (i.e., bad, passive, incomplete, etc.), 
and who sees his employer and job the same 
way can possibly be satisfied. Further, if 
satisfaction does not follow from congruency 
at the bottom of the dimension, why should 
it be true at the top of the dimension? 
In other words, why should it be assumed 
that congruency between the self- and job- 
related concepts is indicative of satisfaction 
when they are judged high on the evaluative 
dimension but not when they are judged low? 

This author wishes to present the theory 
that a considerable amount of job satisfaction 
(or adjustment) is simply a reflection of an 
overall personality disposition toward self- 
satisfaction and life adjustment that Ss 
possess. The results of this and another re- 
search study (Weitz, 1952) are in keeping 
with such a theory. 

The following overall conclusions seem in 
order: 


(a) The self-concept is stable, both in the 
dimensions of meaning that compose the con- 
cept and in the way Ss rate themselves over 
a 3-month period of time. 

(b) With knowledge of the way individuals 
rate themselves on the evaluative dimension 
of the self-concept prior to graduation 
from college it is possible to predict with 
a high degree of accuracy the way they will 
rate their employer and job concepts on the 
evaluative dimension after accepting full- 
time positions in their chosen occupations. 


Opinion-attitude and market survey researchers often include in questionnaires 
a nonexistent item in a list of items on which attitudes and information levels 
are sought. These researchers assume that response to the phony item is evidence 
of invalid responses to other items. Verbal behavior of respondents claiming 
awareness of such a phony item is comparatively analyzed in evaluation of this 
practical technique. Data are interviews with 625 sample survey respondents. 
Respondents asserting awareness of the fictitious item are more likely (a) to 
profess awareness of genuine items and (b) to express favorable attitudes 
toward items. The technique permits a rough but workable estimation of 
response validity and does not greatly bias the sample’s representativeness if 
invalid responses are dropped. 


Opinion-attitude and market survey re- 
searchers are often faced with the problem 
of how much substantive information and 
understanding respondents have concerning 
the persons, objects, and issues about which 
they are asked to express their opinions and 
attitudes. It is well known that many re- 
spondents will freely express directional opin- 
ions and attitudes on topics about which they 
have little or no information or understanding. 
This tendency may reflect social pressure on 
respondents to have an opinion, it may reflect 
an effort to conceal their ignorance, or it may 
simply reflect confusion of the question with 
other topics about which they do in fact have 
some knowledge. This again raises the peren- 
nial question of the validity and reliability of 
certain survey results. 

Researchers sometimes try to estimate the 
level of information possessed by respondents 
on the topics under discussion by asking a 
series of “filter” questions which are, in 
effect, tests of background information about 
the discussion area (Erskine, 1962). This 
technique may be used to estimate or to improve 
the validity of the distribution of responses 
on an issue. If a respondent shows 
knowledge of the topic, then the questioning 
is pursued further; but if, on the contrary, 
the respondent shows little or no knowledge 
of the topic, then the line of questioning is 
abandoned or the validity of his response is 
considered dubious. 

1 The author wishes to thank the Iowa Urban 
Community Research Center for making these survey 
data available. William Erbe, Associate Director 
of the Center, designed and administered the survey. 
However, any errors in the analysis or interpretation 
of these data are solely those of the author. 

For various purposes, researchers often at- 
tempt to elicit opinion and attitudes on a 
single interview schedule about a substantial 
number of objects. The researcher has the 
problem of determining how much, if any, in- 
formation the respondent has about each item. 
Perhaps the best technique would be to have 
the respondent correctly identify each item, or 
otherwise exhibit or express knowledge of the 
discussion areas, before the questioning is 
pursued further. This technique would, in ef- 
fect, substantially increase the number of 
questions asked. If the researcher seeks as- 
sessments of large numbers of items, these 
additional questions about each item are 
costly of valuable interview time as well as 
liable to interviewer coding error that may 
result in part from inarticulate identifications 
by respondents. 

An alternative practical technique, which is 
often used by some researchers to achieve the 
same end, is to include a bogus item, the name 
of a nonexistent object, among the list of 
genuine items as a simple check on the valid- 
ity and reliability of responses. The use of 
meaningless questions as a check on reliability 
and validity is discussed by Payne (1951), 
McCord (1951), and Nunnally and Husek 
(1958). Respondents who volunteer an opin- 
ion on, or express “knowledge” of, the bogus 
item fall under suspicion of also having faked 
or confused information about some indeter- 
minable number of genuine items. Therefore, 
their responses to all items may be discounted, 
qualified, or dropped from the sample for pur- 
poses of analysis. Several reservations about 
the efficacy of the technique may prevent the 
confident use of this validity check in ques- 
tionnaire design. 

First, it is problematic how effective this 
device is for distinguishing respondents who 
are prone to offer opinions on, or to claim 
information about, genuine items when they, 
in fact, have little or no information about 
certain items. A researcher cannot be sure 
that, just because certain respondents reply 
to a fictitious item, they are also substan- 
tially more likely to reply to the genuine 
items even in a similar paucity of information. 
If a respondent confuses a fictitious question 
with something else that sounds similar to, 
or connotes, a real object in his awareness, it 
does not necessarily mean that his other re- 
sponses are invalid or unreliable. The mean- 
inglessness of a question is in the eye of the 
beholder, regardless of the realness or fic- 
titiousness of an issue. Second, it is not known 
to what extent the sorting out of these re- 
spondents by this device biases the manifest 
representativeness of the sample. 

There has been no study, as far as is 
known, that compares the general verbal be- 
havior and the demographic traits of respon- 
dents who confuse or fake information on 
survey questions with respondents who do 
not. It would be useful and comforting to 
survey researchers who use such devices to 
establish whether this technique effectively 
increases the validity of the distribution of 
responses and whether it does, in fact, desig- 
nate respondents who are more apt to fake or 
confuse information. The workability of this 
technique perhaps is more often assumed than 
known for a fact to be adequate. And it would 
be important to know whether faking and con- 
fusing information are traits that are closely 
linked to demographic traits of respondents, 
such as education, or are randomly dis- 
tributed, for the most part, in the general 
population. 


RESEARCH DESIGN 


The source of data is three, stratified-area, two- 
stage, probability samples of households in which 
one adult respondent was randomly selected from 
each household. The surveys were conducted con- 
secutively during April and May 1962, in three small 
Iowa urban communities, which differed widely in 
socioeconomic composition. The communities were 
chosen to be roughly representative of a variety of 
small midwestern communities. Interviews were con- 
ducted and completed with a total of 625 respondents. 
Completion rates were 84, 86, and 91% for the three 
samples drawn. The three samples were put together, 
and they are treated here as though they were a 
sample of a single universe. 

During the interviews, respondents were asked for 
their assessment of nine persons and seven organiza- 
tions that were contemporaneously prominent and, 
in most cases, controversial in national political af- 
fairs. The persons were: Senator Barry Goldwater, 
John F. Kennedy, Dr. Martin Luther King, Senator 
Joseph McCarthy, Richard M. Nixon, Walter Reuther, 
Franklin D. Roosevelt, Dr. Fred Schwarz, and Robert 
Welch. The voluntary associations were: American 
Civil Liberties Union, Americans for Democratic Ac- 
tion (or ADA), Christian Anti-Communist Crusade, 
Congress of Racial Equality (or CORE), John Birch 
Society, National Association for the Advancement 
of Colored People (or NAACP), and United World 
Federalists. The names were read to the respondent 
in this order with the indicated titles of persons and 
acronyms of associations. 

First, the interviewer handed the respondent a 
card on which were printed seven categories of al- 
ternative replies. The categories were: “Strongly ap- 
prove,” “Approve somewhat,” “Both approve and 
disapprove” (i.e., ambivalence), “Don’t feel much 
one way or the other” (i.e., no opinion), “Disap- 
prove somewhat,” “Strongly disapprove,” and “Don’t 
know this person (or group).” The respondents then 
were asked the following question about the persons: 


Now I’m going to read a list of names of persons— 
some well known, some who are alive, some who 
are dead—whose names have been in the news 
lately or in their own time. Would you study the 
list of answers on this card and then tell me 
which of them comes closest to your feelings about 
this person? 


Second, the respondents were asked the following 
question about the voluntary associations: 


Now I’m going to read another list, only this time 
it will contain the names of organizations, rather 
than persons. Some of the organizations are well 
known, others are not so well known, but they 
have been in the news recently or in the past. 
From the same card which you used on the people, 
would you tell me which statement comes closest to 
your feelings about this group? 


FAKING IN SURVEYS 


A bogus item, “The League for Linear Programs,” 
was included in the list of organizations between 
the John Birch Society and the NAACP. Inter- 
viewers were instructed not to identify the names of 
persons and organizations for respondents even if 
asked. If the respondents themselves attempted to 
identify the names and asked for a confirmation, the 
interviewers were instructed to oblige with a simple 
“yes” or “no.” In the case of the bogus organization, 
the interviewer’s reply was, of course, always “no.” 

The respondents’ replies were recorded in the 
six Likert-type scale categories designed to measure 
the direction and intensity of opinion, and a seventh 
category in which respondents could express no 
knowledge of the person or organization. Thus, the 
respondents had a choice, on one hand, of expressing 
a favorable or unfavorable opinion, mildly or strongly 
held, ambivalence, or no opinion, or, on the other 
hand, of saying that they had no knowledge of the 
person or organization. An expression of any direc- 
tion and intensity of opinion, ambivalence, or no 
opinion is accepted as an assertion of awareness of 
a person or organization. Mere awareness of a name, 
of course, does not necessarily denote competent 
understanding of the person’s or organization’s posi- 
tion on public issues. Of the respondents, 66, or 
over 10%, in effect, expressed “knowledge” of the 
bogus organization by declining to indicate no 
knowledge of the item. 

Three items, John F. Kennedy, Richard M. Nixon, 
and Franklin D. Roosevelt, were almost universally 
recognized by both regular and suspect respondents 
and, therefore, were not useful to discriminate be- 
tween levels of information. Of the respondents, 98% 
or more professed awareness of all three names. 

The principal and guiding hypotheses in this study 
are that the 66 respondents who confused or feigned 
information about the bogus item, compared to the 
other 559 respondents (a) are more likely to claim 
knowledge of the other items as well; (b) are more 
likely to express favorable opinions because, in their 
lack of surety or in awareness of their ignorance, 
they will accede, so to speak, and respond favorably; 
and, finally, (c) will differ according to relevant 
demographic traits, such as having less education, 
which is reflected in, perhaps, an attempt to avoid 
an affront to their self-esteem. 


RESULTS 


First, suspect and regular respondents were 
compared according to the proportion aware 
of each name. (For brevity and rhetorical con- 
venience, the two categories of respondents 
will be referred to as “suspect” and “regular” 
respondents.) Table 1 shows that suspect 
respondents more often than regulars indi- 
cated awareness of each name. The differences, 
with two exceptions, are statistically signifi- 
cant by the chi-square test (Siegel, 1956). 
Regular respondents, on the average, indi- 
cated awareness more often of the six persons 
than of the six organizations among the lists 
of names, but among suspect respondents, on 
the average, these differences between the two 
classes of items did not result. The greater 
response of the suspect respondents to names 
of organizations is possibly due, in some cases, 
to their confusing some of the organizations 
with other organizations that have similar 
sounding names or whose names connote or- 
ganizations familiar to the respondents. This 
observation lends support to the assumption 
that suspect respondents are more apt to 
profess awareness of items, either because 
they are more apt to confuse information or 
because they have faked information, or both. 
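
The chi-square comparisons reported in Table 1 can be sketched as a 2 x 2 test of awareness by respondent type. The block below is a minimal illustration, not the author's computation: it assumes Yates' continuity correction (the text does not say whether one was applied) and uses the full base Ns of 559 and 66, whereas the table excludes a few refusals, so the result only approximates the published value. The function name is hypothetical.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] with Yates' correction."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom


# Dr. F. Schwarz row of Table 1: 52 of 559 regulars and 25 of 66 suspects aware.
regular_aware, regular_n = 52, 559
suspect_aware, suspect_n = 25, 66
chi2 = yates_chi_square(
    regular_aware, regular_n - regular_aware,
    suspect_aware, suspect_n - suspect_aware,
)
print(round(chi2, 2))  # close to, but not identical with, the published 41.90
```

With one degree of freedom, a one-tailed test at the .05 level corresponds to a critical chi-square of about 2.71, which is consistent with the asterisks shown in Table 1.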

Suspect respondents, even though they are 
more likely to profess awareness of all items, 
are less apt to have a high school diploma or 
more education. Nearly all pertinent audience 
research in mass communications indicates 
that education is perhaps the most important 
variable antecedent to the possession of public- 
affairs information. Education, therefore, may 
be considered an indirect index of valid in- 
formation levels. The base Ns in Table 2 show 
that half of the suspect respondents have less 
than a high school education contrasted to 
over a third of the regulars; however this dif- 
ference is not statistically significant at the 


TABLE 1 

PROFESSED AWARENESS OF NAMES, BY VALIDITY STATUS OF RESPONDENTS 

                            Regulars              Suspects
                       Percentage    N       Percentage    N
Item                      aware    aware        aware    aware       χ²

Person
  Dr. F. Schwarz             9       52           38       25      41.90*
  R. Welch                  24      134           36       24       4.13*
  Dr. M. L. King            52      288           62       40       1.88
  W. Reuther                71      393           82       54       3.09*
  Sen. B. Goldwater         76      425           82       54        .80
  Sen. J. McCarthy          78      432           91       59       5.34*
Organization
  U. W. Federalists         19      104           62       41      60.15*
  A.C.L.U.                  22      124           54       36      30.67*
  A.D.A.                    39      218           76       50      31.08*
  C.O.R.E.                  50      280           79       52      18.15*
  J. Birch Society          57      318           75       49       7.48*
  N.A.A.C.P.                86      482           97       64       5.12*
  C. A-C. Crusade           46      254           80       52      26.33*
Base Nᵃ                    559                    66

ᵃ A few refusals or “no answers,” if any, on certain names, 
which never exceed two cases, are excluded from the base Ns 
used to calculate percentages and chi-square values. 

* p < .05, one-tailed test. 

.05 level. If suspect respondents were more 
apt to have higher education, their greater 
professed awareness of all items would be at 
least partially explained, but the education 
difference is in the opposite direction. And 
Table 2 shows that the differences in aware- 
ness of items between suspect and regular 
respondents persist when education is held 
constant, although this difference tends to be 
diminished at the high education level. This 
latter observation likely results because the 
higher educated among the suspect respon- 
dents probably do, in fact, have more in- 
formation about the genuine items. It sug- 
gests, furthermore, that low education re- 
spondents are more apt to confuse and 
exaggerate their knowledge. It is possible, of 
course, that certain among the suspect re- 
spondents, who greatly exaggerated their in- 
formation, account for most of the difference 
between the two groups, while other suspect 
respondents replied validly to all items except 
the fictitious item. There are no adequate 
criteria, however, for picking these respon- 
dents out of the sample. 

A comparative analysis was made next of 
regular and suspect respondents, based on the 
direction of their replies in the Likert scale 
categories. It was proposed that the suspect 
respondents who indicated awareness of items 
and expressed a directional opinion will often 
seek harmony and accord by expressing favor- 
able opinions of items. And, by way of aug- 
menting this effect, those suspect respondents 
simply reacting to the positively connoting 
names of organizations will tend to express 
agreeable or favorable opinions. The names of 
several of the organizations have positive con- 
notations in terms of conventional American 
values, especially, perhaps, to respondents who 
may not have actual knowledge of the or- 
ganization. Other and often similar explana- 
tions have been offered to explain the tendency 
of certain respondents to express a positive 
response set. Lane (1962) and Lane and 
Sears (1964) suggest that American culture 
carries a concept of good citizenship that en- 
courages “supportive” rather than “critical” 
attitudes toward certain political objects. 
Couch and Keniston (1960), for example, re- 


TABLE 2 

PROFESSED AWARENESS OF NAMES, BY VALIDITY STATUS OF RESPONDENTS AND EDUCATION 

                       High school graduation or more education    Some high school or less education
                           Regulars            Suspects                Regulars            Suspects
                          %        N          %        N              %        N          %        N
Item                    aware    aware      aware    aware          aware    aware      aware    aware

Person
  Dr. F. Schwarz          10       33         39       13              9       19         38       12
  R. Welch                31      105         33       11             14       29         41       13
  Dr. M. L. King          65      224         73       24             30       63         48       15
  W. Reuther              82      281         91       30             52      111         75       24
  Sen. B. Goldwater       90      308         91       30             54      116         75       24
  Sen. J. McCarthy        88      301         94       31             61      129         87       [?]
Organization
  U. W. Federalists       21       73         64       21             15       31         62       20
  A.C.L.U.                27       92         48       16             15       32         62       20
  A.D.A.                  44      152         73       24             30       65         81       26
  C.O.R.E.                57      196         76       25             39       83         84       27
  J. Birch Society        71      244         79       26             35       74         71       22
  N.A.A.C.P.              92      317         94       31             77      164        100       32
  C. A-C. Crusade         48      165         76       25             41       88         87       27
Base Nᵃ                  344                  33                     213                  32

ᵃ Refusals or “no answers” excluded from base Ns. 


TABLE 3 

FAVORABLENESS TOWARD NAMES, BY VALIDITY STATUS OF RESPONDENTS 

                                  Regulars                              Suspects
                       Percentage           Directional      Percentage           Directional
                       favorable              opinion        favorable              opinion
Item                   opinionᵃ       N      (Base N)        opinionᵃ       N      (Base N)       χ²ᵇ

Person
  Dr. F. Schwarz          72          23        32              87          13        15          [?]
  R. Welch                23          22        94              62           8        13          6.43*
  Dr. M. L. King          58          99       172              66          19        29           .36
  W. Reuther              32          84       259              54          21        39          5.90*
  Sen. B. Goldwater       72         175       244              58          19        33           --
  Sen. J. McCarthy        42         122       291              63          24        38          5.31*
Organization
  U. W. Federalists       56          30        54              76          19        25          2.22
  A.C.L.U.                58          42        73              73          16        22          1.06
  A.D.A.                  60          77       128              77          24        31          [?]
  C.O.R.E.                78         150       193              78          25        32           .03
  J. Birch Society         6          16       254              21           7        34          6.50*
  N.A.A.C.P.              78         253       326              94          49        52          6.71*
  C. A-C. Crusade         72         153       211              67          26        39           --
  “League”                --          --        --              62          21        34           --

ᵃ Percentage with favorable opinions among respondents with any directional opinions. 
ᵇ Each chi-square value calculated on basis of Ns having directional opinions on each name. 

* p < .05, one-tailed test. 


gard the agreeing response set as a reflection 
of a basic personality trait, and they review 
still other hypotheses that attempt to explain 
this phenomenon. 

Table 3 shows that suspect respondents, 
among those who expressed any directional 
opinion, are more likely than regulars to ex- 
press favorable opinions toward 10 of the 13 
items. There are two reversals and one tie. 
The consistency of the directional differences 
toward favorable responses, even though half 
of the differences are not statistically sig- 
nificant, leads us qualifiedly to accept the 
second proposition that suspect respondents 
are more apt to express favorable opinions of 
items. Although the breakdowns are not 
shown here, suspects tend to express strong 
approval rather than just mild approval when 
they express favorable opinions toward items. 
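
The tally of directional differences described above can be checked directly from the percentages in Table 3. The following is only an illustrative sketch; the dictionary layout and variable names are assumptions introduced here, and the bogus “League” item is omitted because it has no value for regulars.

```python
# (regular %, suspect %) with a favorable opinion, among respondents
# who expressed any directional opinion (values taken from Table 3).
favorable = {
    "Dr. F. Schwarz": (72, 87),
    "R. Welch": (23, 62),
    "Dr. M. L. King": (58, 66),
    "W. Reuther": (32, 54),
    "Sen. B. Goldwater": (72, 58),
    "Sen. J. McCarthy": (42, 63),
    "U. W. Federalists": (56, 76),
    "A.C.L.U.": (58, 73),
    "A.D.A.": (60, 77),
    "C.O.R.E.": (78, 78),
    "J. Birch Society": (6, 21),
    "N.A.A.C.P.": (78, 94),
    "C. A-C. Crusade": (72, 67),
}

suspects_higher = sum(1 for r, s in favorable.values() if s > r)
reversals = sum(1 for r, s in favorable.values() if s < r)
ties = sum(1 for r, s in favorable.values() if s == r)
print(suspects_higher, reversals, ties)  # 10 2 1
```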

It is interesting to note, parenthetically, 
that these data do not show a consistent dif- 
ference between suspect and regular respon- 
dents on the proportions who expressed no 
opinion or ambivalence toward the various 
names. Apparently there is little or no ten- 
dency for suspect respondents, in the face of 
small surety or self-awareness of ignorance, 
to retreat into “no opinion” and qualified 
responses. 

If it is concluded that this technique effec- 
tively identifies respondents who are likely 
to confuse and fake information, then the 
question remains whether their removal from 
the sample for purposes of analysis or their 
segregation for special treatment adversely 
affects the manifest representativeness of the 
sample. The 66 suspect respondents were re- 
moved from the original sample (N = 625), 
and it had little effect on the frequency dis- 
tributions of major demographic variables 
that characterize the sample. It is possible, 
however, that a small high-status bias was in- 
troduced because the suspects, it was noted 
above, are perhaps somewhat more likely to 
have low education. The 559 remaining re- 
spondents were compared to the original form 
of the sample on sex, age, sex by age, educa- 
tion, family income, nativity, marital status, 
and community of residence. The frequency 
distributions are closely similar. There is no 
difference greater than 1% between the dis- 
tributions on each of these demographic traits. 


The matter of truth in the packaging and pricing of products in the American 
marketplace has been a subject of public controversy in recent years. By treat- 
ing “truth” or “confusion” as points on an underlying psychological dimension, 
an attempt was made to define the issues in this controversy more objectively. 
3 behaviorally based quantitative measures of confusion in unit-price informa- 
tion for packaged products were developed, and applied in a supermarket 
setting. 33 young married women were instructed to select the most economical 
package for each of 20 products on display at a local supermarket. Each of the 
3 measures employed proved to be highly reliable, based on a retest of 13 of 
the Ss, and reasonably valid, when correlated with an independent measure of 
consumer confusion. Significant differences were found for the set of products on 
all 3 measures of confusion, and there is reason to believe that these differences 
reflect, at least in part, differences in packaging practices. 


The matter of truth in the packaging and 
pricing of products in the American market- 
place has been a subject of public controversy 
in recent years. Much of the current attention 
stems from the introduction in the 87th Con- 
gress of the so-called Truth-in-packaging Bill 
by Senator Philip A. Hart of Michigan. A 
basic issue which has emerged from the Senate 
hearings associated with this proposed legisla- 
tion concerns the alleged existence of a state 
of consumer confusion in the American market- 
place—confusion regarding the true contents 
and prices of many common retail products. 

The current study attempts to objectively 
define the issues in the truth-in-packaging 
controversy by treating consumer confusion 
as a psychological variable capable of measure- 
ment. 

METHOD 


Subjects 


Thirty-three young married women who were stu- 
dents or the wives of students at Eastern Michigan 
University served as subjects (Ss). All Ss had at- 
tended college for at least 1 year and in addition 


1 This research was supported in part by a grant 
from the College of Arts and Science of Eastern 
Michigan University. Volunteers from the Ypsilanti, 
Michigan Chapter of the American Association of 
University Women, together with two students, 
Alice Gretzler and Margaret Keck, assisted in the 
data-collection phase of the research. 

A briefer version of this paper was read at the 
American Psychological Association, Chicago, Sep- 
tember 1965. 


had been married for 1 or more years. The Ss were 
tested in a local supermarket with which they were 
familiar through previous shopping; indeed most Ss 
were regular customers of the store. Recruitment of 
Ss took the form of personal requests of Ss through 
visits to their homes (mostly apartments in the 
married student complex of Eastern Michigan Uni- 
versity). The Ss were paid for their time. 


Procedure 


The Ss were instructed to select the most eco- 
nomical (largest quantity for the price) package for 
each of 20 products on sale at the selected super- 
market. A time limit was enforced for each product 
decision, a limit based on the variety of packages 
on display for the product. More specifically, 10 
seconds were allowed for each of the package types 
in the product category, unless either (a) there were 
less than six package types to a product class, in 
which case a 1-minute time limit was used, or (b) 
there were more than 24 package types to a product 
class, in which case a 4-minute time limit was em- 
ployed. 
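
The time-limit rule just described amounts to 10 seconds per package type, bounded below at 1 minute and above at 4 minutes. A minimal sketch of that rule follows; the function name is hypothetical and is introduced here only for illustration.

```python
def time_limit_seconds(package_types: int) -> int:
    """Time allowed for one product decision, in seconds."""
    if package_types < 6:
        return 60    # fewer than six package types: 1-minute limit
    if package_types > 24:
        return 240   # more than 24 package types: 4-minute limit
    return 10 * package_types  # otherwise 10 seconds per package type


for n in (3, 6, 15, 24, 30):
    print(n, time_limit_seconds(n))
```

Note that the two boundary cases join smoothly: six package types give 60 seconds and 24 package types give 240 seconds under the per-package rule.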

In addition to stating which package she be- 
lieved to be the most economical for each of the 20 
products, each S reported to the experimenter (E) 
accompanying her the information which she used in 
making her decision. 

Each of the 20 products employed in the study 
had the following characteristics: 


1. Two or more different-sized packages of the 
product were on sale at the supermarket. 

2. Two or more different brands of the product 
were on sale at the supermarket. 

3. The two or more brands for each product ap- 
peared to be comparable with regard to the nature 
of their contents. Thus dry cereals were not selected 
as a product since corn flakes and raisin bran do 
not appear to be comparable; on the other hand 
the different brands and varieties of family flour 
were considered comparable. 

4. The products appeared to be widely used by 
American families. 

5. The products were significant contributors to 
total supermarket sales. Thus table salt, which 
qualifies on the basis of the first four criteria, was 
not used in the study since it represents only about 
1% of total supermarket sales. 


Finally, a characteristic not of any one product, 
but of the whole group of 20, was that the set of 
products appeared to be a balanced representation of 
the packaged products available at American super- 
markets. 

The testing of the 33 Ss took place over a 2-day 
period. To aid in the testing a map of the super- 
market was constructed with a specified route which 
touched upon the location of each of the 20 products. 
The Ss were then tested in groups of 5-10. Each 
member of a group was randomly paired with an E 
and the two were randomly assigned to one of the 
20 product locations as their starting point in the 
test sequence. After S had responded to E’s ques- 
tions at the first location, the two proceeded along 
the route to the next product location, and continued 
in this manner until S had been tested at all 20 
product locations. This experimental design not only 
permitted the testing of many Ss simultaneously but 
also allowed the effects of a variety of product 
sequences (20 in all) to be reflected in the results 
of the study. 
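
The assignment of starting points can be sketched as follows. This is an illustration under stated assumptions: the route is treated as a closed loop, so that an S who starts partway along it wraps around to the earlier locations (the text implies but does not state this), and the function names and seed parameter are hypothetical.

```python
import random


def route_sequence(start, n_locations=20):
    """Order in which locations are visited when starting at `start`."""
    return [(start + step) % n_locations for step in range(n_locations)]


def assign_starts(subject_ids, n_locations=20, seed=None):
    """Give each subject a random starting location and the resulting sequence."""
    rng = random.Random(seed)
    return {
        subject: route_sequence(rng.randrange(n_locations), n_locations)
        for subject in subject_ids
    }


print(route_sequence(18)[:4])  # [18, 19, 0, 1]
plan = assign_starts(["S1", "S2", "S3"], seed=1)
```

Every sequence visits all 20 locations exactly once, and there are 20 possible sequences, one per starting point.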

Of the 33 Ss, 13 were retested 2 days after their 
original testing, thus permitting the determination of 
test-retest reliability coefficients for the experimental 
measures employed in the study. Concurrent validity 
coefficients for the measures were ascertained from 
correlations with Ss’ pretest ratings of the 20 
products. The pretest consisted of a brief story 
about a housewife who is undecided about which 
of two packages of a particular product to purchase. 
The two packages are equally appealing to her on a 
number of grounds, such as appearance and quality 
of contents. She finally decides to purchase the 
package which gives her more of the product for the 
price. It was pointed out to Ss that the housewife’s 
task of determining which of the two packages is 
more economical would vary in difficulty for the 20 
products. The Ss were instructed to rank the 20 
products, using an alternation ranking procedure, 
with respect to the estimated difficulty the house- 
wife would have in determining the more economical 
of two packages containing the product. 

At the time the pretest was administered Ss were 
asked to indicate which of the 20 products were not 
usually found in their household. 


Measures 


Three behaviorally based quantitative measures of 
confusion in unit-price information are used in the 
analysis of the data. The first, Confusion Measure 1, 
simply indicates the number of Ss who made in- 
correct choices for each of the 20 products. Con- 
fusion Measure 2 calculates for each product the 
mean percentage increase in unit price for Ss’ se- 
lected packages compared with the most economical 
package. Confusion Measure 3, which employs data 
from a supermarket trade-magazine study dealing 
with the total sales for each of the 20 products, 
provides an estimate of the increase in price which 
an economy-minded household unit with a specified 
budget would pay over a constant time-period, say 
a year, if its purchases reflected the values found for 
Confusion Measure 2. Thus Confusion Measure 3 is 
a weighted version of Confusion Measure 2. 

The rationale behind Confusion Measure 1 is rea- 
sonably clear. It is desirable to know whether Ss are 
able to select the most economical package for each 
of the 20 common products; indeed, the number of 
Ss who fail in this task should be an indication of 
the degree of confusion associated with consumer at- 
tempts to purchase supermarket products on the 
basis of economy. The second measure of confusion 
simply reflects the magnitude of the selection errors 
expressed for each product as a percentage of the 
unit price of the most economical package. It is as- 
sumed that the larger the value found for a par- 
ticular product, the greater the error which an 
economy-minded consumer would be expected to 
make when purchasing the product. 
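
Confusion Measures 1 and 2 can be illustrated for a single product as follows. The data are hypothetical, the function name is introduced here, and one point is an assumption: the mean for Measure 2 is taken over all Ss, with correct choices contributing zero, which the text does not spell out.

```python
def confusion_measures(packages, choices):
    """Return (Confusion Measure 1, Confusion Measure 2) for one product.

    packages: dict mapping package name -> (price, quantity)
    choices:  list of package names, one per subject
    """
    unit_price = {name: price / qty for name, (price, qty) in packages.items()}
    best = min(unit_price.values())
    # Measure 1: number of subjects whose pick is not the cheapest per unit.
    errors = sum(1 for pick in choices if unit_price[pick] > best)
    # Measure 2: mean percentage increase in unit price over the cheapest package.
    increases = [100 * (unit_price[pick] - best) / best for pick in choices]
    return errors, sum(increases) / len(increases)


# Hypothetical product: prices in cents, quantities in ounces.
packages = {"small": (30, 10), "medium": (50, 20), "large": (90, 30)}
choices = ["small", "medium", "large", "medium"]
errors, mean_increase = confusion_measures(packages, choices)
print(errors, mean_increase)  # 2 10.0
```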

The last measure to be considered, Confusion 
Measure 3, represents an interaction of the estimated 
consumer expenditures for a supermarket product and 
the percentage error given by Confusion Measure 2. 
This third measure of confusion provides for each 
product a dollar-and-cents estimate of the additional 
expenses which an economy-minded shopper would 
bear due to errors in package selection. To give sub- 
stance to this measure it is necessary that the actual 
records of consumer expenditures or estimates of 
such expenditures be available for processing. A 
search by the writer for a detailed product-by- 
product breakdown of the supermarket expenditures 
for some statistically average, or otherwise well- 
specified household unit, proved to be unsuccessful. 
Indirectly relevant data were found, however, in the 
Progressive Grocer Colonial Study (1963), which 
reports the percentage contribution to total sales 
made by each of several hundred products for six 
supermarkets in the southeastern United States. The 
six markets were members of a larger chain called 
the Colonial Stores. It seems clear that the results 
of the Colonial Study do not reflect American super- 
markets or consumers as a whole. The study was 
conducted over an 8-week winter period within a 
single chain of supermarkets in one region of the 
country. Thus there are seasonal, regional, and 
probably socioeconomic reasons for suspecting dif- 
ferences. However, in the absence of any suitable 
national data dealing with either supermarket sales 
or consumer expenditures on an individual product 
basis, the results of the Colonial Study are employed 
in the analysis of Ss’ responses of the present study. 
The reader is cautioned that the Colonial Study re- 
sults serve only as an estimate, with strongly sus- 
pected biases, of the corresponding national data. 


CONSUMER CONFUSION AND SUPERMARKET PRODUCTS 


The percentage contributions to the total super- 
market sales for the 20 products of the present study 
are perhaps made more meaningful when applied to 
a consumer’s annual budget of say, $1,000 for super- 
market expenditures. Thus the Colonial Study figure 
of 1.1% for powdered detergents assumes a value of 
$11.00 for this hypothetical budget. For this budget, 
Confusion Measure 3 is simply the portion of the 
individual product expenditure which can be as- 
signed to error in package selection. The actual 
amount assigned would depend upon the value of 
Confusion Measure 2. Thus a household unit with 
economical shopping habits and a $1,000 annual 
supermarket budget might spend $11.00 for powdered 
detergents. Given a value of 24% for powdered de- 
tergents on Confusion Measure 2, we would find 
24/124 of the $11.00, or $2.13, as the amount over and 
above the minimal amount of $8.87 which our 
economy-minded household unit would pay if it 
always succeeded in purchasing the most economical 
package of powdered detergent. For this case then 
the value for powdered detergent on Confusion 
Measure 3 would be $2.13. 

Values for other products were found in a manner 
similar to that described above. One first constructs 
a ratio consisting of the value of Confusion Measure 
2 over an expression made up of the Confusion 
Measure 2 value plus 100. The next step is to mul- 
tiply this ratio by the estimated consumer expendi- 
ture for the product. To find the value for a total 
supermarket expenditure different from the base of 
$1,000 employed here, simply construct a ratio of the 
new total expenditure over the base of $1,000 and 
multiply the Confusion Measure 3 value by this 
fraction. 
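
The computation of Confusion Measure 3 described above can be written compactly. This is a sketch only; the function and parameter names are hypothetical, and `expenditure` is taken to be the product's estimated expenditure under the $1,000 base budget.

```python
def confusion_measure_3(cm2_percent, expenditure, budget=1000.0, base_budget=1000.0):
    """Dollar amount of a product's expenditure attributable to selection error.

    cm2_percent: Confusion Measure 2 value for the product (percentage error)
    expenditure: estimated expenditure on the product under the base budget
    budget:      total supermarket budget of interest
    """
    # Ratio of the Measure 2 value to (Measure 2 value + 100), times expenditure.
    overpayment = expenditure * cm2_percent / (cm2_percent + 100)
    # Rescale when the total budget differs from the $1,000 base.
    return overpayment * budget / base_budget


# Powdered detergent: Measure 2 of 24%, expenditure of $11.00.
print(round(confusion_measure_3(24, 11.00), 2))  # 2.13
```

The divisor is the Measure 2 value plus 100 because the $11.00 already includes the overpayment: the errorless amount x satisfies 1.24x = 11.00, so the excess is 11.00 - x = (24/124)(11.00).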


RESULTS 


Of the total of 660 decisions made by the 
33 Ss, 47 represented products which Ss stated 
were not usually found in their homes. Since 
the proportion of errors for these 47 decisions 
did not differ significantly from the cor- 
responding proportion for decisions involving 
more familiar products, the two classes of 
selections (familiar and unfamiliar) were 
pooled for the purposes of analysis. 

The three measures of confusion in unit- 
price information were applied to the 20 pro- 
ducts employed in the study and were found 


TABLE 1 

CONFUSION VALUES AND ESTIMATED CONSUMER EXPENDITURES FOR 20 SUPERMARKET PRODUCTS 

                          Confusion         Confusion            Confusion      Estimatedᵇ
                          Measure 1ᵃ        Measure 2            Measure 3      consumer
Product                   (total errors)    (percentage error)   (dollars)      expenditure (dollars)

Canned peaches                 8                  2                 .06             3.10
Canned peas                    5                  5                 .20             4.10
Catsup                        23                 13                 .28             2.40
Evaporated milk                2                  0                0.0              6.60
Family flour                   6                  2                 .13             6.70
Frozen orange juice            6                  6                 .36             6.40
Granulated sugar               0                  0                0.0             10.70
Instant coffee                11                 10                 .92            10.10
Liquid bleach                 32                 32                 .70             2.90
Liquid detergent               8                  4                 .24             6.20
Liquid shampoo                14                 63                1.01             2.70
Mayonnaise                     8                 16                 .46             3.30
Paper towels                  30                 12                 .48             4.50
Peanut butter                  7                  2                 .06             3.20
Potato chips                  22                  1                 .05             5.30
Powdered detergent            33                 24                2.13            11.00
Soft drinks (cola)            27                 17                2.01            13.80
Solid shortening               0                  0                0.0              5.50
Toilet tissue                 22                  5                 .37             7.70
Toothpaste                    22                 16                 .69             5.00
Sum                                                               10.15           121.20
Mean                          14.3               [?]                .507            6.06

ᵃ N = 33. 
ᵇ Based on a total annual supermarket expenditure of $1,000. 

to have substantial validity when correlated 
with the experimental pretest (Spearman rank 
correlations of .59, .62, and .70 for Confusion 
Measures 1, 2, and 3, respectively). Likewise, 
substantial test-retest reliability coefficients 
were found (Spearman rank correlations of 
.91, .93, and .91 for the three numbered 
measures). As a check on the internal con- 
sistency of the results the 33 Ss were first 
divided into groups of 16 and 17, and separate 
mean confusion scores were computed for each 
group on each of the 20 products. Next Spear- 
man rank correlations were computed between 
the mean confusion scores of the two groups, 
yielding values of .93, .93, and .96 for the 
three numbered measures. 
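
The split-half check described above can be sketched in Python. This is only an illustration, not the original computation: the per-product mean scores are made-up numbers for six products, and the helper names `average_ranks` and `spearman_rho` are hypothetical. Tied values receive average ranks, and rho is taken as the Pearson correlation of the two rank vectors.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical mean confusion scores per product for the two subgroups
group_a = [0.25, 0.15, 0.70, 0.05, 0.20, 0.95]
group_b = [0.30, 0.10, 0.65, 0.00, 0.35, 1.00]
rho = spearman_rho(group_a, group_b)
print(f"split-half rho = {rho:.2f}")
```

In the study the same calculation would run over 20 products, once for each of the three confusion measures.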

The complete list of 20 products and their 
associated values on the three confusion meas- 
ures are presented in Table 1. Also presented 
in Table 1 are the estimated consumer ex- 
penditures for the 20 products for a hypo- 
thetical consumer budget of $1,000. 

Two nonparametric techniques, the Cochran Q
Test and the Friedman χr² Test, were
employed in the analysis of the confusion data
(Siegel, 1956). Significant differences were
found for the set of 20 products on Confusion
Measure 1 (Cochran Q = 283, p < .001).
The mean value of 14.3 yields an error rate
of 43% for the 33 Ss. Significant differences
were also found for the set of ranked product
values on Confusion Measure 2 (Friedman
χr² = 214, p < .001) and Confusion Measure
3 (Friedman χr² = 242, p < .001). It is of
interest to note that of the total estimated an- 
nual consumer expenditure of $121.20 for the 
20 products the sum of $10.15 can be ac- 
counted for by errors in consumer selections. 
Thus, if it were possible for an economy- 
minded consumer with a $1,000 supermarket 
budget to always select the package giving 
her the largest quantity of a supermarket 
product for the money, she would pay an 
estimated $121.20 minus $10.15 or $111.05. 
It is estimated then that a more typical 
economy-minded shopper, who makes her se- 
lections in conformance with Ss’ selections in 
the current study, would spend $10.15 more 
than the errorless figure of $111.05, or in other 
words, she would spend 9.14% more than the 
hypothetical consumer who was always able 
to select the most economical package. 
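
The arithmetic behind these figures can be retraced in a few lines. The sketch below uses only numbers quoted in the text and in Table 1; the variable names are illustrative.

```python
# Figures quoted in the text (Table 1 sums, mean errors, and sample size)
total_expenditure = 121.20   # estimated annual spend on the 20 products, dollars
error_cost = 10.15           # sum of Confusion Measure 3, dollars
mean_errors = 14.3           # mean of Confusion Measure 1
n_subjects = 33

errorless_spend = total_expenditure - error_cost
overspend_pct = 100 * error_cost / errorless_spend
error_rate_pct = 100 * mean_errors / n_subjects

print(f"errorless spend: ${errorless_spend:.2f}")
print(f"overspend relative to errorless spend: {overspend_pct:.2f}%")
print(f"mean error rate: {error_rate_pct:.0f}%")
```

Note that the 9.14% figure is expressed relative to the errorless spend of $111.05, not relative to the $121.20 total.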
DISCUSSION


It is important to note that the plan and 
procedures of this study deal not at all with 
the day-to-day purchases of American con- 
sumers. It may be that economy plays a 
small role in many of these purchases. How- 
ever, for the purposes of this study the ques- 
tion of what actual criteria are employed by 
consumers at large in their supermarket 
shopping is largely an irrelevant one. The 
question of concern is the following: 

Is it possible for consumers to select, within 
a reasonable period of time and without the 
aid of paper and pencil or of computing de- 
vices, that package of a particular super- 
market product which offers the largest quan- 
tity of the product for the money? 

If large numbers of consumers cannot make 
correct selections when so instructed, and 
particularly if their errors are large, it would 
seem that the task is a confusing one. If in 
addition their errors are costly, there is real 
reason for concern.

Of course it does not necessarily follow that 
a confusing task results from improper pack- 
aging practices. However, for the present study 
significant differences were found for the set 
of 20 products on all three measures of con- 
fusion, and it seems unlikely that these dif- 
ferences can be attributed in any large part 
to factors other than the differences in package 
characteristics. For example, the possible in- 
fluences of warm-up or fatigue effects were 
offset by the experimental variations in order 
of product presentation. Also level of illumina- 
tion appeared to be fairly constant for the 20 
product locations. In addition, in several in- 
stances packaging practices were identified 
which required calculations which could not be 
performed without great difficulty, if at all, in 
one’s head. For example, the quantity of paper 
towels was presented in terms of number of 
sheets but the size of a sheet was not standard 
across brands. Indeed in the current study
sheets ranged from 7.5 inches × 11 inches
at the smallest to 11 inches × 11 inches at the largest.
Furthermore, the number of sheets in a roll 
varied from 75 to 200. And finally, some rolls 
were packaged singly and others two to a 
package. 
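
To see why this comparison is hard to do mentally, consider a sketch of the calculation a shopper would need. Only the sheet sizes, the 75 to 200 range of sheets per roll, and the one-or-two rolls per package come from the text; the brand names and prices are hypothetical, and `TowelPackage` is an illustrative helper.

```python
from dataclasses import dataclass

@dataclass
class TowelPackage:
    name: str
    price_cents: int        # hypothetical shelf price
    rolls: int              # rolls per package
    sheets_per_roll: int
    sheet_width_in: float
    sheet_length_in: float

    def total_sheets(self) -> int:
        return self.rolls * self.sheets_per_roll

    def area_sq_in(self) -> float:
        return self.total_sheets() * self.sheet_width_in * self.sheet_length_in

    def cents_per_sheet(self) -> float:
        return self.price_cents / self.total_sheets()

    def cents_per_1000_sq_in(self) -> float:
        return 1000 * self.price_cents / self.area_sq_in()

packages = [
    TowelPackage("Brand A", 29, 1, 200, 7.5, 11.0),
    TowelPackage("Brand B", 39, 2, 75, 11.0, 11.0),
    TowelPackage("Brand C", 25, 1, 150, 11.0, 11.0),
]

cheapest_per_sheet = min(packages, key=TowelPackage.cents_per_sheet)
cheapest_per_area = min(packages, key=TowelPackage.cents_per_1000_sq_in)
print(cheapest_per_sheet.name, cheapest_per_area.name)
```

With these made-up prices the package that is cheapest per sheet is not the one that is cheapest per unit of paper, which is the kind of trap the non-standard sheet size creates.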

Liquid bleach was a second product characterized
by difficulties which were apparently
influenced by packaging practices. In particular
one brand of this product which formerly
was made up with the commonly employed
5.25% concentration of sodium hypochlorite,
the active ingredient in bleach, had
been reduced in concentration to 3.25% a
short time before the data were collected; the
selling price of the product however had not
been reduced nor for that matter had there
been a change in the label or package in any
manner other than the listing of the new concentration
on the back of the bottle. Since
most of the Ss were regular shoppers at the
supermarket employed in the study it appears
that rather than examining the bottle closely,
they assumed the concentration had not been
changed. With the original 5.25% concentration
of sodium hypochlorite this bottle would
have been the most economical selection; however,
with the change in concentration this was
no longer the case.
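
The effect of the reformulation can be illustrated by pricing the bleach per unit of active ingredient. In the sketch below only the 5.25% and 3.25% concentrations come from the text; the prices, bottle size, and competitor are hypothetical, and `cost_per_unit_active` is an illustrative helper.

```python
def cost_per_unit_active(price_cents, volume_oz, concentration_pct):
    """Cents per fluid ounce of active ingredient (sodium hypochlorite)."""
    active_oz = volume_oz * concentration_pct / 100
    return price_cents / active_oz

# Hypothetical 64-oz bottles; only the two concentrations are from the text.
before = cost_per_unit_active(35, 64, 5.25)      # reformulated brand, old strength
after = cost_per_unit_active(35, 64, 3.25)       # same bottle and price, new strength
competitor = cost_per_unit_active(39, 64, 5.25)  # a slightly dearer full-strength brand

print(f"before: {before:.1f}, after: {after:.1f}, competitor: {competitor:.1f}")
print(f"increase in cost of active ingredient: {100 * (after / before - 1):.0f}%")
```

Whatever the actual prices were, holding the price constant while cutting the concentration from 5.25% to 3.25% raises the cost of the active ingredient by a factor of 5.25/3.25, or roughly 62%.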

Since the results of this study were in- 
fluenced in no small way by the time allotted 
for each package selection it might be well to 
explain the basis for selecting the unit of 10 
seconds. Since S’s task was considerably more 
demanding than day-to-day supermarket shop- 
ping it was felt that time should be provided 
above and beyond the normal shopping time. 
A recent study (Fitzimmons & Manning, 
1962) found that it takes a shopper approxi- 
mately 1 minute to select an item in a super- 
market. This 1-minute period includes walk- 
ing time in the store but not time at the 
check-out counter. The Ss of the current study
were given an average of 2.35 minutes at each 
product location; with the addition of walking 
time they had available about three times as 
much time as they would be expected to take 
for their supermarket shopping. 

It is of interest to inquire what generaliza- 
tions might be made from the present findings 
to other Ss and other settings. First with re- 
gard to Ss of the study it seems clear that they 
represent a combination of qualities that should 
make for extremely low confusion scores. As
a group they are characterized by considerable 
education and by the financial strains usually 
associated with young married college couples. 
These two ingredients suggest not only strong 
interest in economy as a criterion for supermarket
shopping, but also considerable success
in meeting this criterion. It would seem
then that other individuals in general, and 
less-educated individuals in particular, would 
not perform as well as Ss of the present study. 

Although superficially it might appear that 
the results of this study would transfer readily 
to other supermarket settings there are differ- 
ences between markets which should be con- 
sidered. Although many markets carry the 
major brands for common products they differ 
in the extensiveness of sizes available and in 
the particular store brands which they carry. 
Also differences in shelf space and position 
for a product might well influence the ease 
with which a shopper could make comparisons. 
Problems arise too when one attempts to gen- 
eralize the results for the 20 products em- 
ployed here to other supermarket products. 
That sweeping generalizations are clearly in- 
appropriate is indicated by a Colonial Study 
finding that about 30% of the consumer’s 
supermarket expenditures are allocated to 
meat and produce, two foods which are not 
packaged beyond a simple brown bag or cel- 
lophane wrapper. With regard to more typi- 
cally packaged supermarket products, again 
problems arise. The selection criteria for the 
20 products employed in the present study 
have provided us with a clearly nonrandom 
sample of packaged products. A particularly 
strong bias was introduced by selecting prod- 
ucts with a record of relatively high total 
sales, which would imply relatively high 
familiarity among shoppers. 

Two natural next steps of research are sug- 
gested by the data. The first would be a sys- 
tematic study of what factors make for con- 
fusion in consumer selection. For example, 
packaging practices which are frequently 
claimed by consumer spokesmen to be con- 
fusing include the use of the following: 

1. Poorly presented information concerning 
both the nature of the contents and the quan- 
tity of contents. These deficiencies in display 
may take many forms; for example, too small 
print, weak contrast between the printed in- 
formation and the background, and failure to 
present the information at a prominent loca- 
tion on the package. 

2. Misleading information concerning both
the nature of the contents and the quantity
of contents. “Giant quart” is a typical ex- 
ample of the latter practice. 

3. Misleading information concerning price. 
Often cited here are the so-called cents-off
specials. Representatives of consumer groups 
claim that these “specials” are often not 
acknowledged by supermarket managers in 
their pricing policies. 

4. Unnatural numbers to indicate quantity. 
An example here would be the use of fractional 
or mixed numbers instead of whole numbers. 

With a larger number of products than the 
20 of the current study, regression analyses 
could be performed to determine which, if any, 
packaging characteristics serve as good pre- 
dictors of the confusion measures. 

A second potentially valuable research step 
would involve the development and evalua- 
tion of training guidelines for consumers. 
With a larger and more heterogeneous group 
of Ss than the one employed here one could 
attempt to identify the distinguishing package 
information employed by high-scoring (as 
compared to low-scoring) Ss in their perform- 
ance of the experimental task. A natural application
of the results of such a study would
be to consumers at lower socioeconomic levels.


CONCLUSIONS 


Within the confines of the particular ex- 
perimental setting employed, the following 
conclusions appear to be in order. 

1. The three measures of confusion in unit- 
price information have substantial validity 
and reliability. 

2. The 20 products differ significantly on 
all three measures of confusion. 

3. There is reason to believe that these 
differences reflect, at least in part, differences 
in packaging practices.


MEASURING FATIGUE ¹


J. E. HUETING AND H. R. SARPHATI


Physiological Laboratory, University of Amsterdam 


8 Ss between 19 and 23 yr. old performed an exercise during 11 min. on a 
bicycle ergometer on 13 days in succession. Not being aware of the systematic 
daily variations in the slope of the work load, all Ss showed significant cor- 
relations between subjective feelings of general physical fatigue—as expressed on 
different kinds of rating scales—and slope of work load. Regression equations 
satisfactorily describe linear relationships between load and fatigue. Factor
analysis suggests a factor “increasing fatigue” and a factor “decreasing fitness.”


1 The processing and computation of the data was
partly supported by a grant from the ‘Amsterdamse
Universiteits-Vereniging.’


Many authors have designed experiments
in an attempt to measure physical fatigue, a
concept defined as the outcome of physiological
and psychological processes, symbolically
expressed by the subjects (Ss) during and
after physical work. Feelings of general physical
fatigue, however, have appeared unmanageable
to many authors.

Muscio (1921) stated that it is impossible
to define the concept as psychologically conceived
and recommended that it be eliminated
entirely from the scientific discussion. Ryan
(1953) considered it of great importance to
arrive at the measurement of “effort,” for
example, to be able to compare one work situation
with another. But he doubted the possibility
of arriving at a finer discrimination
than the one of lying down and running at 8
miles per hour, which is quite uninteresting.

These statements can be carried back to
defects in the design of the experiments in
question, defects which are still difficult to
overcome.

1. The inherent shortcomings of the Mosso-type
ergograph, the output of which cannot be
kept constant, and as a result of which experimental
outputs of different Ss or of the
same S on different occasions cannot be compared.
Fatigue of a local character only is
induced, while such irrelevant factors as
motivation, training, practice, warming-up,
skin irritations, etc., may interfere.

2. In order to relate fatigue to physiological
variables, fatigue tests have been developed
to ascertain different levels in oxygen debt,
heart frequency, flicker fusion frequency, etc.
None of these tests, however, appeared to give
satisfactory results.

3. Identification of fatigue with physiologi- 
cal parameters as respiratory or metabolic 
acidosis, diminishing neuronal excitability, 
etc., leads to a diversity of results, giving rise 
to quite diverging interpretations. 

Woodworth and Schlosberg (1954), Scher- 
rer and Monod (1960), and Schmidtke (1965), 
surveying the field nearly half a century after 
Muscio’s negative recommendation, have con- 
cluded, respectively, that the research in this 
respect has been “quite unsatisfactory,” “fragmentaire
et descriptive,” and “wenig (mit)
anzufangen.”

In recent times, some experimenters (Borg,
1962; Borg & Dahlström, 1962; Dirken,
1966; Schmidtke, 1965) took new approaches. Taking
the above factors into account, they used
modern ergometers, and gave rating scale 
scores an adequate statistical treatment. How- 
ever, these authors did not realize the prin- 
cipal distinction to be made between percep- 
tion and estimate of variations in load level, 
and intensities in fatigue as a consequence of 
these differences. Thus, in so far as these au- 
thors intended to measure fatigue, Titchener’s 
stimulus error was evidently committed by 
presenting the stimuli, that is, the load levels, 
in such a way, that Ss were enabled to get in- 
formation from sources other than the changes 
in the organism itself. In other words, they 
based their judgments on changes in the ex- 
ternal environment, instead of on changes in 
their internal environment as the source of in- 
formation on the intensity of the fatigue. 

Therefore, the experimental set-up was
such that Ss were unaware of these changes
in the external milieu, that is, the variation
in height of work load or whatever other external
cue. Hueting and Visser (1960) and
Hueting (1964), to avoid the stimulus error, 
obtained direct estimates of the intensity of 
the fatigue, that were supported afterwards 
by Ss, who had no idea about the changes in 
work-load levels. The authors found significant 
correlations between fatigue, physiological 
variables, and load level. 

The present experiment forms an extension 
and a validation of that approach. 


METHOD 


After habituation to the experimental situa- 
tion, the apparatus, and the use of the rating 
scales, a general physical fatigue was induced 
in eight normal, nontrained Ss from 19 to 23 
years old, by performing an exercise on a bi- 
cycle ergometer during 11 minutes on 13 days 
in succession. The starting work load of 3 
watts was increased every minute by 10% 
of the final minute’s load, which was varied 
systematically from day to day between 70- 
130 watts. In previous experiments this pro- 
gram appeared to be constant in the eyes of 
Ss. In this way one could expect direct mag- 
nitude estimates of the feeling of fatigue. The 
Ss accepted the experiments as part of an in- 
vestigation in training effects, since a number 
of physiological variables were recorded. 

The Ss indicated their feeling of fatigue dur- 
ing the work period and the 5-minute re- 
covery period by pressing a key attached to 
the handle bar of the ergometer, thus moving 
the pointer of a voltmeter with blank scale to 
any position between the horizontal and the 
vertical: respectively, “fatigue pointer e(xercise),”
“r(ecovery),” with indexes for minutes.

Still sitting on the ergometer, Ss gave an 
overall impression of their fatigue by moving 
the pointer at one time toward the angle of 
the matched magnitude of fatigue: “fatigue 
pointer t(otal).” 

After that, Ss matched their fatigue with 
the volume of a white noise by raising their 
hands while the experimenter increased the 
volume of a white noise generator from 0 
decibels on: “fatigue noise.” (The Ss had 
been tested on normal sense of hearing.) 

Finally, Ss put a cross on a 7-point rating
scale: “fatigue line.”

The extremes of all rating scales were said
to represent “feeling fit, rested,” “feeling ex- 
tremely tired, exhausted,” respectively. 


RESULTS 


Seven of the eight Ss were highly surprised 
to hear that the work-load program had been 
varied. One S told the authors that he had 
had some doubts about the constancy of the 
load levels, but apparently had not the faintest 
idea of the system of variation. It may be 
called astonishing that nevertheless Ss gave 
adequate judgments of the intensity of their 
fatigue, conceived as correlations between 
work-load level and rating scale score. 

Table 1 shows the highly significant
pooled correlation coefficients between “fatigue
pointer e11,” “fatigue total,” “fatigue noise,”
and “fatigue line” with work-load level. Fatigue
pointer t is clearly contaminated by fatigue
pointer e11. All individual correlations were
significant as well, ranging from .54 (p = .01,
one-tail) to .93 (p = .001), with one exception:
.26 on fatigue line.

Variations in intensities of fatigue from day 
to day were experienced as such, of course, 
and expressed by the scores on the different 
rating scales. These variations, however, were 
ascribed to differences in daily working, sleep- 
ing, or smoking habits, etc. (of which non- 
systematic influences were taken for granted), 
and to differences in the physical conditions of 
the experimental room, mostly of the tempera- 
ture (which was actually kept constant). 

From this point of view one might see this 
experiment as an example of so-called “sub- 
liminal perception,” in the sense of Ss being 
unaware of any difference in the presented 
stimuli, but nevertheless being able to react 
in a measurable discriminating way to these 
different stimulus intensities. 

On being asked what Ss considered the 
most reliable way to express their fatigue, 
they preferred the pointer. None of them be- 
lieved the white noise volume to be of any 
value in this respect. This pronouncement 
was another indication that we had to do with 
direct magnitude estimates of fatigue, avoid- 
ing judgments based on irrelevant cues. 

The fatigue pointer gives us an opportunity
to follow the growth of fatigue during


TABLE 1
INTERCORRELATION MATRIX OF POOLED SPEARMAN’S RANK-DIFFERENCE CORRELATION COEFFICIENTS
| 
eee e ey eu ty th re r3 ra Ts t | Noise | Line 
_ Pointer 
e5 23% 
e7 23% oe 
9 A2a .42* .66* 
eu 558 oF soe || gen 
Th (.55) E37) E53) ESM) aie OO) 
Yr RL) 14 Pil 18 ES OS T(eS0) 
re 408 | —.01 a2 502 592 | (.59) | —.12 
13 ALS aS 20 ‘s/s A2* | (.42) | —.13 | .19 
Y4 06 19 Pail 19 Pa (.27) | —.05 | .00 —.05 
I5 24 05 | —.04 14 pLOmi GLO) 04 | .06 —.20 | —.27
Noise .622 24 30 .49* 3097 (59) eo oe 25 eli 18 .62*
Line .66 oe .38*| .49* ESO) Pilger. 24 15 JA seh .50* 

















Note.—During r1 no decrease in fatigue.
a The most conspicuous rhos. 
*p < .01, one-tail; rho = .35. 


the exercise, and its wane during the recovery. 
One sees a gradually increasing fatigue from 
the fifth minute of the exercise on. This means 
a quick beginning of discrimination of the 
different slopes in work load. 

With a few exceptions, Ss did not indicate 
a decrease in fatigue during the first 30 sec- 
onds after the stopping of the exercise. On the 
other hand, it is a well-known fact, that physi- 
ological variables show a steep fall during this 
time lapse. This means another example of 
the divergent, at least not directly related, 
modes in which—as was mentioned before in 
the introduction—physiological processes and 
fatigue vary. 

After the first minute of recovery, Ss seemed 
to have little or no information about the 
height of the preceding work load. It is only 
during the second minute of recovery that Ss 
are able to make an adequate estimate. 





Fig. 1. Linear regression line for final minute’s load
(work load, watts) and fatigue noise (white noise, decibels).


e = exercise; r = recovery; t = total; 5, 7, etc. = fifth, seventh, etc., minute. 


Figure 1 shows the observed values along a 
regression line (applying the method of least 
squares), suggesting a linear relationship be- 
tween fatigue and work load, at least for the 
load range considered here, and for the method 
of direct magnitude estimation. The equations 
for the regression lines are: fatigue noise Y = 
.329X + 18, fatigue pointer Y = .275X + 4, 


TABLE 2 


FACTOR ANALYSIS AFTER VARIMAX ROTATION
OF THE ORTHOGONAL AXES 











Factor 
Variable 
I II III
Fatigue pointer 
€5 —.15 +.82 
7 —.33 +.79 
9 = siti +.40 
ei a .86 +.28 
tT : : 
Te = nS 
T3 == AT 
1% 5 
T5 e . 
t —.81 +.24 
Fatigue noise —.52 +.33 
Fatigue line — 37 +.61 
Physiological variables : = 
Note.—Factor loadings <.30 omitted, except where meaningful.

fatigue line Y = .029X + .5. The slopes of the
regression lines for noise and pointer show
fairly good discriminating power. In the lower
ranges the discriminating power tends to become
weaker, as demonstrated by all three
rating scales.
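
As an illustration, the three regression equations quoted in the text can be evaluated across the 70 to 130 watt range used in the experiment. The slopes and intercepts below are those reported; the dictionary and function names are illustrative only, and the units of Y depend on the rating scale in question.

```python
# Slope and intercept of each regression line as reported in the text
# (Y = rating-scale score, X = final minute's work load in watts).
REGRESSION = {
    "noise": (0.329, 18.0),
    "pointer": (0.275, 4.0),
    "line": (0.029, 0.5),
}

def predicted_fatigue(scale, load_watts):
    """Predicted rating for a given final-minute work load."""
    slope, intercept = REGRESSION[scale]
    return slope * load_watts + intercept

for load in (70, 100, 130):
    row = {s: round(predicted_fatigue(s, load), 2) for s in REGRESSION}
    print(load, row)
```

A 60-watt change in final load therefore shifts the predicted score by 60 times the slope on each scale, which is one way to compare how strongly the scales respond to load.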

Table 2 permits some insight into the possible
underlying sources of variation behind
the correlations of the matrix. Factor I clearly
reflects “increase in fatigue.” Factor II gives 
a reversed picture of Factor I, and may be 
called “decreasing fitness.” Factor loadings on 
the recovery variables are consistent with the 
above observations concerning the second min- 
ute of this period. Factor III represents for 
the greater part changes in physiological vari- 
ables during the exercise, which are not under 
discussion here. The fatigue part of this factor, 
related to the loadings of the same two fatigue 
variables of Factor I, again gives the picture 
of decreasing fatigue with increasing fitness. 

In conclusion, factors have been extracted 
that behave in a complementary way. Con- 
sistent with daily experience, these factors 
seem to reflect the dual nature of expressing 
one’s fatigue: during exercise as an increase 
in tiredness, there is a decrease in fitness; 
during recovery as a decrease in tiredness, 
there is an increase in fitness.


WITHIN-UNIVERSITY TRANSFER:
ITS RELATION TO PERSONALITY CHARACTERISTICS


CHARLES F. ELTON AND HARRIETT A. ROSE


University Counseling Service, University of Kentucky 


Personality differences between 43 randomly selected women who remained in 
Arts and Sciences (A&S), 29 women who transferred to Commerce, 55 women 
transfers to Education, and 20 women transfers to either Home Economics or 
Nursing were found by multiple-discriminant analysis to be significant at the .01
level. Conclusions are: (a) Women who remained in A&S were more authoritarian,
practical, and career oriented; (b) Women transfers to either Commerce
or Education displayed a more intellectual approach toward scholarliness;
and (c) Women transfers to either Home Economics or Nursing were less 
inhibited, more socially comfortable, and less conforming. 


The study of characteristics of students 
transferring within university programs has 
received only minor attention in the research 
literature. Pierson (1962), for example, re- 
ports that 70% of Michigan State University 
seniors changed from their original division or 
college of enrollment. Cook (1965), in a simi- 
lar report, indicates the frequency of transfer 
within Auburn University. Holland (1962) 
found that 80% of students in large state 
universities changed major fields over a 2-year 
period. Parkinson (1964) discovered that 
students at Miami University who transferred 
from the College of Arts and Sciences (A&S) 
to the School of Education tended to be below 
the fiftieth percentile on the American Coun- 
cil on Education Psychological Examination 
(ACE). Bradley (1962) shows sex and grade- 
point differences on the measures of academic 
aptitude, critical thinking, beliefs, and values 
between Michigan State University students 
who made intrauniversity transfer and those 
who did not. The high incidence of within- 
university transfer and the lack of substantial 
knowledge about this phenomenon suggest
that this is an area worthy of investigation. 
This study examines the aptitude and per- 
sonality characteristics of female students who 
transfer from the College of A&S to other 
colleges within the University of Kentucky. 


PROCEDURE 


The Omnibus Personality Inventory (OPI) 
is routinely administered to all entering Uni- 
versity of Kentucky freshmen (OPI Re- 
search Manual, 1962). The scores on the 16 


scales of this inventory for the freshman 
classes of 1962 through 1964 were factor 
analyzed by the method of principal com- 
ponents on an IBM 7040 computer and sub- 
jected to a varimax rotation to extract five 
factors. In addition, the composite American 
College Test (ACT) score is available for 
each student who entered during these years. 

The student population studied consisted 
of all female freshman students who entered 
the College of A&S during the academic 
years of 1963-64 and 1964-65 and transferred 
to another college within the University of 
Kentucky during their first three semesters. 
Five OPI factor scores and the ACT com- 
posite score constituted the independent vari- 
ables in a multiple-discriminant analysis. The 
dependent variables consisted of the following 
four groups: 29 students who transferred to 
the College of Commerce; 55 students who 
transferred to the College of Education; 20 
students who transferred to either Nursing or 
Home Economics, and 43 students chosen by 
a table of random numbers from those who 
remained in the College of A&S. 


RESULTS 


The factor loadings for the varimax rota- 
tion of the OPI are presented in Table 1. The 
five factors are identified as: (I) Tolerance
and Autonomy, (II) Suppression-Repression,
(III) Masculine Role, (IV) Scholarly Orientation,
and (V) Social Introversion. For a full
discussion of the interpretation of these factors 
see OPI Manual (1962). The major difference
between the California and Kentucky factor


TABLE 1
VARIMAX ROTATION OF OPI—PRINCIPAL
AXES LOADINGS

Scale      I      II     III           IV     V      h²
TI         15     18     −23           87     −08    86
TO         17     09     [illegible]   82     01     85
ES         10     −14    −61           61     −07    78
CO         46     −36    −01           53     −12    64
AU         87     07     −06           19     01     81
DS         83     −40    09            10     −04    86
IE         41     −77    22            06     −25    88
SF         02     −86    −16           01     36     89
SI         08     −18    20            −10    89     89
RL         68     −18    19            −02    05     54
SM         79     06     −16           53     −06    93
MF         −00    [illegible]  94      04     14     90
RS         −04    93     12            03     −20    92
NA         69     06     −10           08     08     50
LA         03     71     34            04     −42    79
CK         00     −83    05            −09    −12    72

Note.—N = 6,086.


structure is the reversal between Factors III 
and IV in the order of extraction. That is, 
Scholarly Orientation, Factor III for the 
California student group, was Factor IV for 
the Kentucky students. This may be explained 
partially by ability differences between the 
two student populations. The California norm 
group consisted of 2,109 students from the 
University of California and 281 students from 
San Francisco State College. This combined 
group represents the upper one third of the 
ability level of California students. The Kentucky
student population is more heterogeneous
because of a nonselective admission policy
for instate students. 

Table 2 presents the means, univariate F 
tests, and scaled vectors for the six inde- 
pendent variables. 

The discriminating power of the prediction
was determined by the computation of Wilks’
lambda, which was significant (Λ = .74, F =
2.48, df = 18/390, p = .01). Chi-square tests
were computed for each of the two discrimi- 
nant functions to determine the significance 
of discrimination along each dimension sepa- 
rately (Rao, 1952, pp. 370-378). The first 
function accounted for 62% of the variance 
and was significant at the .001 level; the 
second discriminant function accounted for 
30% of the variance and was significant at 
the .05 level. Duncan’s multiple-range test 
(Freund, Livermore, & Miller, 1960) was em- 
ployed to discover which group centroids were 
significantly different from each other on the 
two functions (Table 3). The results indicate 
that Function I differentiates students re- 
maining in A&S from transfers to Commerce 
(.01 level), from transfers to Education (.01 
level), and from transfers to Nursing or Home 
Economics (.05 level). Also students trans- 
ferring to Nursing or Home Economics differ 
significantly from those going to either Com- 
merce or Education (.05 level). Transfers to 
Commerce and Education do not differ from 
each other significantly on Function I. 
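
As a rough check on the overall test, F and its degrees of freedom can be approximated from Wilks' lambda with Rao's F approximation. The sketch below assumes that this approximation is the relevant one (the text does not say how F was obtained) and takes p = 6 predictors, g = 4 groups, and N = 147 (29 + 55 + 20 + 43) from the Procedure section; the function name is hypothetical. Because Λ is reported to two decimals only, the result can be expected to land near, not exactly on, the published F = 2.48 with df = 18/390.

```python
from math import sqrt

def rao_f_from_wilks(lam, p, g, n):
    """Rao's F approximation for Wilks' lambda.

    lam: Wilks' lambda; p: number of predictors; g: number of groups;
    n: total sample size. Returns (F, df1, df2).
    """
    q = g - 1                                    # hypothesis degrees of freedom
    s = sqrt((p**2 * q**2 - 4) / (p**2 + q**2 - 5))
    m = n - 1 - (p + q + 1) / 2
    df1 = p * q
    df2 = m * s - (p * q - 2) / 2
    root = lam ** (1 / s)
    f_stat = (1 - root) / root * df2 / df1
    return f_stat, df1, df2

f_stat, df1, df2 = rao_f_from_wilks(0.74, p=6, g=4, n=147)
print(f"F = {f_stat:.2f}, df = {df1}/{df2:.1f}")
```

The numerator degrees of freedom follow directly from six predictors and three between-group contrasts (6 × 3 = 18), which agrees with the value reported.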


TABLE 2
MEANS, UNIVARIATE F TESTS, AND SCALED VECTORS

                                               Means
                          A&S to       A&S to        A&S to Home     A&S                    Scaled vectors
Variable                  Commerce     Education     Economics or    Remain       F
                          (N = 29)     (N = 55)      Nursing         (N = 43)               I        II
                                                     (N = 20)
ACT Composite             21.48        [illegible]   22.30           21.91        .58      −.38      .24
Tolerance & Autonomy      54.86        53.78         52.90           48.74       2.79*      .03     −.44
Suppression-Repression    48.00        51.29         44.85           48.53       2.45      −.40     −.70
Masculine Role            53.34        57.71         55.70           58.70       3.25*      .18     −.10
Scholarly Orientation     56.21        54.71         52.50           45.67       8.09*     −.77     [illegible]
Social Introversion       51.48        [illegible]   50.20           54.09       1.41      −.25     −.41


TABLE 3
CENTROIDS AND VARIANCES IN REDUCED SPACE

                                        Function I               Function II
Transfer                            Centroid    Variance     Centroid    Variance
A&S to Commerce                      −72.53      53.40        −63.65      50.89
A&S to Education                     −71.78      42.06        −66.22      30.40
A&S to Home Economics or Nursing     −68.03      58.99        −61.09      18.48
A&S Remain                           −64.65      71.84        −65.70      29.40


Function II differentiates students transferring
to Nursing and Home Economics from
those going to Education or those remaining in
A&S at the .01 level. All other comparisons on
this function lack significance. These relationships
are clarified by the plot of the group
centroids in Figure 1.

An inspection of the scaled weights (Table
2) indicates that the predictors providing the
largest contribution to Function I are: Scholarly
Orientation, Suppression-Repression, and
ACT. Largest contribution to Function II is
made by Tolerance and Autonomy, Suppression-Repression,
and Social Introversion. The
emergence of ACT, Social Introversion, and
Suppression-Repression (Table 2) as high
contributors to prediction serves to demonstrate
the value of multivariate analysis. The
univariate F tests for all of these variables
were insignificant yet the scaled vectors reveal
their relative importance in differentiation.
For a full discussion of this phenomenon see
Cooley and Lohnes (1962, p. 121).


Fig. 1. Group centroids in discriminant space.

Table 2 reports the findings of an analysis
of the significance of the differences among
the individual predictor means for the four
groups, as determined by Duncan’s multiple-
range test. The mean Tolerance and Autonomy
score for students remaining in A&S is sig-
nificantly lower (.05 level) than the mean
scores for students transferring to Commerce 
or Education. The mean Masculine Role score 
for students transferring to Commerce is sig- 
nificantly lower (.05 level) than the mean 
scores for students transferring to Education 
or those remaining in A&S. The mean Schol- 
arly Orientation score for students remaining 
in A&S is significantly lower (.05 level) than 
the mean scores for the other three groups. 


DISCUSSION 


On Function I (Table 2) the combination 
of low Scholarly Orientation, Suppression-Re- 
pression, and ACT scores produces a high 
centroid score (Table 3) for students remain- 
ing in A&S. This function separates the 
groups along a continuum best described as 
attitude toward academic and intellectual
striving.

Function II (Table 2), on the other hand, 
is dominated by negative loadings on Toler- 
ance and Autonomy, Suppression-Repression, 
and Social Introversion. These variables sug- 
gest a differentiation of the groups along a 
dimension of attitude toward social and en- 
vironmental characteristics. 

An examination of Figure 1 indicates that 
students who remain in A&S can be identified, 
students who transfer to Home Economics or 
Nursing can be identified, students who trans- 
fer out of A&S can be identified in comparison 
to those who remain, but that no significant 
difference is found between those who transfer 
to Education and those who go to Commerce. 

The interpretation of the two discriminant 
functions may be clarified by a description of 
the personality patterns of the students. These 
patterns are suggested by the significant mean 
differences obtained between the groups on the
predictor variables.

Students who remain in A&S achieve the 
highest centroid on Function I. The high 
scorers on this function are low scorers on 
the OPI factors of Scholarly Orientation and
Suppression-Repression as well as on the ACT
composite score. This would characterize stu- 
dents who remain in the college as those who 
are uncomfortable with reflective thought of 
an abstract nature; who have little interest in 
literature, art, and philosophy; who evaluate 
ideas on the basis of their practical applica-
tion; whose attitude toward academic or
intellectual orientation is more like that con-
sidered by our culture to be masculine, that
is, nonaesthetic and utilitarian. While Func-
tion II does not significantly differentiate this 
group, Tolerance and Autonomy, which loads 
heavily on Function II, is one of the pre- 
dictors on which the mean score earned by 
those who remain in A&S is significantly 
lower than the score of the other three groups. 
This would indicate that these students have 
more dependence on authority and do not 
rebel against the strictures imposed by family, 
church, state, or the A&S faculty; they do 
not protest the infringement of individual 
rights; they are inflexible, intolerant, and un- 
realistic in their dependence on rules, rituals, 
and authority for managing their social rela- 
tionships; they tend toward immaturity and 
they are religious, conventional, rigid, preju- 
diced, and emotionally suppressed. They re- 
semble women who choose the physical sciences 
in the masculinity of their approach, that is, 
career first, with a goal of technical contribu- 
tion to science, marriage second (Abe & Hol- 
land, 1965). 

Women who transfer to Home Economics or 
Nursing seem to be characterized by better 
social adjustment; a tendency to be unin- 
hibited, less cautious, and less rational than 
the other groups; and freedom to express a 
variety of impulses and anxieties without 
regard to social amenities. These findings tend 
to corroborate those in which women who 
choose the health professions report themselves 
to be somewhat lacking in self-control (Abe & 
Holland, 1965). 

Women who transfer to Commerce or Edu- 
cation achieve the lowest scores on Function I. 
This score is derived primarily from a high 
score on Scholarly Orientation, which indi- 
cates an interest in thinking and dealing with 
abstractions and an appreciation of freedom 
of thought, that is, a more intellectual ap- 
proach to education. They appear to be more
mature in their appreciation for the needs of
others.

Ability differences between female students
who remain in A&S and those who transfer
out of that college are insignificant. This find-
ing may be accounted for, in part, by the fact
that faculty rules at the University of Ken-
tucky require students to possess a C average
or higher in order to transfer to another col-
lege. Nevertheless, this result suggests a con-
tradiction to the commonly held assumption
that students leaving a liberal arts program
are less able than those who remain in it.

Is change in major field a function of the
attributes of student or institutional charac-
teristics, as Holland (1962) asked? Within-
university transfer appears to be a function
of the interaction of student personality traits
and college characteristics. Whenever student
personality traits come into conflict with the
college environment it is likely that transfer
out of that environment will result. Certain
inferences can be drawn about college at-
tributes by studying the differences in per-
sonality traits between students who elect to
remain with their original majors and students
who elect to transfer. Thus, it may be hy-
pothesized that instruction in freshman courses
in the College of A&S at the University of
Kentucky appeals most to the authoritarian
and practical student.

This study is an initial effort to assess the
importance of student personality variables in
within-university transfer. Replication of the
study as well as certain refinements are neces-
sary before valid generalizations are possible.
Although Astin (1965a, 1965b) assessed col-
lege and classroom environments, there is a
need to relate college environments to student
personality traits for those involved in changes
in major fields. Do students who change
majors only once differ from students who
change programs more than once? Do students
who make the decision to change programs at
the junior level possess different attributes
from those who change earlier?


Results of an empirical test of the Herzberg 2-factor theory of job satisfaction
are reported. A number of hypotheses for which the Herzberg theory and tradi- 
tional unidimensional theory make different predictions were tested using a 
sample of 793 male employees from various jobs. The intrinsic variables (“satis-
fiers”) were the work itself and promotions, and the extrinsic variable (“dis-
satisfier”) was pay. Neither the Herzberg theory nor the traditional theory was
supported by the data. Instead, results indicate that intrinsic factors are more 
strongly related to both overall satisfaction and overall dissatisfaction than the 
extrinsic factor, pay, and suggest that functioning of the extrinsic variable may 
depend on the level of satisfaction with the intrinsic variables. It was con-
cluded that the concepts of “satisfiers” and “dissatisfiers” do not accurately
represent the manner in which job-satisfaction variables operate.


The two-factor theory of job satisfaction 
proposed by Herzberg, Mausner, and Snyder- 
man (1959) several years ago has occasioned 
considerable controversy. Briefly, this theory 
states that certain variables in the work situ- 
ation (“satisfiers”) lead to overall job satis- 
faction, but play an extremely small part in 
producing job dissatisfaction; while other 
variables (“dissatisfiers”) lead to job dis-
satisfaction but do not in general lead to job
satisfaction. The Herzberg study cited the 
factors of work itself, responsibility, and ad- 
vancement as the major satisfiers, and com- 
pany policy and administration, supervision 
(both technical and interpersonal relation- 
ships), working conditions, and pay as the 
major dissatisfiers. These findings are, of 
course, in direct opposition to the traditional 
idea that if the presence of a variable in the 
work situation leads to job satisfaction, then 
its absence will lead to job dissatisfaction, 
and vice versa. These findings also contradict 
the findings of Herzberg, Mausner, Peterson, 
and Capwell (1957) based on an extensive 
review of the literature; for example, compare 
Herzberg et al. (1957, p. 48) with Herzberg 
et al. (1959, p. 81). 

Research relevant to the Herzberg (1959) 
theory has produced conflicting results. On 
the positive side, Schwartz, Jenusaitis, and
Stark (1963), using supervisory personnel in
public utility industries, supported Herzberg’s 
findings. Myers (1964), using employees on 
five different industrial jobs, also replicated 
Herzberg’s results. Similarly Dysinger (1965), 
using civilian scientists and engineers in vari- 
ous Army research and development (R&D) 
installations, supported Herzberg’s results 
using an incident check-list technique rather 
than an interview technique. Saleh (1964) 
also claimed to have supported the Herzberg 
hypothesis, although (as he observed) the re- 
sults were not entirely clear-cut. Also, Herz- 
berg (1965) replicated his earlier findings 
using a sample of Finnish supervisors. 

All these studies, however, had in common 
the fact that they used the same recall 
method: subjects (Ss) first recalled instances 
of previous satisfaction and dissatisfaction 
and then described or checked the events which 
they perceived as leading up to or causing 
each instance. Inasmuch as Ewen (1964) and 
Dunnette and Kirchner (1965, pp. 152-153) 
among others have pointed out possible draw- 
backs of this method (e.g., selective bias in 
recall and projection of individual failure onto 
external sources), and since Hardin (1965) 
has reported evidence which makes studies 
which rely upon retrospective accounts of sat-
isfaction extremely suspect, replications ob-
tained by this method alone cannot be re-
garded as giving unequivocal support to the
Herzberg theory. A different method was used
by Halpern (1965) who had Ss rate their
“best-liked job” on four “satisfiers” and four
“dissatisfiers” as well as overall satisfaction.
He found that the scores on the satisfiers cor-
related more highly with overall satisfaction
than the scores on the dissatisfiers. Unfor-
tunately, this study did not deal with the
other half of the theory (dissatisfaction) at all.

On the negative side, Wernimont and Dun-
nette (1964) compared results using the Herz- 
berg method to results using a forced-choice 
checklist method of indicating the causative 
factors in satisfaction and dissatisfaction. 
Using engineers and accountants as Ss, they 
found that with the forced-choice method, the 
satisfiers were endorsed more often to account 
for both satisfying and dissatisfying situa- 
tions. On the other hand, in the free-choice 
situation the results replicated those obtained 
by Herzberg, thus supporting the notion that 
a free-choice situation may encourage bias in 
recall. 

A unique variation on the Herzberg method 
was used by Lindsay (1965). Instead of hav- 
ing his Ss (employees of an R&D company) 
first recall affective incidents and then their 
alleged causes, he had Ss first think of job 
factors (e.g., success experiences, company 
policy changes) and then the attitudes these 
experiences produced; in other words, the 
exact reverse of the Herzberg order. Using one 
satisfier (achievement) and one dissatisfier 
(company policy) Lindsay found that the 
satisfier accounted for three times as much 
variance in overall job satisfaction as the 
dissatisfier (i.e., it produced both more satis- 
faction and more dissatisfaction than the dis- 
satisfier), thus agreeing with Wernimont and 
Dunnette’s (1964) findings using the forced- 
choice format. Friedlander (1964) came to the 
same conclusion using still a different method. 
He had Ss rate various factors as to their
perceived importance in producing
satisfaction and dissatisfaction. Again
the “satisfiers” (e.g., achievement, recogni- 
tion) were rated more important than the 
“dissatisfiers” in producing both satisfaction 
and dissatisfaction. 

In a factor-analytic study, Friedlander
(1963) did not obtain a general intrinsic and
a general extrinsic factor as the Herzberg 
theory would suggest. Instead, factors of 
social and technical environment, intrinsic 
self-actualizing work, and recognition through 
advancement were obtained. Ewen (1963, 
1964) investigated a sample of approximately 
1,000 life insurance agents and found that 
various job factors did not for the most part 
act in the manner predicted by the Herzberg 
theory. Dunnette (1965) studied samples of 
executives, sales clerks, secretaries, scientists 
and engineers, salesmen, and army reserves 
and supervision students, and concluded that 
the two-factor theory was an oversimplifica- 
tion. He stated that job satisfaction was 
multidimensional, and the same factors were 
able to contribute to both satisfaction and 
dissatisfaction. Graen (1965) performed a 
factor analysis using groups of engineers and 
found that Herzberg’s a priori satisfaction 
dimensions did not emerge as clear factors. 
He therefore concluded that a priori theorizing 
is no substitute for empirical verification in- 
sofar as determining the factors of job satis- 
faction is concerned. Malinovsky and Barry 
(1965) investigated a sample of blue-collar 
workers and found that, contrary to the Herz- 
berg theory, both satisfiers and dissatisfiers 
were positively related to job satisfaction. 
There exist several straightforward hy- 
potheses for which the Herzberg two-factor 
theory and the traditional unidimensional 
theory make diametrically opposed predic- 
tions. By subjecting these hypotheses to em- 
pirical tests, it should be possible to obtain 
evidence which will provide an indication as 
to the relative merits of the two theories. 
These hypotheses are enumerated below. 
Hypothesis 1. Suppose there is one group of 
employees which is neutral (neither satisfied 
nor dissatisfied) with regard to the alleged 
satisfiers, and a second group which is dis- 
satisfied with these alleged satisfiers. Suppose 
further that both groups are equated regard- 
ing their satisfaction with the alleged dis- 
satisfiers. The Herzberg theory would predict 
that the two groups would be equal in overall 
job satisfaction, since being dissatisfied with 
the satisfiers is assumed to be no worse than 
being neutral with regard to the satisfiers— 
dissatisfaction with satisfiers should not lead
to job dissatisfaction. However, the tradi-
tional theory would predict that the first group
(neutral with regard to the satisfiers) would 
be more satisfied than the second group (dis- 
satisfied with regard to the satisfiers), since 
this theory postulates that dissatisfaction with 
any variable tends to lead to overall job dis- 
satisfaction. 

This hypothesis may be tested indepen- 
dently three times with the data. Three sepa- 
rate groups can be formed, matched on degree 
of satisfaction with the dissatisfiers: a group 
dissatisfied with the dissatisfiers (Group A), 
a group neutral with regard to the dissatisfiers 
(Group B), and a group satisfied with the 
dissatisfiers (Group C). Within each group 
can be compared those who are neutral on the 
satisfiers with those who are dissatisfied with 
the satisfiers. The Herzberg theory would 
predict no difference between these subgroups, 
whereas the traditional theory would predict 
those neutral on the satisfiers would be more 
satisfied than those dissatisfied on the satis- 
fiers. The three tests of Hypothesis 1 will be 
called Tests 1A, 1B, and 1C, the letters 
corresponding to the above groups. 

Hypothesis 2. A parallel hypothesis may be 
made for the effects of the dissatisfiers. This 
time satisfaction with the satisfiers is held 
constant by selecting three separate groups: 
a group dissatisfied with the satisfiers (Group 
A), a group neutral with regard to the satis- 
fiers (Group B), and a group satisfied with 
the satisfiers (Group C). The Herzberg theory 
would predict no difference within each group 
between those neutral with regard to the dis- 
satisfiers and those satisfied with the dissatis- 
fiers. The traditional theory would predict a 
significant difference between the two sub- 
groups. Again the three tests of the hypothesis 
will be called 2A, 2B, and 2C with the letters 
corresponding to the above groups. 

Hypothesis 3. The Herzberg theory would 
predict that being dissatisfied with a dis- 
satisfier should lead to greater overall dis- 
satisfaction than being dissatisfied with a 
satisfier. Traditional theory would predict no 
such difference. Thus, suppose there are the 
following two groups: Group A is dissatisfied 
with the satisfiers and neutral with regard to 
the dissatisfiers, and Group B is neutral with 
regard to the satisfiers and dissatisfied with
the dissatisfiers. The Herzberg theory would
predict that Group A would show higher over- 
all job satisfaction than Group B, while the 
traditional theory would not predict any dif- 
ference between the two groups. 

Hypothesis 4. Similarly, the Herzberg theory 
would predict that being satisfied with a 
satisfier should lead to greater overall satis- 
faction than being satisfied with a dissatisfier. 
Thus, suppose there are the following two 
groups: Group A is satisfied with the satisfiers 
and neutral with regard to the dissatisfiers, 
and Group B is neutral with regard to the 
satisfiers and satisfied with the dissatisfiers. 
The Herzberg theory would predict that Group 
A would show higher overall satisfaction than 
Group B, while traditional theory would not 
predict any difference between the two groups. 
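
The four hypotheses can be summarized as a small lookup table. The sketch below is illustrative only: the dictionary layout, the single-letter level codes, and the helper name `theories_matching` are assumptions introduced here, not part of the original study. Each test compares two cells, written as (level on the satisfiers, level on the dissatisfiers), and records the relation each theory predicts between the overall satisfaction of the first and second cell.

```python
# Levels of satisfaction that define the cells (codes are illustrative).
D, N, S = "D", "N", "S"  # dissatisfied, neutral, satisfied

# "cells": ((satisfier level, dissatisfier level) of first cell, same for second)
# "H" / "T": relation predicted by the Herzberg / traditional theory between
# the first and second cell's overall satisfaction ("=", "<", or ">").
TESTS = {
    "1A": {"cells": ((D, D), (N, D)), "H": "=", "T": "<"},
    "1B": {"cells": ((D, N), (N, N)), "H": "=", "T": "<"},
    "1C": {"cells": ((D, S), (N, S)), "H": "=", "T": "<"},
    "2A": {"cells": ((D, N), (D, S)), "H": "=", "T": "<"},
    "2B": {"cells": ((N, N), (N, S)), "H": "=", "T": "<"},
    "2C": {"cells": ((S, N), (S, S)), "H": "=", "T": "<"},
    "3":  {"cells": ((D, N), (N, D)), "H": ">", "T": "="},
    "4":  {"cells": ((S, N), (N, S)), "H": ">", "T": "="},
}


def theories_matching(test, observed):
    """Return the theories whose prediction equals the observed relation."""
    spec = TESTS[test]
    return [name for name in ("H", "T") if spec[name] == observed]
```

Because the two theories disagree on every test, any observed relation supports at most one of them; an observed relation that neither theory predicts returns an empty list.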

The purpose of the present paper is to test 
the various hypotheses stated above. Job-satis- 
faction instruments other than the one used 
in the Herzberg study will be used so as to 
avoid the possibility (Ewen, 1964) that the 
critical incidents method used in the Herz- 
berg study affects the nature of the obtained 
results. 


METHOD


Subjects 


Briefly, the original sample from which these Ss 
were drawn consisted of 1,978 males, randomly se- 
lected from the lists of employees 35 years of age 
and older (25 in one company) in 21 “units.” These 
units had been selected to form a sample of indus- 
trial and business organizations (local) employing 50 
or more persons, stratified so as to obtain widely 
different sizes and policies. The sample of Ss varied 
greatly according to job level, age, educational back- 
ground, experience, place of employment, and other 
relevant characteristics. For details of the sample, see 
Kendall (1963). As will be explained below, only 
793 of these Ss were suitable for use in the tests of 
the various hypotheses. 


Instruments 


In view of its extensive validation, the Job De- 
scriptive Index (JDI) developed at Cornell Univer- 
sity (Hulin, Smith, Kendall, & Locke, 1963; Kendall, 
Smith, Hulin, & Locke, 1963; Locke, Smith, Hulin, 
& Kendall, 1963; Locke, Smith, Kendall, Hulin, & 
Miller, 1964; Macaulay, Smith, Locke, Kendall, & 
Hulin, 1963; Smith, 1963; Smith & Kendall, 1963) 
was used as the measure of job satisfaction. Vroom 
(1964) has called this measure “without doubt the 
most carefully constructed measure of job satisfac-
tion in existence today [p. 100].” The JDI is an
adjective checklist dealing with five areas of the job:
the work itself, supervision, people, pay, and promo-
tions. While the JDI does not deal with all of the
satisfiers and dissatisfiers used in the Herzberg study,
it was considered preferable to other instruments
which while measuring more factors were of less
well-substantiated validity.

The General Motors Faces Scale (Kunin, 1955) was
used as the measure of overall job satisfaction. This
measure is a one-item graphic scale, consisting of
six faces varying from a large smile to a large
frown. The S is asked to check the face which most
closely represents his feelings toward his job-in-
general. This particular scale has not been validated,
but faces scales for particular job-satisfaction di-
mensions have previously shown good discriminant
and convergent validity (Locke et al., 1964).


Procedure 


The JDI assesses five aspects of job satisfaction:
the work itself, pay, promotional opportunities and
policies, co-workers, and supervision. Inasmuch as
the co-workers variable was neither a major satisfier
nor a major dissatisfier in the Herzberg study, this
variable was excluded from the analysis. The super-
vision variable was also excluded since the JDI does
not distinguish between such factors as recognition 
given by the supervisor (supposedly a satisfier) and 
the technical aspects of supervision (supposedly a 
dissatisfier). Thus, three factors remained: the work 
itself and promotions, supposedly satisfiers; and pay, 
supposedly a dissatisfier. 

From the original sample of 1,978 Ss, the groups 
defined by the various hypotheses were formed. 
Only those Ss were used whose general level of 
satisfaction on the two satisfiers was approximately 
equal. Were this not done (e.g., if Ss were included 
who were satisfied with regard to the work itself 
but dissatisfied with regard to promotions), it would 
not be possible to form meaningful conclusions about 
the hypotheses. That is, the effects of one of the 
satisfiers might or might not be cancelled out by 
opposite effects of the other satisfier. By equating 
for satisfaction on the two satisfiers (work itself 
and promotions), this problem was eliminated. A 
fairly large number of Ss differed with regard to 
satisfactions concerning the two satisfiers. When 
these were eliminated, a total of 837 remained. One 
group of 44 Ss (those satisfied with the satisfiers 
and dissatisfied with the dissatisfier) was not used 
in any of the tests. Thus, the final number of Ss 
was 793. 
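
The sample arithmetic can be checked against the cell sizes reported in Table 1, together with the N of 44 given for the unused group. The following is only a bookkeeping sketch; the dictionary layout and variable names are illustrative assumptions. Keys are (level on the satisfiers, level on the dissatisfier), with D = dissatisfied, N = neutral, S = satisfied.

```python
# Cell sizes as reported in Table 1 and its note.
CELL_N = {
    ("D", "D"): 71,   # Group 1
    ("D", "N"): 29,   # Group 2
    ("D", "S"): 33,   # Group 3
    ("N", "D"): 20,   # Group 4
    ("N", "N"): 28,   # Group 5
    ("N", "S"): 47,   # Group 6
    ("S", "D"): 44,   # Group 7, not used in any test
    ("S", "N"): 94,   # Group 8
    ("S", "S"): 471,  # Group 9
}

# Ss remaining after equating satisfaction on the two satisfiers.
remaining = sum(CELL_N.values())

# Ss actually used once the unused group is set aside.
used = remaining - CELL_N[("S", "D")]

print(remaining, used)
```

The totals agree with the 837 and 793 Ss stated above.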

In order to determine which Ss were satisfied, 
which Ss were neutral, and which Ss were dis- 
satisfied with each of the variables in question, it 
was necessary to determine neutral points for each 
JDI scale. This had been done in a previous study 
(Ewen, 1965). The procedure involved obtaining 
data for two groups of workers (not those used in 
the present study), consisting of JDI scores and 
Faces Scale scores for each of the five components 
(work, supervision, people, pay, and promotions). 
For each JDI scale and corresponding Faces Scale,
a linear-regression analysis was carried out, and the
neutral point of any JDI scale was taken to be the 
point on the regression line that corresponded to the 
neutral face on the Faces Scale. This was done 
separately for each of the two groups, and the 
results of the two groups were in high agreement. 
Any S with a score within five points above or below 
the neutral point on a JDI scale was taken to be 
neutral with regard to the component measured by 
that scale. 
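
The neutral-point procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the computation used in Ewen (1965): it assumes the JDI score is regressed on the Faces Scale score, that higher Faces scores mean greater satisfaction, and that the numeric value of the neutral face is supplied by the caller. The function names and the sample data in the usage note are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def neutral_point(faces, jdi, neutral_face):
    """JDI value on the regression line at the neutral face (assumed value)."""
    intercept, slope = fit_line(faces, jdi)
    return intercept + slope * neutral_face


def classify(score, neutral, band=5):
    """Label a JDI score; within `band` points of neutral counts as neutral."""
    if score > neutral + band:
        return "satisfied"
    if score < neutral - band:
        return "dissatisfied"
    return "neutral"
```

For example, with made-up data `faces = [1, 2, 3, 4, 5, 6]` and `jdi = [10, 20, 30, 40, 50, 60]`, calling `neutral_point(faces, jdi, 3.5)` places the neutral point midway along the fitted line, and `classify` then sorts individual scores into the three levels with the five-point band described above.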


RESULTS 


The results of the various tests are shown in 
Table 1. The data regarding Tests A, B, and 
C of Hypothesis 1 clearly support the tradi- 
tional theory and argue against the Herzberg 
theory. The results indicate that dissatisfac- 
tion with the satisfiers (work itself and promo- 
tions) does lead to overall dissatisfaction. 
That is, groups neutral with regard to the 
satisfiers showed significantly higher overall 
satisfaction than those dissatisfied with the 
satisfiers. The results were highly significant 
in all three cases, indicating that the same 
conclusion holds whether we consider people 
who are dissatisfied with the dissatisfier (pay), 
neutral with regard to the dissatisfier, or satis- 
fied with the dissatisfier. In each case, those 
neutral with regard to the satisfiers showed 
significantly higher overall job satisfaction 
than those dissatisfied with the satisfiers. 

The data regarding Tests A and B of Hy- 
pothesis 2 support the Herzberg theory and 
argue against the traditional theory. That is, 
being satisfied with the dissatisfier led to no 
more overall satisfaction than being neutral 
with regard to the dissatisfier for those Ss who 
were either dissatisfied or neutral with regard 
to the satisfiers. However, the data concerning 
Test C support the traditional theory rather 
than the Herzberg theory. That is, for people 
who are satisfied with the satisfiers, being 
satisfied with the dissatisfier led to greater 
overall satisfaction than being neutral with 
regard to the dissatisfier. 

The data regarding Hypothesis 3 do not 
support either theory. The Herzberg theory 
predicts that being dissatisfied with a dis- 
satisfier should lead to more overall dissatis- 
faction than being dissatisfied with a satisfier. 
However, the data indicate that the opposite 
is true. Being dissatisfied with the satisfiers 
led to more overall dissatisfaction than being 
dissatisfied with the dissatisfier. The tradi-
tional theory would not predict any difference
between these two groups. The data regarding
Hypothesis 4, on the other hand, support the
Herzberg theory. Being satisfied with a satis-
fier led to more overall satisfaction than being
satisfied with a dissatisfier.


TABLE 1
PREDICTIONS MADE BY THE HERZBERG THEORY AND THE TRADITIONAL THEORY
AND RESULTS CONCERNING THESE PREDICTIONS

                          Satisfaction with(a)                         Predictions(c)
Hypothesis  Test  Group   Satisfiers  Dissatisfiers    N    Mean(b)       H      T        t            Theory supported(c)

    1        A      1         D            D           71    1.77         1=4    1<4      3.69**       T
                    4         N            D           20    2.95

    1        B      2         D            N           29    2.24         2=5    2<5      4.38**       T
                    5         N            N           28    3.54

    1        C      3         D            S           33    2.33         3=6    3<6      4.30**       T
                    6         N            S           47    3.32

    2        A      2         D            N           29    2.24         2=3    2<3      <1           H
                    3         D            S           33    2.33

    2        B      5         N            N           28    3.54         5=6    5<6      <1           H
                    6         N            S           47    3.32

    2        C      8         S            N           94    [illegible]  8=9    8<9      [illegible]  T
                    9         S            S          471    4.11

    3               2         D            N           29    2.24         4<2    4=2      [illegible]  ?
                    4         N            D           20    2.95

    4               6         N            S           47    3.32         6<8    6=8      1.95*        H
                    8         S            N           94    [illegible]

Note. Group 7, which consisted of subjects satisfied with the satisfiers and dissatisfied with the dissatisfier, was not used in
any of the tests of the hypotheses. The N for this group was 44, and the mean overall satisfaction was 3.52.
a D = dissatisfied; N = neutral; S = satisfied.
b High mean scores indicate high overall job satisfaction.
c H = Herzberg theory; T = traditional theory.
* p < .05.
** p < .001.


DISCUSSION


It is clear that the present results taken as
a whole do not provide clear support for either
the Herzberg theory or for the traditional
theory. Some of the results favor one theory
while other results favor the other theory.

Though the results may appear to be con-
tradictory, a logical explanation does exist.
All of the various results form a consistent
pattern if it can be assumed that the satis-
fiers used in this study, the work itself and
promotions, are more potent variables than
the dissatisfier, pay. Thus, the results con- 
cerning Hypothesis 1 indicate that the satis- 
fiers can serve as strong sources of dissatisfac- 
tion (as well as strong sources of satisfaction). 
The results concerning Hypothesis 2 indicate 
that satisfaction with pay is insufficient to 
significantly increase overall satisfaction if 
satisfaction with the more potent variables, 
the satisfiers, is at a low or neutral level. 
However, for those who are satisfied with the 
satisfiers, satisfaction with pay can increase 
overall satisfaction. The results concerning 
Hypotheses 3 and 4 indicate that satisfaction 
with the satisfiers leads to greater overall 
satisfaction than satisfaction with the dis-
satisfier, while dissatisfaction with the satis-
fiers leads to greater overall dissatisfaction
than dissatisfaction with the dissatisfier. This
interpretation supports the previous findings
of Friedlander (1964), Lindsay (1965), and
Wernimont and Dunnette (1964) that in-
trinsic factors are the most important sources
of both satisfaction and dissatisfaction.

It must be kept in mind that the present
study dealt with only three of a large num-
ber of factors that affect job satisfaction.
Variables other than the work itself, promo-
tions, and pay play a part in producing
overall satisfaction and dissatisfaction. Also,
the theory that the intrinsic factors are the
most important variables was formulated after
the results were in. A careful reading of the
previous research should have indicated to us
the desirability of explicitly testing this the-
ory; but nevertheless the fact remains that
the present study was not specifically designed
with the intention of testing this theory.

In spite of the above limitations, the
present results taken in conjunction with
those of Friedlander (1964), Halpern (1965),
Lindsay (1965), and Wernimont and Dunnette
(1964) strongly suggest that the intrinsic 
factors are in fact the most potent factors in 
the work situation in terms of their relation- 
ship to overall job satisfaction. The results of 
the present study suggest that the manner in 
which the extrinsic factors operate may de- 
pend on the level of satisfaction with the 
intrinsic factors. This latter finding is only 
tentative, however, inasmuch as only one 
extrinsic factor (pay) was used in the present 
study. 

It should be noted that traditional theory 
in no way precludes the possibility of some 
areas of job satisfaction being more potent 
than other areas. In fact, it would be rather 
startling if all areas were equally potent. The 
unidimensionality of the intrinsic variables in 
the present study clearly supports traditional 
theory. The next question is whether intrinsic 
variables are simply more potent than ex- 
trinsic variables, or (as the present results 
suggest may be the case) whether the func- 
tioning of the extrinsic variables depends on 
the level of satisfaction with the intrinsic 
variables. If subsequent research indicates 
that the latter is the case, a modification of 
traditional theory would be in order. 

Further research is necessary to determine
the generality of the present results. The
question arises as to whether the present 
findings will be replicated in new situations, 
and at all job levels, whether or not these 
findings hold up when sources of job satisfac- 
tion other than those used in this study are 
used, and whether the way in which job-satis- 
faction variables operate is related to such 
variables as age, tenure, and job level. The 
weight of the evidence to date indicates that 
the concepts of “satisfiers” and “dissatisfiers” 
are misleading and do not accurately indicate 
the way in which job-satisfaction variables 
affect overall job satisfaction. If the results 
of the present study prove to be of general 
applicability, the functioning of the various 
factors might better be described by such 
terms as “primary satisfaction variables” (i.e., 
those variables which are strong sources of 
both overall satisfaction and overall dissatis- 
faction) and “secondary satisfaction vari-
ables” (i.e., those variables the nature of
whose operation depends on the level of satis- 
faction with the primary satisfaction varia- 
bles). For the present, however, the “intrinsic” 
and “extrinsic” classifications would appear to 
be preferable to the “satisfier-dissatisfier” 
terminology. 
ADDENDUM TO “AN EMPIRICAL TEST OF THE
HERZBERG TWO-FACTOR THEORY” 1


GEORGE B. GRAEN 


University of Minnesota 


A 2-way analysis of variance on selected a priori contrasts was performed on 
the data from the study by Ewen, Smith, Hulin, and Locke (1966). The results 
clearly support the traditional theory without the assumption that all variables 
are equally potent contributors to job satisfaction and argue against the 2- 
factor theory. The contribution of the satisfier was demonstrated to be much 
greater than that of the dissatisfier to both satisfaction and dissatisfaction. It 
was concluded that the distinction between satisfiers and dissatisfiers is no 
longer reasonable. Further, the most likely candidate on which to make the 
distinction between more and less potent contributors to job satisfaction appears 
to be the intrinsic and extrinsic classification. 


The study by Ewen, Smith, Hulin, and
Locke (1966) was designed as an empirical
test of the Herzberg, Mausner, and Snyder-
man (1959) two-factor theory of work motiva-
tion. In the Herzberg study the variables of
Achievement, Advancement, Recognition, Re-
sponsibility, and Work Itself were classified
as “satisfiers” and assumed to contribute
mainly to job satisfaction. In contrast, the
variables of Company Policies and Practices,
Pay, Supervision (both technical and human
relations), and Working Conditions were clas-
sified as “dissatisfiers” and assumed to con-
tribute almost exclusively to job dissatisfac-
tion.

Ewen and his associates contrasted the two- 
factor theory with what they called the “tra- 
ditional idea.” The traditional idea was that 
if the presence of a variable contributed to 
satisfaction, then the absence of that variable 
would contribute to dissatisfaction and vice 
versa. After the authors presented a most ex- 
tensive review of the research relevant to the 
two-factor theory, they stated four hypotheses 
on which the two-factor theory and the tradi- 
tional theory make opposite predictions. These 
hypotheses are restated below. 

Hypothesis 1. The two-factor theory pre-
dicts that if there is one group of employees
which is neutral with regard to the satisfiers
(Group A) and a second group which is dis-
satisfied with the satisfiers (Group B), then
the two groups would be equal in overall job
satisfaction. In contrast, the traditional theory
would predict that Group A would be more
satisfied than Group B.

1 The author wishes to express his appreciation to
Robert B. Ewen, Patricia Cain Smith, Charles L.
Hulin, and Edwin A. Locke for providing the data
and to Marvin D. Dunnette for his cooperation and
support. Part of the research reported here was sup-
ported by a Behavioral Science Research Grant from
the General Electric Foundation.

Hypothesis 2. The two-factor theory pre- 
dicts that if there is one group of employees 
which is neutral with regard to the dissatis- 
fiers (Group C) and a second group which is 
satisfied with the dissatisfiers (Group D), then 
the two groups would be equal in overall satis- 
faction. However, the traditional theory would 
predict that Group D would be more satisfied 
than Group C. 

The predictions of the traditional theory 
relevant to Hypotheses 3 and 4 assume that 
the contribution of the satisfiers and the dis- 
satisfiers to overall satisfaction is equal. It is 
not a necessary assumption of the traditional 
idea and may be omitted. If it is omitted, 
then the traditional theory and the two-factor 
theory make the same predictions regarding 
Hypotheses 3 and 4. 

The differences between the assumed rela- 
tionships of the satisfier and dissatisfier vari- 
ables to overall job satisfaction as made by 
the traditional theory and the two-factor 
theory are presented in Figures 1 and 2. As 
shown in Figure 1, the relationship between 
the satisfier variables and overall job satis- 
faction is assumed by the traditional theory to 
be linear and by the two-factor theory to be 
curvilinear. The source of this nonlinearity in
the two-factor theory is the assumption that
being neutral or dissatisfied with regard to a
satisfier variable leads to the same level of
overall satisfaction.

Fig. 1. Alternative models of the relationship of the
satisfier variables to job satisfaction.

As shown in Figure 2, the relationship be-
tween the dissatisfier variables and overall
satisfaction again is assumed to be linear by
the traditional theory and curvilinear by the
two-factor theory. However, in the case of the
dissatisfier variables, the source of nonlinear-
ity in the two-factor theory is the assumption
that being neutral or satisfied with regard to
a dissatisfier leads to the same level of over-
all satisfaction.

Fig. 2. Alternative models of the relationship of the
dissatisfier variables to job satisfaction.

It is clear from these two figures where the 
two theories make the opposite predictions 
that are stated in Hypotheses 1 and 2. It is 
clear also why one theory requires two factors 
and the other only one factor. 


METHOD 


The description of the sample and the instruments, 
as well as the procedure for collecting the data and 
forming the nine groups, was presented by Ewen and 
his associates. The nine groups were formed using the 
Job Description Index (Kendall, Smith, Hulin, & 
Locke, 1963; Locke, Smith, Hulin, & Kendall, 1963). 
The present method begins with the nine groups and 
the scores on overall job satisfaction. 

The analysis of the present study consisted of per- 
forming a two-way analysis of variance for selected 
a priori contrasts on the main effects (Winer, 1962). 
In this design were two factors, the satisfier and the 
dissatisfier, each with three levels, dissatisfied, neutral, 
and satisfied. The criterion measure was overall job 
satisfaction as measured by the General Motors 
Faces Scale (Kunin, 1955). The weighted means 
analysis of variance which was employed calculates 
least-square estimates for unequal frequencies.2 Ac- 
cording to Winer this is the method of choice for 
the case where the unequal frequencies are an in- 
tegral part of the design. 

All tests are made on main effects as opposed to 
simple effects. For each factor, the satisfier and the 
dissatisfier, two a priori contrasts were tested. The 
first contrast in each case (A1 and B1) tested the
hypothesis on which both the two-factor theory and
the traditional theory predicted a difference. The sec-
ond contrast in each case (A2 and B2) tested the
hypothesis on which the two-factor theory predicted 
no effect and the traditional theory predicted a dif- 
ference (Hypotheses 1 and 2, respectively). 

In terms of main effects the contrasts for the 
satisfier factor were: (a) the group of employees
which was either dissatisfied or neutral with regard 
to the satisfier versus the group which was satisfied 
with the satisfier, and (b) the group which was 
neutral with regard to the satisfier versus the group 
which was dissatisfied with the satisfier. The con- 
trasts for the dissatisfier factor were: (a) the group 
of employees which was either satisfied or neutral 
with regard to the dissatisfier versus the group 
which was dissatisfied with the dissatisfier, and (b) 
the group which was neutral with regard to the 
dissatisfier versus the group which was satisfied with 
the dissatisfier. These were the a priori contrasts of 
interest. 
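
The four a priori contrasts described above can be written as coefficient vectors over the three levels of each factor. The sketch below is illustrative only: the level means are hypothetical, the function name is not from the original analysis, and equal weighting of levels is assumed, whereas the study itself used a weighted-means analysis for unequal frequencies.

```python
# Levels of each factor are ordered (dissatisfied, neutral, satisfied).
# Coefficients assume equal weighting of levels (an assumption; the
# original analysis was a weighted-means analysis for unequal cell sizes).

A1 = (-1, -1, 2)   # satisfier: (dissatisfied + neutral) versus satisfied
A2 = (-1, 1, 0)    # satisfier: neutral versus dissatisfied
B1 = (2, -1, -1)   # dissatisfier: dissatisfied versus (neutral + satisfied)
B2 = (0, -1, 1)    # dissatisfier: satisfied versus neutral


def contrast_value(coeffs, level_means):
    """Weighted sum of level means for one contrast."""
    return sum(c * m for c, m in zip(coeffs, level_means))


def dot(u, v):
    """Inner product, used to check that two contrasts are orthogonal."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical level means on overall satisfaction.
two_factor_pattern = (3.0, 3.0, 5.0)    # neutral equals dissatisfied
traditional_pattern = (2.0, 3.0, 4.0)   # linear increase

print(dot(A1, A2), dot(B1, B2))                      # both 0: orthogonal
print(contrast_value(A2, two_factor_pattern))        # 0.0: no A2 effect
print(contrast_value(A2, traditional_pattern))       # 1.0: an A2 effect
```

Under the two-factor pattern the second contrast on the satisfier factor is zero, while under the traditional pattern it is not, which is why A2 and B2 are the contrasts that separate the two theories.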

Testing for significance usually is not enough 
(Dunnette, 1966). What is needed is some measure of
the strength of the relationships. The measure used
in this study was Omega-squared (Hays, 1963).


2 The CD 1604 computer program for the least-
squares analysis of variance was written by Robert
R. Golden, Industrial Relations Center, University
of Minnesota.


RESULTS 


The results of the analysis of variance on 
the a priori contrasts are shown in Table 1. 
The data relevant to Hypothesis 1 clearly 
support the traditional theory and argue 
against the two-factor theory. The results in- 
dicate that as predicted by the traditional 
theory and contrary to the predictions made 
by the two-factor theory there was a differ- 
ence on overall satisfaction between the group 
of employees which was dissatisfied with the 
satisfier and the group which was neutral with 
regard to the satisfier. As predicted by both 
theories, there was a difference between the 
group which was either dissatisfied or neutral 
with regard to the satisfier and the group 
which was satisfied with the satisfier. The ex- 
tent to which the data confirm the traditional 
theory as opposed to the two-factor theory is 
apparent from Figure 3. The larger difference 
between adjacent column means was that be- 
tween dissatisfied and neutral, which was the 
very difference the two-factor theory predicted 
would be null. 
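
As a quick arithmetic check, each F ratio in Table 1 should equal the contrast mean square divided by the error mean square (1.04). The short sketch below uses four rows of mean squares as printed in Table 1; the dictionary keys are labels chosen here for illustration.

```python
# Mean squares and F ratios as printed in Table 1 (error MS = 1.04).
MS_ERROR = 1.04
mean_squares = {"A1": 181.34, "A2": 63.45, "B1": 22.97, "S x D": 1.50}
reported_f = {"A1": 174.37, "A2": 61.01, "B1": 22.09, "S x D": 1.44}


def f_ratio(ms_effect, ms_error):
    """F ratio for a single-df contrast or effect: MS_effect / MS_error."""
    return ms_effect / ms_error


computed_f = {name: round(f_ratio(ms, MS_ERROR), 2)
              for name, ms in mean_squares.items()}

for name in mean_squares:
    print(name, computed_f[name], reported_f[name])
```

The recomputed ratios agree with the tabled values to two decimal places.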

The data regarding Hypothesis 2 also clearly 
support the traditional theory at the expense 
of the two-factor theory. The analysis demon- 
strates a difference on overall satisfaction be- 
tween the group of employees which was satis- 
fied with the dissatisfier and the group which 
was neutral with regard to the dissatisfier. As 
predicted by both theories, there was a differ-
ence between the group which was either
satisfied or neutral with regard to the dis-
satisfier and the group which was dissatisfied
with the dissatisfier. The degree to which the
results support the traditional theory is shown
in Figure 4.


TABLE 1

ANALYSIS OF VARIANCE FOR A PRIORI CONTRASTS OF
THE SATISFIER AND DISSATISFIER FACTORS ON
OVERALL JOB SATISFACTION

Source of variation a            df       MS         F
A1: S_D + S_N versus S_S          1     181.34    174.37**
A2: S_N versus S_D                1      63.45     61.01**
B1: D_D versus D_N + D_S          1      22.97     22.09**
B2: D_N versus D_S                1      15         7.07*
S × D                             4       1.50      1.44
Error                           828       1.04

a S and D represent satisfier and dissatisfier factors, re-
spectively. Subscripts D, N, and S represent the dissatisfied,
neutral, and satisfied levels, respectively.
* p < .01.
** p < .001.


Fig. 3. Profile for the satisfier factor with cell means
on job satisfaction.

Thus, in the two cases in which the tradi- 
tional theory and the two-factor theory made 
contradicting predictions, the traditional the- 
ory was confirmed and the two-factor theory 
was disconfirmed. 

The interaction of the satisfier and the
dissatisfier factors was not significant. There-
fore, the results of the main effects could be
interpreted, leaving no reason to deal with the
simple effects (Winer, 1962).


Fig. 4. Profile for the dissatisfier factor with cell
means on job satisfaction.


TABLE 2

ANALYSIS OF VARIANCE AND OMEGA-SQUARED FOR THE
STANDARD EFFECTS OF THE SATISFIER AND DIS-
SATISFIER FACTORS ON OVERALL
JOB SATISFACTION

Source of variation      df      MS         F         Omega-squared
Satisfier (S)             2    123.338    119135"     .18
Dissatisfier (D)          2    eZ,        14.51*      .02
S × D                     4      1.50       1.44      .00
Error                   828      1.04

* p < .001.

The data relevant to the strength of the 
observed relationships are shown in Table 2. 
The results indicate that the satisfier factor 
accounted for 18% of the total variance in 
overall job satisfaction and the dissatisfier 
factor accounted for 2% of the total variance. 
Clearly, the satisfier factor had the more 
potent effect upon job satisfaction. 
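
For readers unfamiliar with the strength-of-effect index, the following is a minimal sketch of the usual fixed-effects estimate of omega-squared, assumed here to be the form intended by the citation to Hays (1963). The sums of squares in the example are hypothetical and are not taken from the study.

```python
def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """Estimated proportion of total variance accounted for by an effect.

    Uses the common fixed-effects estimate
        (SS_effect - df_effect * MS_error) / (SS_total + MS_error).
    """
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# Hypothetical values chosen only to illustrate the arithmetic.
example = omega_squared(ss_effect=40.0, df_effect=2, ms_error=1.0,
                        ss_total=199.0)
print(round(example, 2))  # 0.19
```

Unlike the F ratio, this index does not grow with sample size alone, which is why it is reported alongside the significance tests.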

Further evidence of the large difference in 
the strength of effect between the satisfier and 
the dissatisfier factor can be seen in a com- 
parison of Figures 3 and 4. The cell means 
were clustered closely around their respective 
column mean on the satisfier factor and were 
dispersed widely from their respective row 
mean on the dissatisfier factor. 


DISCUSSION


The present results taken as
a whole provide clear support for the tradi-
tional theory and argue against the two-
factor theory. In fact, no support whatever is
provided for the two-factor theory of work 
motivation. A unidimensional theory of job 
satisfaction in which some variables have a 
more potent effect upon satisfaction than 
others is compatible with the results of this 
analysis. Thus the present analysis supports 
the findings of Friedlander (1964) and Werni- 
mont (1966) that intrinsic factors are more 
important contributors to both satisfaction 
and dissatisfaction than extrinsic factors. 

The qualification of the present data by
Ewen and his associates, that the data were
based upon only three of the possible N
variables which contribute to overall satisfac-
tion, deserves to be restated. The finding that
the variables employed in this analysis ac-
counted for 20% of the variance in overall
satisfaction underlines this qualification.

A question Ewen and his associates ask of 
future studies may be answered based upon 
the present analysis. The question was 
whether intrinsic variables are more potent 
than extrinsic variables, or whether the func- 
tioning of extrinsic variables depends on the 
level of satisfaction with the intrinsic varia- 
bles. The answer in the present case was that 
the intrinsic variable can be more potent than 
the extrinsic variable as exemplified by the 
difference in contribution between 18% and 
2%. The hypothesis of an interaction effect 
is not supported in the present analysis. 

In agreement with Ewen and his associates, 
more research is needed to discover the gen- 
erality of the hypothesis that intrinsic varia- 
bles in the job situation are more potent con- 
tributors to job satisfaction than are the ex- 
trinsic variables. The present author agrees 
with the conclusion that the terms “satisfiers” 
and “dissatisfiers” imply a distinction which 
stands upon empirical data derived from a 
questionable recall procedure and against an 
overwhelming body of data gathered through 
many different procedures. The “intrinsic” 
and “extrinsic” classification appears to be 
the most reasonable candidate on which to 
make the distinction between more and less 
potent contributors to job satisfaction. 
PREDICTIVE VALUE OF SVIB PRIMARY AND
REJECT PATTERNS 


JESS FEIST 1 
McNeese State College 


The Strong Vocational Interest Blank (SVIB) is sometimes used with all mem- 
bers of a high school class. Little direct evidence is available, however, which 
indicates the relationship between scores made on this inventory when ad- 
ministered to large groups of high school boys and occupations they sub- 
sequently follow. The present investigation attempted to estimate this rela- 
tionship for a group of men who had completed the SVIB while high school 
juniors or seniors 6-10 yr. earlier, by comparing their primary and reject pat- 
terns with their present jobs when these jobs were classified on the basis of the 
interest ratings found in the United States Employment Service (USES) 
manual, Estimates of Worker Trait Requirements for 4,000 Jobs. Significant 
relationships were found to exist between 6 SVIB patterns and USES interest 
factors. These findings suggest that the SVIB shows promise in assessing broad 
interest factors associated with jobs when primary and reject patterns are used. 


In recent years the Strong Vocational In- 
terest Blank (SVIB) has been used increas- 
ingly with high school students. Usually it is 
given to selected individuals. At times, how- 
ever, it is administered to the entire junior or 
senior class. What evidence exists to support 
this latter practice? 

Predictive validity studies of the SVIB ad- 
ministered to an entire class of high school 
students are not to be found in the literature. 
An investigation reported by Berdie (1960) 
utilized men who had filled out the inventory 
while in high school, but the sample was 
drawn from a University of Minnesota gradu- 
ate population. Earlier studies by Strong 
(1955) and McArthur (1954) used samples 
of university students. 

Although the above studies generally found 
a definite relationship between interest scores 
and subsequent occupation, there is little evi- 
dence to suggest that these results could be 
used by the high school counselor who ad- 
ministers the SVIB to all junior or senior 
boys. In most high schools the majority of 
eleventh and twelfth grade boys will not enter 
the specific business and professional occupa- 
tions found on the SVIB profile. Disregarding 
any possible value the inventory may have in 
assessing the more global aspects of person-
ality, can a high school counselor make valid
use of the SVIB scores of all students in a
program of vocational guidance?


1 This article represents part of the author’s doc-
toral dissertation completed at the University of
Kansas under the supervision of E. Gordon Collister,

Before it is possible to establish the value 
of the inventory for all high school boys, two 
assumptions must be made. The first is that 
interests are related to jobs held, and the sec- 
ond is that the SVIB measures interests of all 
workers, the skilled and semiskilled workers, 
as well as the business and professional. 

The first of these assumptions has been 
fairly well established. Strong (1955), in his 
longitudinal study cited above, found that 
interests are related to the occupation of men 
at the business and professional level. 

Little evidence has been published in sup- 
port of the second assumption. After working 
with a men-in-general group representative of 
all men in the United States, Strong (1943) 
was convinced that it was not possible to 
clearly differentiate lower-level occupations. 
Clark (1961), however, used a tradesmen-in- 
general group as the point of reference and 
was able to differentiate skilled trades. These 
findings seem to indicate that the particular 
point of reference employed determines to a 
large extent an instrument’s capacity to dif- 
ferentiate various occupations. No direct evi- 
dence has been reported which suggests that 
the SVIB can differentiate among the in- 
terests of all workers. 
RATIONALE


The purpose of this study was to investi-
gate the relationships between SVIB primary
and reject patterns of a population of high
school boys and their jobs 6-10 years later.

To do this, it was necessary to classify jobs
on the basis of interests. At the time the
study was begun in 1964 the most extensive
listing of jobs by interests was in the manual,
Estimates of Worker Trait Requirements for
4,000 Jobs (1956). This United States Em-
ployment Service (USES) manual lists two
interest factors associated with each of the
4,000 jobs. In all, 10 different factors were
used. These factors evolved from the ones
which emerged from a factorial study by
Cottle (1950).
The five pairs of bipolar factors used by
the Employment Service were: 1. Things and
objects versus 6. People and ideas; 2. Busi-
ness contact versus 7. Scientific and technical;
3. Routine, concrete versus 8. Abstract and
creative; 4. Social welfare versus 9. Non-
social; 5. Prestige versus 0. Tangible, produc-
tive satisfaction.

The next task was to tally SVIB primary
and reject patterns for each profile. The rules
proposed by Korn and Parker (1962) were
followed. A primary pattern existed when the
majority of scores for a group were B+ or
higher. A reject pattern existed when the
majority of scores for a group fell at standard
score 15 or below.
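
The pattern rules just described can be sketched as a small classifier. This is an illustration, not the Korn and Parker procedure itself: the function name is hypothetical, "majority" is read as more than half of the scales in a group, and the B+ rating is assumed to begin at standard score 40, a cutoff the text does not state.

```python
PRIMARY_MIN = 40   # assumed standard-score floor for a B+ rating (not in the text)
REJECT_MAX = 15    # standard score named in the text for reject patterns


def classify_group(standard_scores):
    """Return 'primary', 'reject', or None for one SVIB occupational group.

    A pattern exists when more than half of the group's scale scores
    meet the relevant cutoff.
    """
    n = len(standard_scores)
    n_high = sum(1 for s in standard_scores if s >= PRIMARY_MIN)
    n_low = sum(1 for s in standard_scores if s <= REJECT_MAX)
    if n_high > n / 2:
        return "primary"
    if n_low > n / 2:
        return "reject"
    return None


# Hypothetical profiles for a four-scale group.
print(classify_group([45, 42, 38, 41]))   # primary
print(classify_group([10, 15, 20, 12]))   # reject
print(classify_group([40, 30, 40, 30]))   # None (exactly half is not a majority)
```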

The third step was to determine which
USES interest factors were most closely as-
sociated with each of the seven SVIB family
groups. This was done by finding the interest
factors of jobs pertinent to each of 49 SVIB
occupations. Table 1 shows the interest fac-
tors suggested by the USES manual for
these SVIB scales. Since this manual does not
represent a complete sample of jobs in the
Dictionary of Occupational Titles (1949),
10 SVIB occupations have no pertinent jobs
listed.

Clear patterns seemed to emerge for some 
SVIB groups but not for others. Table 1 in- 
dicates that for SVIB Group I, USES In- 
terest 4 was suggested four times and Interest 
7 was suggested with all seven occupations. 
Because they were suggested more often than
any others, Interest 4 (Social welfare) and
Interest 7 (Scientific-technical) were hy-
pothesized to be listed more often for jobs of
subjects (Ss) with a primary pattern in Group
I than for jobs of Ss with a primary pattern
in groups other than I.


TABLE 1

SUGGESTED USES INTEREST FACTORS OF JOBS
PERTINENT TO 49 SVIB OCCUPATIONS

SVIB Scale                               Interests
Group I
  Artist
  Psychologist                           4,7
  Architect                              7,8
  Physician                              7,8
  Psychiatrist                           4,7
  Osteopath                              4,7
  Dentist                                7,9
  Veterinarian                           4,7
Group II
  Mathematician                          7,8
  Physicist                              7,8
  Chemist                                7,8
  Engineer                               1,9
Group IV
  Production Manager
  Farmer                                 7,0
  Carpenter                              9,0
  Forest Service Man
  Aviator                                eS
  Printer                                1,9
  Mathematics Science Teacher
  Industrial Arts Teacher                5,6
  Vocational Agricultural Teacher        6,7
  Policeman
  Army Officer
Group V
  YMCA Director
  Personnel Manager                      4,7
  Public Administrator
  Vocational Counselor                   4,5
  Physical Therapist                     4,7
  Social Worker                          4,6
  Social Science Teacher                 4,6
  Business Education Teacher             2,6
  School Superintendent                  5,6
  Minister
Group VIII
  Senior CPA                             1,2
  Accountant                             1,2
  Office Worker                          1,3
  Credit Manager                         2,6
  Purchasing Agent                       2,5
  Banker                                 1,5
  Pharmacist                             7,9
  Mortician                              2,6
Group IX
  Sales Manager                          1,2
  Real Estate Salesman                   1,2
  Life Insurance Salesman                1,2
  President (Manufacturing Company)      2,5
Group X
  CPA Owner
  Advertising Man                        6,8
  Lawyer                                 2,5
  Author-Journalist                      5,6

For SVIB Group II, Interest 7 (Scientific-
technical) and Interest 8 (Abstract-creative)
occurred most often. They both were listed 
for three of the four occupations. 
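
The frequency count behind this selection can be sketched as follows, using the Group II entries from Table 1. The function name is hypothetical, and ties beyond the top two are not handled specially.

```python
from collections import Counter

# USES interest factors listed in Table 1 for the Group II occupations.
group_ii = {
    "Mathematician": (7, 8),
    "Physicist": (7, 8),
    "Chemist": (7, 8),
    "Engineer": (1, 9),
}


def top_two_interests(occupations):
    """Return the two most frequently suggested interest factors with counts."""
    counts = Counter(factor
                     for pair in occupations.values()
                     for factor in pair)
    return counts.most_common(2)


print(top_two_interests(group_ii))  # [(7, 3), (8, 3)]
```

Interests 7 and 8 are each suggested for three of the four occupations, matching the count given in the text.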

Group IV was one for which no definite 
pattern seemed to evolve. Interests 1, 6, 7, 9, 
and 0 each were recorded two times. Because 
of this amorphous pattern, a departure was 
made from the method of selecting the two 
most frequently suggested interests, and a 
new procedure was employed for Group IV.
It seemed appropriate to use the two suggested 
interests of the one occupation which cor- 
related most highly with the other scales in 
the group. From Table 193 in Strong (1943) 
it was learned that Carpenter had the highest 
average correlation with the other Group IV 
scales. Because Interest Factor 9 (Nonsocial) 
and Interest Factor 0 (Tangible, productive 
satisfaction) were suggested for Carpenter, 
they were hypothesized to be associated more 
often with jobs of Ss having a primary pat- 
tern in Group IV than with jobs of Ss having 
a primary pattern in groups other than IV. 

Group V seemed characterized by Interest 
4 (Social welfare) and Interest 6 (People- 
ideas). The former occurred five times, the 
latter, four with Group V jobs. 

Interest Factor 1 (Things-objects) and In- 
terest Factor 2 (Business contact) occurred 
most often for both Group VIII and Group 
IX. For Group VIII, Interest 1 was suggested 
four times and Interest 2 was listed five times. 
The three Sales keys in Group IX were all 
associated with Interests 1 and 2 since all 
salesman jobs received those ratings. 

Interest 5 (Prestige) and Interest 6 (People- 
ideas) each occurred twice for the ratings 
given to the three occupations in Group X. 

The same rationale was used to set up 
hypotheses concerning reject patterns. Since 
the suggested interest factors were bipolar, the 
opposite pole was used. To illustrate, because 
Interests 4 and 7 were suggested most often 
for jobs pertinent to SVIB Group I primary 
patterns, the opposite poles, that is, Interests 
2 and 9, were hypothesized to be associated
most often with reject patterns in Group I. 
This same rationale was followed for the re- 
maining six families. 
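
Because the USES factors are paired as 1 with 6, 2 with 7, 3 with 8, 4 with 9, and 5 with 0, the opposite pole of any factor can be found by adding 5 modulo 10. The sketch below illustrates this; the function names are hypothetical.

```python
def opposite_pole(factor):
    """Opposite pole of a USES interest factor (pairs 1-6, 2-7, 3-8, 4-9, 5-0)."""
    return (factor + 5) % 10


def reject_hypothesis(primary_factors):
    """Interest factors hypothesized for a reject pattern, in ascending order."""
    return sorted(opposite_pole(f) for f in primary_factors)


print(reject_hypothesis([4, 7]))  # Group I primary 4 and 7 -> [2, 9]
print(reject_hypothesis([4, 6]))  # Group V primary 4 and 6 -> [1, 9]
```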


PROCEDURE 


A questionnaire was sent to 196 men who had 
filled out the SVIB while in the eleventh or twelfth 
grades at Highland Park High School, Topeka, 
Kansas, during the years from 1954 to 1958. At that
time it was school policy to administer the in-
ventory to all juniors or seniors. A total of 221
male profiles was available, but only 196 (89%) 
addresses could be determined. The questionnaire was 
designed to learn each S’s 1964 occupation. 

Returns from 182 (93%) men were received. Time 
between testing and follow-up ranged from 5½ years
to a little more than 10 years, with an average in- 
terval of almost 8 years. Responses to the ques- 
tionnaire indicated that 137 Ss were in the work 
force. The other 45 were either in college or in the 
military service. 

Primary and reject patterns for the 137 Ss in the 
work force were determined by using the definitions 
proposed by Korn and Parker (1962). This count 
yielded 124 primary patterns and 82 reject patterns.
These represent the numbers of patterns, not the 
numbers of people with patterns. To illustrate, if 
one person had three primary patterns, he would be 
counted three times in the total of 124; if one had 
no primary pattern he would not be counted. 

After eliminating the men who were either unem- 
ployed or working at a job for which no USES in- 
terest factor was available, the total usable primary 
patterns dropped to 113. This total was distributed 
among the SVIB scales as follows: Group I, 9; 
Group II, 19; Group IV, 51; Group V, 4; Group 
VIII, 13; Group IX, 12; and Group X, 5. 

Of the 82 reject patterns which emerged, 77 were 
for men whose jobs were listed in the USES manual. 
This total of usable reject patterns was distributed 
among the SVIB families as follows: Group I, 9; 
Group II, 8; Group IV, 0; Group V, 58; Group 
VIII, 0; Group IX, 1; and Group X, 1. 

In setting up and testing the hypotheses only the 
SVIB groups with at least 10 primary patterns or 
10 reject patterns were included. It was hypothesized 
that Ss with SVIB primary patterns in Group II 
would be engaged in jobs of a Scientific-technical 
(Interest 7) and an Abstract-creative (Interest 8) 
nature more often than Ss with primary patterns in 
groups other than Group II. This same association 
was hypothesized for the following SVIB primary 
patterns and USES interest factors: Group IV, In-
terest 9 (Nonsocial) and Interest 0 (Tangible-pro-
ductive satisfaction); Group VIII, Interest 1 (Things-
objects) and Interest 2 (Business contact); Group
IX, Interest 1 (Things-objects) and Interest 2 (Busi-
ness contact).

It was also hypothesized that Ss with reject pat-
terns in Group V would be engaged in jobs as- 
sociated with Interest 1 (Things-objects) and In-
terest 9 (Nonsocial) more often than Ss with reject
patterns in groups other than Group V. This was the
only SVIB group with at least 10 reject patterns.
To test these hypotheses, the significance of the
difference between the percentage of occasions that
job titles of the hypothesized nature occurred for
jobs held by Ss who had a primary (or reject) pat-
tern in the appropriate SVIB group and the per-
centage of occasions that they occurred for jobs of
Ss who had primary (or reject) patterns in groups
other than the appropriate SVIB group was tested
by means of the Lawshe-Baker (1950) nomograph. 
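
The nomograph gives a graphical test of the difference between two percentages. As a rough modern stand-in, and not the nomograph procedure itself, the same kind of comparison can be sketched with a pooled two-proportion z test. The percentages below are the 45% and 21% quoted later in the Discussion for Interest 0; the group sizes of 51 and 62 are assumptions taken from the counts of usable primary patterns (51 in Group IV, 113 in all), and the result is not expected to reproduce the tabled value exactly.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Assumed inputs: 45% of 51 Group IV patterns versus 21% of 62 other patterns.
z = two_proportion_z(0.45, 51, 0.21, 62)
p = one_tailed_p(z)
print(round(z, 2), p < 0.05)
```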


FINDINGS 


As seen in Table 2, five of the eight hy-
potheses dealing with primary patterns were
significant in the predicted direction at the
.05 level or beyond. Both Interest 7 (Scien-
tific-technical) and Interest 8 (Abstract-crea- 
tive) were significantly more often associated 
with SVIB Group II primary patterns than 
with primary patterns other than Group II. 
The hypothesized relationships between SVIB 
Group IV primary patterns and both In- 
terest 9 (Nonsocial) and Interest 0 (Tangible- 
productive satisfaction) were supported at the 
.05 level or beyond. A significant relationship 
was also found between SVIB primary pat- 
terns in Group IX and USES Interest Factor 
2 (Business contact). 

No significant relationship was found for 
SVIB Group VIII and either Interest 1 
(Things-objects) or Interest 2 (Business con- 
tact), or between SVIB Group IX and USES 
Interest 1 (Things-objects). 


TABLE 2

TESTS OF SIGNIFICANCE FOR EACH HYPOTHESIZED
FACTOR ASSOCIATED WITH 4 SVIB PRIMARY GROUPS
AND 1 REJECT GROUP

Group         Predicted interest       t
Primary
  II                  7              ZAoe
  II                  8              1.74*
  IV                  9              1.80*
  IV                  0              2.84*
  VIII                1              mhz
  VIII                2              1.01
  IX                  1               .74
  IX                  2              1.81*
Reject
  V                   1              ik
  V                   9              eae

* p < .05, one-tailed.
Table 2 shows that one hypothesis involving
a reject pattern was supported while one was 
not. A positive relationship was found be- 
tween jobs of Ss who had reject patterns in 
Group V and Interest 9 (Nonsocial). No sig- 
nificant relationship, however, was noted for 
Group V reject patterns and Interest 1 
(Things-objects). 


DISCUSSION


These findings should be viewed with cau- 
tion. Only 137 ex-Highland Park male stu- 
dents now in the work force were included in 
the testing of these hypotheses. For this rea- 
son, most of the primary and reject pattern 
groups were quite small. Nevertheless, the 
findings indicated an association between six 
SVIB patterns and the specific hypothesized 
interest factor. Most confidence can probably 
be placed in the relationship between SVIB 
Group IV primary patterns and jobs which 
involve satisfaction gained from tangible and 
productive activities. Of the present sample 
45% who had primary patterns in Group IV 
were working in jobs of this kind. Only 21% 
of those who had primary patterns in some 
other group were engaged in jobs of this type. 
It is encouraging that this relationship was 
based on the most complete sample available. 

The results of this study suggest that the 
SVIB may have some validity with high 
school boys when primary and reject patterns 
are used to predict the type of job they will 
be engaged in 6-10 years later. This investiga- 
tion can also be considered to be a test of the 
validity of the USES classification system 
since those ratings were based on factors 
which evolved largely from a factor analysis 
of the SVIB. 
A NOTE ON THE FACTORIAL INVARIANCE OF THE
FEMALE FORM OF THE SVIB 


GEORGE H. DUNTEMAN 


Regional Rehabilitation Research Institute, University of Florida 


The factorial structures of an early and a recent female form of the SVIB were
compared. It was found that 2 of the largest factors isolated in the recent study
were highly similar to 2 out of the 4 factors isolated in the earlier study. The 
remaining 2 factors isolated in the earlier study showed moderate similarity to 2 
additional factors out of the 9 factors found in the recent study. It was con- 
cluded that, in general, the factorial structure of the 2 forms was quite similar. 


A recent article by Anderson (1965) con- 
cerning the factorial structure of the Female 
Form of the Strong Vocational Interest 
Blank (SVIB) brings to mind a much earlier 
article by Crissy and Daniel (1939), who investigated 
the factorial structure of 18 scales 
of an earlier female form of the SVIB. 

Crissy and Daniel used centroid factor 
analysis and graphical rotation and found 
that four rotated factors accounted for the 
intercorrelations among the 18 scales. The 
four rotated factors, presented in order of 
percentage of common variance accounted for, 
are shown in Table 1. The authors labeled 
Factor I as Interest in Male Association 
(M.A.), Factor II as Interest in People 
(P.), Factor III as Interest in Language (L.), 
and Factor IV as Interest in Science (S.). 
Anderson’s analysis, which concerned 29 scales 
of the most recent form of the SVIB, found 
that nine factors rotated by Varimax and 
initially extracted by principal components 
factor analysis accounted for most of the com- 
mon variance. 

The present author found that four of And- 
erson’s factors exhibited varying degrees of 
similarity in regard to the level and pattern 
of factor loadings to the four rotated factors 
of Crissy and Daniel for the 17 scales that the 
two analyses had in common. The four Ander- 
son factors which were similar to the Crissy 
and Daniel factors are also presented in Table 
1. In Table 1, each of the Crissy and Daniel 
factors is paired with the similar factor from 
the Anderson study so that the similarity of 
the factor loadings for the 17 scales common 
to the two analyses can be examined. Ander- 
son’s sample consisted of 203 female college 
students enrolled in an introductory health-related 
professions course, while Crissy and 
Daniel’s sample consisted of 500 women in 
general. 
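
The pairing described above rests on inspection of the level and pattern of the loadings. One common way to put a number on such similarity is Tucker's coefficient of congruence. That index is not used in this note, so the following is only an illustrative sketch, and the loadings in it are hypothetical values rather than entries from Table 1.

```python
import math


def congruence(x, y):
    """Tucker's coefficient of congruence between two loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must cover the same scales")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if den == 0:
        raise ValueError("a loading vector is all zeros")
    return num / den


# Hypothetical loadings of five shared scales on one factor in each study.
earlier = [0.80, 0.70, -0.30, 0.10, -0.60]
recent = [0.75, 0.60, -0.20, 0.05, -0.70]

phi = congruence(earlier, recent)
print(f"congruence = {phi:.2f}")
```

Values near 1 indicate nearly proportional loading patterns, and values near -1 indicate the same pattern with signs reflected.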

As can be seen from Table 1, Anderson’s 
largest rotated factor is strikingly similar in 
respect to the level and pattern of factor load- 
ings to Crissy and Daniel’s largest rotated 
factor, except for the Social Worker and 
Nursing scales which split off and helped de- 
termine separate factors in Anderson’s analy- 
sis. Both Anderson’s Factor I and Crissy and 
Daniel’s Factor I can be given the same in- 
terpretation. The second largest factor from 
Crissy and Daniel’s analysis was somewhat 
similar to Anderson’s Factor III and their 
third factor was somewhat similar to Ander- 
son’s Factor VII. However, Crissy and 
Daniel’s fourth factor was highly similar to 
Anderson’s Factor II and both of these factors 
were given the same interpretation by their 
respective author or authors, that is, Scien- 
tific Interests. 

In general, Anderson’s two largest factors 
were highly similar to two out of the four 
Crissy and Daniel factors and two more of 
Anderson’s factors had moderate similarity to 
the two remaining Crissy and Daniel factors. 
The two factor structures were quite similar 
except that the inclusion of 12 more scales in 
Anderson’s analysis led to an additional five 
factors. The Nursing and Social Worker scales 
which loaded heavily on Crissy and Daniel’s 
largest factor were found to be instrumental 
in defining two separate factors in Anderson’s 
analysis. This finding could be due in part to 
the rapid changes in educational and job de- 
mands that have occurred in the professions 
of nursing and social work over the past 
quarter of a century. Consequently, one would 
expect to find different types of individuals 
currently in these professions than in the past, 
and therefore different relationships would be 
expected of these two interest scales with the 
remaining interest scales between past and 
present studies. Some of Anderson’s additional 
factors might be regarded as overdefined 
specifics. For example, Factor IX was defined 
by the Music Performer and Music Teacher 
scales and Factor V was primarily defined by 
the High School Physical Education Teacher 
scale and the College Physical Education 
Teacher scale. Factor VIII was also quite 
narrow in scope, being a culinary factor solely 
defined by the Dietician and Home Economics 
Teacher scales. 

TABLE 1 
A Comparison of the 4 Factors Similar to Both Anderson’s and Crissy and Daniel’s Analyses 

                                    Factor M.A.        Factor P.         Factor L.         Factor S.    
SVIB scale                         Crissy Anderson   Crissy Anderson   Crissy Anderson   Crissy Anderson
Factor number                           I        I       II      III      III      VII       IV       II
Author                               -.84     -.74     -.28     -.05      .34      [?]     -.27     -.25
Librarian                            -.74     -.49     -.20     -.15      .42      [?]     -.03      .38
Artist                               -.62     -.75     -.61     -.27      [?]     -.05     -.07      .08
Physician                            -.55     -.56     -.12      .18      [?]     -.06      .70      .66
Dentist                               .28     -.19     -.46     -.03     -.21     -.14      .68      .87
Life Insurance Saleswoman            -.26      .21      .29      .45     -.42     -.14     -.27     -.56
Social Worker                        -.50     -.08      .69      .06      .25      .18     -.11     -.38
English Teacher                      -.03     -.27      .38      .01      .76      .71      [?]     -.38
Social Science Teacher                .38      .04      .18      .28      [?]      .65      .05     -.13
Lawyer                               -.25     -.01      .80      .80     -.14      .09      .08     -.14
YMCA Secretary                        .23      .05      .18      .19      [?]      [?]      .16     -.12
Mathematics and Science Teacher       .58      .25      .08     -.07      [?]      .05      .65      .80
Nurse                                 [?]      .03     -.03      .02     -.20      .06      .29      .13
Stenographer-Secretary                [?]      .80     -.03     -.11     -.42     -.11     -.45     -.29
Office Worker                         [?]      .85      .02     -.13     -.57     -.17     -.22      [?]
Housewife                             .83      .66     -.16     -.47     -.36      .08     -.24     -.08
Masculinity-Femininity               -.16     -.11     -.16     -.17     -.30      .31      .30     -.62

a Signs of factor loadings have been reflected. 
b A high score on the earlier M-F scale indicated masculinity while a high score on the current M-F scale indicates femininity. 


In summary, the similarity between the two 
factor structures is impressive when one con- 
siders the following: different factor-analytic 
and rotational techniques were used; different 
female populations were sampled; different 
numbers of scales were considered; and the 
studies were separated by a time span of over 
a quarter of a century. 


MOTIVATOR AND HYGIENE DIMENSIONS FOR 
RESEARCH AND DEVELOPMENT ENGINEERS* 


GEORGE B. GRAEN 


University of Minnesota 


Herzberg’s 2-factor theory appears to offer promising leads to new research 
on work motivation. One of the main problems in following these leads is that 
the measurement of the work factors must be accomplished through inter- 
viewers. The purpose of this study was to develop psychometric measures of 
these work factors through the method of factor analysis. A questionnaire was 
developed based upon Herzberg’s classification scheme. Engineers served as Ss. 
The results show that the dimensions proposed by Herzberg when represented 
as items and rated by Ss do not result in homogeneous groupings in the factor-analytic 
sense. 


Herzberg, Mausner, and Snyderman (1959) 
explored job “factors” contributing to satis- 
faction and dissatisfaction for engineers and 
accountants. They used the critical incidents 
method to interview the subjects about previ- 
ously satisfying and dissatisfying job situa- 
tions and submitted the interview protocols to 
a content analysis. As a result, Company 
Policies and Practices, Supervision, Human 
Relations, and Working Conditions were men- 
tioned more frequently in the stories about 
bad times than in stories about good times. 
These variables which were related to the 
work environment were called job-context 
variables or “hygiene factors.” In contrast, 
Achievement, Recognition, Advancement, Re- 
sponsibility, and Work Itself were reported 
more frequently in stories about good times. 
These variables which were related more 
closely to the work itself were referred to as 
job-content variables or “motivators.” In gen- 
eral these findings have been replicated by 
studies employing the story-telling approach 
(Herzberg, 1965; Schwartz, Jenusaitis, & 
Stark, 1963). 

Herzberg and his associates attribute these 
findings to the fact that job factors really are 
two-dimensional. On one dimension, the job- 
content variables when present and favorable 
tend to result in satisfaction, but their ab- 
sence does not result in dissatisfaction. On 
the other dimension, the job-context variables 
when present and unfavorable tend to produce 
dissatisfaction, but their absence does not pro- 
duce satisfaction. According to this two-factor 
theory, increases in favorable job-context 
variables will increase job attitudes to a neu- 
tral level but only increases in favorable job- 
content variables will increase satisfaction 
above this level. 

[...] the management and engineers of the two companies 
[...] master’s thesis submitted to the University of Minnesota, 
1963. Part of this paper was read at the Midwestern 
Psychological Association, Chicago, April 1965. 

If job variables really do operate in this 
two-dimensional way, the results yield ob- 
vious and important implications for organiza- 
tional practices. In situations where it is de- 
sirable to increase positive feelings toward 
the job, Herzberg’s results seem to prescribe 
that the most efficient means to accomplish 
this end is to use available resources to im- 
prove the job-context variables only enough 
to remove the irritants from the job environ- 
ment, and more importantly, to improve the 
job-content variables as much as possible so 
as to achieve a high level of job satisfaction. 

To facilitate such actions, it would be nice 
to have something other than interviewers’ 
judgments to measure the dimensions, the 
“motivators” and “hygiene factors.” The main 
difficulty in using Herzberg’s categorization 
procedure to measure job dimensions is that 
the coding is not completely determined by 
the rating system and the data, but requires, 
in addition, interpretation by the rater. For 
example, the dimension of Supervision-technical 
includes among others the categories: 
(a) “Supervisor competent,” (b) “Supervisor 
incompetent,” and (c) “Supervisor showed 
favoritism.” The three categories all call for 
an evaluation of the supervisor’s behavior. If 
the respondent offers the evaluation, no in- 
terpretation by the rater is required. How- 
ever, if the respondent merely describes the 
supervisor’s behavior, an interpretation by 
the rater is necessary. 

The necessity for evaluations of the data 
by a rater may lead to contamination of the 
dimensions so derived. Employing a story 
presented by Herzberg to illustrate the di- 
mension of Recognition, Vroom (1965) 
pointed out the way in which the two- 
factor theory may contaminate the coding 
procedure. The dimensions in a situation such 
as this may reflect more the rater’s hypotheses 
concerning the compositions and interrelations 
of dimensions than the respondent’s own per- 
ceptions. Thus, one could conceivably learn 
more about the perceptions of raters than 
those of the respondents. 

In view of the possible sources of error in 
this categorization procedure, a more objec- 
tive approach would be to have the respon- 
dents do the rating and perform the necessary 
evaluations. The respondent’s ratings then 
could be factor analyzed to identify the fac- 
tors used in the evaluations. With this pro- 
cedure one could be more confident the factors 
actually represent the perceptual job domain 
of the respondents rather than the rater’s 
interpretation of the job domain. 

In short, what is needed to measure the 
dimensions postulated by Herzberg and his 
associates are objective psychometric meas- 
ures. This study was designed to develop such 
measures by using the method of factor analy- 
sis. 


PROCEDURE 


Questionnaire items were written to reflect as ac- 
curately as possible the job categories employed by 
Herzberg when he defined his dimensions (Herzberg 
et al., 1959, pp. 143-146). A major determinant of 
the factors resulting from a factor analysis is the 
sampling of items. Therefore, no attempt was made 
to improve upon the content of Herzberg’s sample, 
and every attempt was made to represent it. In all, 
96 items were written to cover the content of Herzberg’s 
16 dimensions. Each item was written specifically 
to measure an attitude corresponding to a 
particular job category. An example of how the 
questionnaire items were derived is as follows. The 
dimension “Status” was defined by all interview data 
included in the categories (a) “Signs or appurtenances 
of status,” (b) “Having a given status,” and (c) 
“Not having a given status.” The respective ques- 
tionnaire items for “Status” were (a) “I receive a 
symbol of status on my job,” (b) “I have a given 
status because of my job,” and (c) “I do not have 
a given status because of my job.” Both positive 
and negative items were developed in order to elicit 
response for both positive and negative feelings to- 
ward the job. 

The items required, as did the categories, further 
definitions before they could be rated. With content 
analysis this task was performed by a judge. In this 
study the subjects performed the task. 

The 96 items were placed in a format utilizing an 
importance scale ranging from “Very Important to 
My Job Satisfaction” through “Not Important” to 
“Very Important to My Job Dissatisfaction.” Below 
is an example of the scale. 


Very Important        Not            Very Important 
To My Job             Important      To My Job 
Satisfaction                         Dissatisfaction 

[Scale point values illegible in the source scan.] 


Respondents were asked to rate the importance of 
each job situation to their overall job satisfaction or 
dissatisfaction. The 96 items were scored on the de- 
gree of importance to overall feelings toward the 
job. Thus the numerical values shown on the scale 
were used as scores. 

Respondents were 153 professional engineers work- 
ing in design and development for two electronics 
firms located in the Twin Cities (Minneapolis-St. 
Paul) area. 

The 96 items were intercorrelated, and the 96 × 96 
correlation matrix was factor analyzed using the 
Principal Components method with the Kaiser Vari- 
max orthogonal rotation. Squared multiple correla- 
tions were employed as estimates of the com- 
munalities. The Kaiser criterion was used to de- 
termine the number of factors to extract (Harman, 
1960).2 
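
The analysis steps just described can be sketched in a few lines. This is a minimal illustration and not the CD 1604 program actually used. It assumes that the Kaiser criterion means retaining as many factors as there are eigenvalues greater than 1 in the unreduced correlation matrix, and that the squared multiple correlations replace the diagonal before extraction. The function names and the small correlation matrix are hypothetical.

```python
import numpy as np


def smc(R):
    """Squared multiple correlation of each item with all the other items."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))


def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns rotated loadings and rotation matrix."""
    p, k = L.shape
    T = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        B = L @ T
        # Gradient of the varimax criterion with respect to the rotation.
        G = L.T @ (B**3 - B @ np.diag((B**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
        crit_old = crit
    return L @ T, T


def extract_and_rotate(R):
    """Extract factors from a correlation matrix and rotate them by varimax."""
    R = np.asarray(R, dtype=float)
    # Assumed reading of the Kaiser criterion: eigenvalues of R above 1.
    k = int((np.linalg.eigvalsh(R) > 1.0).sum())
    if k == 0:
        raise ValueError("no eigenvalue exceeds 1")
    # Put the communality estimates (SMCs) on the diagonal.
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc(R))
    vals, vecs = np.linalg.eigh(R_reduced)
    top = np.argsort(vals)[::-1][:k]
    unrotated = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
    rotated, T = varimax(unrotated)
    return k, unrotated, rotated, T


# Hypothetical six-item correlation matrix built from two made-up factors.
true = np.array([[.8, .1], [.7, .0], [.6, .1], [.1, .7], [.0, .6], [.1, .8]])
R = true @ true.T
np.fill_diagonal(R, 1.0)

k, unrotated, rotated, T = extract_and_rotate(R)
print(k)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each item's communality (the row sum of squared loadings) is the same before and after rotation; only the distribution of variance across factors changes.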


RESULTS 


There were 21 factors, accounting for 61% 
of the total variance, which were extracted. 
However, only 11 of these included three or 
more items, and these were the ones chosen 
for further interpretation.3 Table 1 shows the 


2 The CD 1604 program was written by Lawrence 
Liddiard, Numerical Analysis Center, University of 
Minnesota. 

3 A table giving the complete correlation matrix 
has been deposited with the American Documentation 
Institute. Order Document No. 9078 from ADI 
Auxiliary Publications Project, Photoduplication Serv- 


TABLE 1 
Defined Varimax Factor Structure for Engineers 

    FL    h²   SMC     p   Factor and defining items
                     [?]   Salary-Advancement
   .78   [?]   .84           I receive a wage increase
   [?]   [?]   .87           The amount of salary I receive for my job is not adequate
   [?]   .74   .88           I expected a wage increase, but did not receive it
   [?]   [?]   .89           My wages compare unfavorably with others doing similar or the same work
   .62   [?]   .83           I expected advancement but I failed to receive it
   .59   .68   .83           I receive advancement on my job
   [?]   .69   .81           The amount of salary I received for my job is adequate
   .57   [?]   .83           My wages compare favorably with others doing similar or the same work
   [?]   .61   [?]           I am demoted from one position to a lesser position
                     .10   Interpersonal Relations—subordinates and peers
   [?]   .74   .84           I am on good working terms with my subordinates
   .76   [?]   .83           I am on poor personal terms with my subordinates
   .74   .69   .83           I am on good personal terms with my subordinates
   .65   .68   [?]           I do not like the people I work with on my job
   [?]   .68   [?]           I am on poor working terms with my subordinates
   .59   [?]   .72           I like the people I work with on my job
   .44   .60   [?]           There is a lack of cooperation on the part of my co-workers
                     .07   Working Conditions—physical
  -.72   [?]   .87           I have good facilities to work with on my job
  -.72   [?]   .85           The physical surroundings of my work place are good
  -.67   .62   .74           I have poor facilities to work with on my job
  -.61   .65   .81           The physical surroundings of my work place are poor
                     .06   Recognition-Supervision
   [?]   .61   .75           My supervisor shows favoritism to his “special” subordinates
   [?]   .69   .82           My work is successful but I am criticized and punished for it
   [?]   .65   .80           My supervisor is unwilling to even listen to my suggestion
   [?]   .60   [?]           One of my co-workers gets credit for my successful work
   .48   .61   [?]           My work is successful but I am criticized for it
                     .05   Recognition-Achievement
  -.72   .66   [?]           My idea is formally accepted by the company
  -.70   .57   [?]           I proved to my critics that I was right all along
  -.43   .56   [?]           My work is praised and special reward is given to me
                     .05   Work Itself
   .68   [?]   [?]           My job gives me an opportunity to perform all phases of an operation
   [?]   .55   [?]           My work represents a creative or challenging opportunity for me
   [?]   .58   [?]           My work is varied in nature
   [?]   .63   .80           My work is routine in nature
                     .05   Job Security
  -.67   .62   [?]           I haven’t any objective signs that my job is secure
  -.64   .64   [?]           I receive objective signs that my job is secure
  -.46   .66   [?]           I am in full agreement with the goals of the company I work in
                     .05   Position Itself
  -.63   .65   .83           My work is too easy for me
  -.58   .45   .67           I have too much work to do on my job
  -.50   .63   [?]           I receive objective evidence that I am advancing in skill on my job
                     .04   Achievement
   .60   .62   [?]           I can not see the results of my own work
   .53   [?]   [?]           I can see the results of my own work
   [?]   .56   [?]           I have a good idea which solves a problem
                     .04   Status
  -.74   .64   [?]           I have a given status because of my job
  -.67   .64   .74           I do not have a given status because of my job
  -.56   .70   .80           I receive a symbol of status on my job
                     .03   Working Conditions—co-workers
   .73   .62   [?]           My work place is isolated from other people
   .62   .57   .67           I am isolated from the work group on my job
   [?]   .66   [?]           I work with and around co-workers on my job

Note. N = 153. FL = Factor loading of statement. h² = Calculated communality of statement. SMC = Squared 
multiple correlation used as estimate of communality. p = Proportion of common variance accounted for by factor. 

11 factors, the defining items, their factor 
loadings, communalities, and the proportion 
of common variance accounted for by each 
factor. The factors were given descriptive 
names based on the names of Herzberg’s 
dimensions which provided the nucleus of de- 
fining items. For example, Salary-Advance- 
ment was defined by six items from the 
dimension of Salary and three from that of 
Advancement. 
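
The proportion of common variance reported for each factor is presumably the factor's sum of squared loadings divided by the total communality over all extracted factors. That is the usual definition, but the article does not spell out the computation, so the sketch below is an assumption. The function name and the loadings are hypothetical.

```python
import numpy as np


def common_variance_shares(loadings):
    """Share of the total common variance carried by each factor (column)."""
    L = np.asarray(loadings, dtype=float)
    ss = (L**2).sum(axis=0)   # sum of squared loadings for each factor
    return ss / ss.sum()      # denominator equals the sum of all communalities


# Hypothetical loadings: three items (rows) on two factors (columns).
example = [[0.6, 0.0],
           [0.8, 0.0],
           [0.0, 0.5]]
print(common_variance_shares(example))
```

The shares sum to 1 across all extracted factors, so the values for the 11 interpreted factors alone would sum to somewhat less than 1.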

Of the 11 factors, only Job Security and 
Status included all the items which were 
originally written to measure them. The next 
2 factors in terms of similarity to Herzberg’s 
dimensions were Work Itself and Achievement 
which included, respectively, four out of six 
and three out of seven items written to repre- 
sent them. Only the above 4 factors contained 
items written to measure one dimension. The 
other factors all included items based on several 
different dimensions. For example, Salary-Advancement 
contained items from the dimensions for Salary and 
Advancement; Interpersonal Relations—subordinates and 
peers contained items from Interpersonal Relations—subordinates 
and Interpersonal Relations—peers. Also, it is clear that items carefully 
written to represent specific dimensions end 
up in quite different factors. Thus, items for 
Recognition appeared on two separate factors: 
Recognition-Achievement and Recognition- 
Supervision. Items for Working Conditions 
appeared in both Working Conditions—physi- 
cal and Working Conditions—co-workers. 


DISCUSSION 


It is clear from the foregoing results that 
the content categories established by Herz- 
berg when represented as items and rated by 
engineers do not result in factors. They do 


ice, Library of Congress, Washington, D. C. 20540. 
Remit in advance $1.75 for microfilm or $2.50 for 
photocopies and make checks payable to: Chief, 
Photoduplication Service, Library of Congress. 

4 A table giving the complete factor structure has 
been deposited with the American Documentation In- 
stitute. Order Document No. 9078 from ADI Auxiliary 
Publications Project, Photoduplication Service, Li- 
brary of Congress, Washington, D. C. 20540. Remit 
in advance $1.75 for microfilm or $2.50 for photo- 
copies and make checks payable to: Chief, Photo- 
duplication Service, Library of Congress. 
not constitute homogeneous groupings of job 
content in the factor-analytic or correlational 
sense; certainly, the engineers participating 
in this study failed to perceive them as such. 

The finding that items from a single dimension 
ended up in different factors and that 
items from different dimensions ended up in 
the same factor points up the difficulty in- 
herent in any subjective effort to form cate- 
gories or “factors” from interview data. Al- 
though the use of an a priori classification 
scheme often appears valid, such use is sub- 
ject to rater (or judge) bias and error. It may 
be useful during the exploratory phases of a 
study to proceed on the basis of an a priori 
classification scheme, but such a scheme must 
be treated as a set of hypothesized dimen- 
sions. These hypothesized dimensions must be 
checked out by more objective methods when 
they are to be employed in studies designed 
to investigate lawful relationships. 

In the present study many of the items de- 
rived from Herzberg’s categories appear not 
to belong together. They did not demonstrate 
sufficient homogeneity to yield factors. This 
finding confirms once again the importance of 
empirical validation before establishing cate- 
gories as if they were distinct and measurable 
entities. It is suggested that, before the im- 
plications of Herzberg’s two-factor theory are 
acted upon, the theory be tested further in 
carefully designed research utilizing objective 
measures of the postulated dimensions. 